concourse / pool-resource

atomically manages the state of the world (e.g. external environments)
Apache License 2.0

Do locks leak? #10

Closed: tomwhoiscontrary closed this issue 8 years ago

tomwhoiscontrary commented 8 years ago

In your example:

jobs:
- name: deploy-aws
  plan:
    - put: aws-environments
      params: {acquire: true}
    - task: deploy-aws
      file: my-scripts/deploy-aws.yml

- name: test-aws
  plan:
    - get: aws-environments
      passed: [deploy-aws]
    - task: test-aws
      file: my-scripts/test-aws.yml
    - put: aws-environments
      params: {release: aws-environments}

If deploy-aws fails and therefore never triggers test-aws, does the lock get released? Asking for a friend.

concourse-bot commented 8 years ago

Hi there!

We use Pivotal Tracker to provide visibility into what our team is working on. A story for this issue has been automatically created.

This comment, as well as the labels on the issue, will be automatically updated as the status in Tracker changes.

vito commented 8 years ago

In that example it would stay locked, yes. Some teams use this to make sure someone takes a look at what went wrong, and then they release the lock once things are in order again (usually with a separate job that just does that one thing).

Alternatively you could use ensure: to make sure the lock gets released regardless.
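
For a single job that should look something like this (a rough sketch adapted from the example above, not something already in this pipeline):

jobs:
- name: deploy-aws
  plan:
    - put: aws-environments
      params: {acquire: true}
    - task: deploy-aws
      file: my-scripts/deploy-aws.yml
      ensure:
        # runs whether the deploy task succeeds or fails,
        # handing back the lock acquired by the put above
        put: aws-environments
        params: {release: aws-environments}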

tomwhoiscontrary commented 8 years ago

Some teams have to keep manually intervening to deal with the consequences of flaky jobs.

If you put an ensure on the first job, you can't pass the lock to the second job. If you put it on the second job, it won't come into play if the first job fails. Is there any way to use ensure to reliably clean up a lock in a two-job scenario like your example?

danger-ranger commented 8 years ago

We dealt with this by using a series of separate pools to indicate the state of a claimed resource; in the ensure we then check whether the resource ended up in the expected state and, if not, we "clean it up".

This allows us to have a pipeline that automatically re-deploys shared environments after they have been used for testing without having to babysit a flaky pipeline. The downside is that sometimes things go wrong and the pipeline iterates forever in a tight deploy-fail-destroy-retry loop. We would like to add some handling for tracking how many times a deploy has failed in a row on a given environment, but haven't done so yet.
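
Roughly, the shape looks something like this (a simplified sketch; the second pool, its name, and the repo URI are illustrative, it assumes the pool resource's add/remove params for moving a lock between pools, and it assumes the fetched lock directory carries the name/metadata files that add expects):

resources:
- name: aws-environments         # pool of environments that are ready to use
  type: pool
  source: {uri: git@example.com:ops/locks.git, branch: master, pool: aws-ready}
- name: used-aws-environments    # hypothetical pool of environments awaiting teardown/redeploy
  type: pool
  source: {uri: git@example.com:ops/locks.git, branch: master, pool: aws-used}

jobs:
- name: test-aws
  plan:
    - get: aws-environments
      passed: [deploy-aws]
    - task: test-aws
      file: my-scripts/test-aws.yml
  ensure:
    do:
      # whatever happened, hand the environment to a cleanup/redeploy job by
      # moving its lock into the "used" pool and dropping it from the "ready" pool
      - put: used-aws-environments
        params: {add: aws-environments}
      - put: aws-environments
        params: {remove: aws-environments}

A downstream job watching used-aws-environments can then tear the environment down, re-deploy it, and add its lock back to the ready pool, which is what keeps the pipeline self-healing (until it hits the tight retry loop described above).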

vito commented 8 years ago

@tomwhoiscontrary In that scenario I would use on_failure: on the first job and an ensure: on the second job.
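
Roughly, against the example pipeline (a sketch, not tested; the hooks could also be attached to the individual steps instead of the jobs):

jobs:
- name: deploy-aws
  plan:
    - put: aws-environments
      params: {acquire: true}
    - task: deploy-aws
      file: my-scripts/deploy-aws.yml
  on_failure:
    # the deploy failed, so test-aws will never run; give the lock back here
    put: aws-environments
    params: {release: aws-environments}

- name: test-aws
  plan:
    - get: aws-environments
      passed: [deploy-aws]
    - task: test-aws
      file: my-scripts/test-aws.yml
  ensure:
    # runs whether the tests pass or fail
    put: aws-environments
    params: {release: aws-environments}

If deploy-aws fails, its on_failure releases the lock; if it succeeds, test-aws picks the lock up via the get and its ensure releases it regardless of the test outcome.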

concourse-bot commented 8 years ago

Hello again!

All stories related to this issue have been accepted, so I'm going to automatically close this issue.

If you feel there is still more to be done, or if you have any questions, leave a comment and we'll reopen if necessary!