concourse / pool-resource

atomically manages the state of the world (e.g. external environments)
Apache License 2.0

Claiming too quickly after an unclaim causes 'error: lock instance is no longer acquired' #2

Closed topherbullock closed 9 years ago

topherbullock commented 9 years ago

We have two jobs that try to acquire a lock on the same pool (which has only one lock to acquire). The first job to acquire the lock failed to release it, because the second job claimed the env immediately after the unclaim commit. Probably related to this check

Job 1

:arrow_up: Env

  acquiring lock on: aws-bosh-release

:arrow_down: Env

  Cloning into '/tmp/build/get'...
  9f3afaf claiming: mars

bunch of jobs and stuff

:arrow_up: Env

releasing lock: mars on pool: aws-bosh-release 

:arrow_down: Env

  Cloning into '/tmp/build/get'...
  f1169b2 unclaiming: mars
  error: lock instance is no longer acquired


Job 2

:arrow_up: Env

  acquiring lock on: aws-bosh-release
  .........................................................

:arrow_down: Env

  Cloning into '/tmp/build/get'...
  70c75d1 claiming: mars

The commits in the Git log show that Job 2's claim happened immediately after Job 1's unclaim.

concourse-bot commented 9 years ago

Hi there!

We use Pivotal Tracker to provide visibility into what our team is working on. A story for this issue has been automatically created.

The current status is as follows:

This comment, as well as the labels on the issue, will be automatically updated as the status in Tracker changes.

vito commented 9 years ago

sweet emoji art

jtarchie commented 9 years ago

Steps to reproduce:

Setup

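mkdir -p repo/pool/unclaimed
mkdir -p repo/pool/claimed
cd repo
git init
# git does not track empty directories, so create the lock file to claim
touch pool/unclaimed/RESOURCE
git add -A
git commit -m 'init pool of resources'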

Job A performing the put in a Concourse pipeline (claim, then unclaim)

git mv pool/unclaimed/RESOURCE pool/claimed/RESOURCE
git commit -m 'claiming resource from Job A'

git mv pool/claimed/RESOURCE pool/unclaimed/RESOURCE
git commit -m 'unclaiming resource from Job A'
# remember the unclaim commit; this is the ref Job A will check later
ref=$(git rev-parse HEAD)

Job B claiming the resource

git mv pool/unclaimed/RESOURCE pool/claimed/RESOURCE
git commit -m 'claiming resource from Job B'

Job A then running the /opt/resource/in script on its ref

$ git log --oneline $ref..HEAD -- pool/
869df45 claiming resource from Job B

Because this log output is not empty, the script shows the error message from here.

A possible solution may be to simply filter out subsequent claiming commits, matching on their commit messages.

For example:

$ git log --grep '^claiming' --invert-grep --oneline $ref..HEAD -- pool/

There is no output from this command.
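
To compare the two checks side by side, the steps above can be replayed in a throwaway repository. This is only a sketch of the reproduction, not the resource's actual in script: the temporary directory, the `.gitkeep` files (added so neither pool directory ever becomes empty), and the local commit identity are assumptions made so it runs in a clean environment. `--invert-grep` needs Git 2.4 or newer.

```shell
#!/bin/sh
set -eu

# Throwaway repository; the path and the RESOURCE lock name are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Local identity so the commits work on a machine without global git config.
git config user.name "pool-demo"
git config user.email "pool-demo@example.invalid"

# Setup: one unclaimed lock in the pool.
mkdir -p pool/unclaimed pool/claimed
touch pool/unclaimed/.gitkeep pool/claimed/.gitkeep pool/unclaimed/RESOURCE
git add -A
git commit -q -m 'init pool of resources'

# Job A claims and then unclaims; $ref is the unclaim commit it checks later.
git mv pool/unclaimed/RESOURCE pool/claimed/RESOURCE
git commit -q -m 'claiming resource from Job A'
git mv pool/claimed/RESOURCE pool/unclaimed/RESOURCE
git commit -q -m 'unclaiming resource from Job A'
ref=$(git rev-parse HEAD)

# Job B claims immediately afterwards.
git mv pool/unclaimed/RESOURCE pool/claimed/RESOURCE
git commit -q -m 'claiming resource from Job B'

# Original check: any commit touching pool/ after $ref shows up.
unfiltered=$(git log --oneline "$ref"..HEAD -- pool/)

# Proposed check: commits whose message starts with "claiming" are ignored.
filtered=$(git log --grep '^claiming' --invert-grep --oneline "$ref"..HEAD -- pool/)

echo "unfiltered: ${unfiltered:-<empty>}"
echo "filtered:   ${filtered:-<empty>}"
```

The unfiltered log lists Job B's claim, while the filtered log is empty, so only the filtered check lets Job A's unclaim go through. Note that the filter matches on the commit message only, so it relies on claim commits always starting with "claiming".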

jtarchie commented 9 years ago

We've tested the above fix on our Concourse deployment and no longer experience the error.

concourse-bot commented 9 years ago

Hello again!

All stories related to this issue have been accepted, so I'm going to automatically close this issue.

At the time of writing, the following stories have been accepted:

If you feel there is still more to be done or have any questions, feel free to reopen!