concourse / pool-resource

atomically manages the state of the world (e.g. external environments)
Apache License 2.0

Having trouble stabilizing a demonstration pipeline #21

Closed hfinucane closed 7 years ago

hfinucane commented 7 years ago

I've been doing some testing to make sure that I could use this, but my demonstration tends to grind to a halt and stop working when I leave it alone for too long.

resources:
- name: demolock # Single pool, single file prepopulated in demo/unclaimed/
  type: pool
  source:
    uri: ssh://git@git/concourse-locks.git
    branch: master
    pool: demo
    private_key: {{SSH_PRIVATE_KEY}}
- name: tick
  type: time
  source: {interval: 5s} # I know, it doesn't actually go off this often

jobs:
- name: Step 1
  plan:
    - get: tick
      trigger: true
    - put: demolock
      params: {acquire: true}
    - task: hello-world
      config:
        platform: linux
        image_resource:
          type: docker-image
          source: {repository: busybox}
        run:
          path: echo
          args:
          - hello world

- name: Step 2
  plan:
    - get: tick
      passed: ["Step 1"]
      trigger: true
    - get: demolock
      passed: ["Step 1"]
    - task: sleep
      config:
        platform: linux
        image_resource:
          type: docker-image
          source: {repository: busybox}
        run:
          path: sleep
          args:
          - 1

- name: Step 3
  plan:
    - get: tick
      passed: ["Step 1"]
      trigger: true
    - get: demolock
      passed: ["Step 1"]
    - task: sleep
      config:
        platform: linux
        image_resource:
          type: docker-image
          source: {repository: busybox}
        run:
          path: sleep
          args:
          - 1

- name: "Step 4"
  plan:
    - get: tick
      passed: ["Step 2", "Step 3"]
      trigger: true
    - get: demolock
      passed: ["Step 2", "Step 3"]
      trigger: true
    - put: demolock
      params: {release: "demolock"}

I think the best way to reproduce it is to prod the "tick" resource so that the builds come down in the wrong order. I've seen "Step 4" remain untriggered, but usually it bails out with error: lock instance is no longer acquired, even though the lock file is still in demo/claimed/. That said, simply starting it up and going for a coffee is generally enough to trigger the failure. If I drop the get: tick step from the "Step 4" plan, it never bails out with an error.

hfinucane commented 7 years ago

I'm running Concourse 2.2.1, in case this has anything to do with the Concourse scheduling side.

vito commented 7 years ago

You may want to put serial: true on the first job, otherwise a few may queue up and you may have a later job actually acquire the lock. This'll then be skipped by the subsequent jobs and things could get weird.
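A minimal sketch of that suggestion applied to the "Step 1" job from the original report. Only the serial: true line is new; everything else is copied unchanged from the pipeline above, and the resources section is assumed to stay as posted.

```yaml
jobs:
- name: Step 1
  serial: true # builds of this job queue up and run one at a time instead of in parallel
  plan:
    - get: tick
      trigger: true
    - put: demolock
      params: {acquire: true}
    - task: hello-world
      config:
        platform: linux
        image_resource:
          type: docker-image
          source: {repository: busybox}
        run:
          path: echo
          args:
          - hello world
```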

You could also try setting version: every on the get: demolock steps.
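As a sketch of this second option, here is the "Step 2" job from the original report with version: every added to its get: demolock step. The same one-line change would presumably apply to the get: demolock steps in "Step 3" and "Step 4"; whether it is sufficient on its own is not confirmed in this thread.

```yaml
jobs:
- name: Step 2
  plan:
    - get: tick
      passed: ["Step 1"]
      trigger: true
    - get: demolock
      passed: ["Step 1"]
      version: every # walk through every lock version in order rather than only the latest
    - task: sleep
      config:
        platform: linux
        image_resource:
          type: docker-image
          source: {repository: busybox}
        run:
          path: sleep
          args:
          - 1
```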

hfinucane commented 7 years ago

serial: true on the first job seems like it has stabilized it. Should this go into the README as "something you should always set"?

hfinucane commented 7 years ago

Although from my point of view it seems like every would maybe be a better solution? Am I just getting accidental every semantics with serial because lock acquisitions aren't getting executed out of order?

vito commented 7 years ago

You definitely don't always need serial: true as you may have multiple locks and may want to be able to run tests against multiple environments concurrently for example.

version: every may be the better solution, yeah. But there's a trick to that too, unfortunately: https://github.com/concourse/concourse/issues/736

serial: true is currently the easiest technique.

chendrix commented 7 years ago

Hi there, I'm going to close this as it looks like your question was resolved. Feel free to reopen, or to ask on Slack or on Stack Overflow.