concourse / pool-resource

atomically manages the state of the world (e.g. external environments)
Apache License 2.0

pool-resource needs an atomic move operation #30

Status: Open · chendrix opened this issue 7 years ago

chendrix commented 7 years ago

Moved from concourse/concourse#196

cc @xtreme-gavin-enns


Currently, to move an environment between pools (to model state changes), we have to perform separate add and remove operations. This has two drawbacks:

1. One operation may fail, leaving us with a duplicated or deleted environment.
2. One operation may take a long time, temporarily leaving us in a similar state.
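To make both failure modes concrete, here is a minimal Python sketch. It assumes a deliberately simplified model in which each pool is a plain set of environment names; the real pool-resource stores locks as files in a git repository. `two_step_move` and `run` are hypothetical helpers for illustration, not part of pool-resource.

```python
from functools import partial


def two_step_move(pools, env, src, dst, add_first=True, second_step_fails=False):
    """Move `env` from pool `src` to pool `dst` as two separate operations.

    Hypothetical model: `pools` maps a pool name to a set of environment names.
    """
    add = partial(pools[dst].add, env)         # analogous to a `put` with `add`
    remove = partial(pools[src].discard, env)  # analogous to a `put` with `remove`
    first, second = (add, remove) if add_first else (remove, add)
    first()
    if second_step_fails:
        # Simulates the second operation failing after the first one succeeded.
        raise RuntimeError("second operation failed")
    second()


def run(add_first):
    """Run a move whose second operation fails and return the resulting pools."""
    pools = {"dirty": {"env-1"}, "clean": set()}
    try:
        two_step_move(pools, "env-1", "dirty", "clean", add_first, second_step_fails=True)
    except RuntimeError:
        pass
    return pools


print(run(add_first=True))   # {'dirty': {'env-1'}, 'clean': {'env-1'}}: duplicated
print(run(add_first=False))  # {'dirty': set(), 'clean': set()}: deleted
```

The state left behind by a failed second operation is also exactly what any concurrent reader observes while a slow second operation is still in flight, which is the second drawback.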

This could be resolved by implementing a move operation that performs both parts of the move in a single commit/push.
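One way to picture the proposed single-commit move is the Python sketch below. It assumes a simplified in-memory stand-in for the pool's git repository (`PoolRepo`, `PushRejected` and `move_lock` are hypothetical names, not pool-resource code). Removing the claimed lock from the source pool and adding it, unclaimed, to the destination pool go into one commit. A commit based on a stale version is rejected and retried, much as a non-fast-forward git push would be.

```python
from dataclasses import dataclass, field


class PushRejected(Exception):
    """The remote advanced since we read it (models a non-fast-forward push)."""


@dataclass
class PoolRepo:
    """Hypothetical in-memory stand-in for the git repo behind a pool resource.

    Keys are paths like '<pool>/claimed/<lock>' or '<pool>/unclaimed/<lock>';
    values are lock metadata. Each commit() models one commit plus push.
    """
    files: dict = field(default_factory=dict)
    version: int = 0

    def commit(self, based_on, removes=(), adds=None):
        if based_on != self.version:
            raise PushRejected(f"based on v{based_on}, remote is at v{self.version}")
        new_files = dict(self.files)
        for path in removes:
            del new_files[path]  # a KeyError aborts before anything is published
        new_files.update(adds or {})
        # Publish every change together, or none of them.
        self.files = new_files
        self.version += 1


def move_lock(repo, name, src, dst, attempts=3):
    """Move a lock claimed in pool `src` into pool `dst` (unclaimed) in ONE commit."""
    src_path = f"{src}/claimed/{name}"
    dst_path = f"{dst}/unclaimed/{name}"
    for _ in range(attempts):
        based_on = repo.version  # analogous to fetching and reading the tree
        metadata = repo.files[src_path]
        try:
            repo.commit(based_on, removes=[src_path], adds={dst_path: metadata})
            return
        except PushRejected:
            continue  # someone else pushed first: re-read and retry
    raise RuntimeError(f"gave up moving {name} after {attempts} attempts")


repo = PoolRepo(files={"dirty-hardware/claimed/rack-1": "ip=10.0.0.1"})
move_lock(repo, "rack-1", "dirty-hardware", "clean-hardware")
print(repo.files)    # {'clean-hardware/unclaimed/rack-1': 'ip=10.0.0.1'}
print(repo.version)  # 1
```

Because both path changes are published in the same commit, a reader sees the lock either in the source pool or in the destination pool, never in both and never in neither. A failure before `commit` leaves the repository untouched.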

sc68cal commented 4 years ago

I have a Concourse CI pipeline that approximates a move as closely as possible without atomic/guaranteed operations:

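```yaml
---
resources:
  - name: every-24h
    type: time
    source: {interval: 24h}
    check_every: 12h

  # Placeholder (hypothetical) repository and credentials; both pools share one repo.
  - name: dirty-hardware
    type: pool
    source:
      uri: git@github.com:example-org/hardware-pools.git
      branch: main
      pool: dirty
      private_key: ((hardware-pools-private-key))

  - name: clean-hardware
    type: pool
    source:
      uri: git@github.com:example-org/hardware-pools.git
      branch: main
      pool: clean
      private_key: ((hardware-pools-private-key))

jobs:
  - name: clean-dirty-hardware
    plan:
      - get: every-24h
        trigger: true

      # Claim one dirty environment; its lock is fetched into ./dirty-hardware
      - put: dirty-hardware
        params: {acquire: true}

      - task: do-work
        # Work here, but deleted for brevity
        on_failure:
          # Work attempt failed: release the lock and try again next time
          put: dirty-hardware
          params: {release: dirty-hardware}

      # Work succeeded: add the same lock to the clean pool...
      - put: clean-hardware
        params: {add: dirty-hardware}

      # ...then delete the claimed lock from the dirty pool.
      # If this step fails, the environment is listed in both pools.
      - put: dirty-hardware
        params: {remove: dirty-hardware}
```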

I have seen Concourse CI itself, under load, fail steps due to resource exhaustion, but this has been very rare.

I know it's not perfect, but this is what I ran into and how I attempted to solve it.