buildkite / feedback

Got feedback? Please let us know!
https://buildkite.com

Prevent agents from returning to the queue during the 'wait' step #281

Closed dtruong0 closed 6 years ago

dtruong0 commented 6 years ago

Prevent agents returning to the queue during the wait step for builds that run steps in parallel.

I have a pipeline that looks like this:

  - name: 'Test 1'
    command: scripts/buildkite/tests.sh
    env:
      PARALLEL_JOB: 0
    agents:
      queue: QUEUE
      aws:instance-id: __INSTANCE_ID1__
  - name: 'Test 2'
    command: scripts/buildkite/tests.sh
    env:
      PARALLEL_JOB: 1
    agents:
      queue: QUEUE
      aws:instance-id: __INSTANCE_ID2__
  - wait
  - name: 'Make Version Report'
    command: scripts/buildkite/buildkite-version-report.sh
    env:
      PARALLEL_JOB: 0
    agents:
      queue: QUEUE
      aws:instance-id: __INSTANCE_ID1__

QUEUE, __INSTANCE_ID1__ and __INSTANCE_ID2__ are substituted during the pipeline upload step.

Say Test 1 completes in 5 minutes and Test 2 completes in 20. During the 15-minute wait, Test 1's agent returns to the queue of available agents. As a result, the build can be left hanging because Test 1's agent was taken by another build.

toolmantim commented 6 years ago

Hey David! Sorry you're having troubles with this. I wonder if there's a way to avoid targeting specific instance ids in your build pipeline? Then you could use any available agent, rather than depending on a specific one. This is usually how people accomplish this, and if there's some shared state, either push it to S3 or use a build artifact.
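
As a rough sketch of the build artifact approach, assuming the shared state can be written out as files: each test step uploads its files with `artifact_paths`, and the report step downloads them with `buildkite-agent artifact download`, so it can run on any agent in the queue. The `state/` directory and the idea that `tests.sh` writes into it are assumptions for illustration, not something from your pipeline.

```yaml
steps:
  - name: 'Test 1'
    command: scripts/buildkite/tests.sh
    env:
      PARALLEL_JOB: 0
    # Assumption: tests.sh writes its state to state/test-0.txt
    artifact_paths: 'state/*'
    agents:
      queue: QUEUE
  - name: 'Test 2'
    command: scripts/buildkite/tests.sh
    env:
      PARALLEL_JOB: 1
    # Assumption: tests.sh writes its state to state/test-1.txt
    artifact_paths: 'state/*'
    agents:
      queue: QUEUE
  - wait
  - name: 'Make Version Report'
    commands:
      # Fetch the files uploaded by both test steps into the working directory
      - buildkite-agent artifact download 'state/*' .
      - scripts/buildkite/buildkite-version-report.sh
    agents:
      queue: QUEUE
```

With no `aws:instance-id` targeting, it no longer matters which agent picks up the report step after the wait.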

dtruong0 commented 6 years ago

Yeah, this is a shared state issue because the testing environment is set up in the previous steps. During the pipeline upload step, the second agent is chosen by using the API to find the next available one.

Is it possible to set the queue meta-data or change the agent prioritisation during a build?

toolmantim commented 6 years ago

Thanks for the details. Hmm it's a tricky one. And I'm trying to figure it out, but I don't fully understand what the limitations are.

You can't set agent meta-data or agent prioritisation during a build I'm afraid. You can stop agents and start them again?

I'm afraid what you're trying to do doesn't really gel with the primitives Buildkite supplies, at least in the way you're trying to set this up. But there has to be a way to accomplish what you need. Is there no way to store the state somewhere?

Or could you just make each of the initial steps do the version report as well?

steps:
  - name: 'Test 1'
    commands:
      - scripts/buildkite/tests.sh
      - scripts/buildkite/buildkite-version-report.sh
    env:
      PARALLEL_JOB: 0
  - name: 'Test 2'
    commands:
      - scripts/buildkite/tests.sh
      - scripts/buildkite/buildkite-version-report.sh
    env:
      PARALLEL_JOB: 1

dtruong0 commented 6 years ago

If I think about it, the core problem that I'm trying to work around is that builds don't fail on failed steps unless a wait is used.

If there's no other way to do it, then storing state would be the best option. Thanks.

toolmantim commented 6 years ago

Yeah, builds don't immediately fail when a step fails; they wait for each step to finish. But the whole build should then be marked as failed if any step failed, once they're all complete? You might have to store the state somewhere, sorry!
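
If the state is just a few small values rather than files, build meta-data might be enough. This is only a sketch: the key names and the `VALUE` placeholders are made up for illustration, and in practice your scripts would call `buildkite-agent meta-data set` themselves with real values.

```yaml
steps:
  - name: 'Test 1'
    commands:
      - scripts/buildkite/tests.sh
      # Hypothetical key and placeholder value; record whatever the report needs
      - buildkite-agent meta-data set 'test-0-state' 'VALUE'
    env:
      PARALLEL_JOB: 0
  - name: 'Test 2'
    commands:
      - scripts/buildkite/tests.sh
      # Hypothetical key and placeholder value
      - buildkite-agent meta-data set 'test-1-state' 'VALUE'
    env:
      PARALLEL_JOB: 1
  - wait
  - name: 'Make Version Report'
    commands:
      # Meta-data is stored against the build, so any agent can read it back
      - buildkite-agent meta-data get 'test-0-state'
      - buildkite-agent meta-data get 'test-1-state'
      - scripts/buildkite/buildkite-version-report.sh
```

Because of the wait, the report step only runs if both test steps passed.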

I'm going to close this for the moment, because I can't see any changes we could make at a platform level to help. But feel free to email support if you need help getting it all working. We're here to help!