dotnet / docker-tools

This is a repo to house some common tools for our various docker repos.

Retry of Publish stage fails due to duplicate source-build-id artifact #1323

Closed mthalman closed 4 months ago

mthalman commented 5 months ago

A build had failed in the Wait for Image Ingestion step of the Publish stage. I attempted to retry the build, and it failed in an earlier step, Publish Source Build ID Artifact, with the following error:

Artifact source-build-id already exists for build 2466755.

Example build (internal link)

dotnet-issue-labeler[bot] commented 5 months ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

lbussell commented 5 months ago

[Triage] This is only consumed in the update-image-builder pipeline:

https://github.com/dotnet/docker-tools/blob/93ef66423ed62d4dd1fd5419590cfa765f322d30/eng/pipelines/update-image-builder-tag.yml#L24

For retries of the same job, the sourceBuildId will be the same. So, we could solve this in one of three ways:

  1. Force overwrite the published artifact
  2. Don't publish a new artifact when one already exists
  3. Move the source-build-id artifact publishing to the very end of the pipeline

ellahathaway commented 4 months ago

For option 1, since Azure DevOps (AzDO) pipelines don't support force-overwriting artifacts, we'd need a workaround that uploads a 2nd (or 3rd, etc.) version of the artifact. That means downloading the artifact to check its existence and version before uploading another version. I don't love this approach as it adds overhead and creates copies.

For option 2, we'd have to download the published artifact in order to check its existence. This also adds overhead.

Therefore, I think that option 3 is the best. We would just have to condition the step to only run if previous steps were successful.
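
A rough sketch of what option 3 might look like in Azure Pipelines YAML. This is not the repo's actual template: the file path is a hypothetical placeholder, and the step is simply placed last in the stage so that a retry after a failure in an earlier step never reaches an already-published artifact.

```yaml
# Sketch only: the file path below is a hypothetical placeholder.
steps:
  # ... earlier Publish-stage steps, including "Wait for Image Ingestion" ...

  - publish: $(Build.ArtifactStagingDirectory)/source-build-id.txt  # hypothetical path
    artifact: source-build-id
    displayName: Publish Source Build ID Artifact
    # succeeded() is already the default condition; it is spelled out here
    # to make explicit that the step runs only if all previous steps passed.
    condition: succeeded()
```

Because the artifact is only published once everything before it has succeeded, a retried job would not hit the "already exists" error unless the publish step itself was the last thing to succeed before a later failure.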

mthalman commented 4 months ago

There's another option not listed:

Set the artifact name to include the job attempt number. This is done today for another part of the infra: https://github.com/dotnet/docker-tools/blob/570b206e8a83d0517d2732766818abc444737dae/eng/common/templates/jobs/build-images.yml#L140

But the variable nature of that name makes it annoying to have to compensate for when consuming the artifact.
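
For illustration, a sketch of the attempt-number approach, assuming the predefined `System.JobAttempt` variable is used for the suffix. The file path is a hypothetical placeholder, and this is not copied from the linked build-images.yml line.

```yaml
# Sketch only: the file path below is a hypothetical placeholder.
steps:
  - publish: $(Build.ArtifactStagingDirectory)/source-build-id.txt  # hypothetical path
    # System.JobAttempt is 1 on the first attempt and increments on each retry,
    # so every attempt publishes under a distinct artifact name.
    artifact: source-build-id-$(System.JobAttempt)
    displayName: Publish Source Build ID Artifact
```

The consuming pipeline would then have to work out which suffix to download, which is the extra handling referred to above.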

Therefore, I do agree that moving the artifact publishing to the end is the best option.