actions / upload-artifact

[bug] v4: overwrite: true fails with parallel jobs writing to the artifact #506

Open yury-s opened 10 months ago

yury-s commented 10 months ago

What happened?

When setting overwrite: true, upload sometimes fails with the following output:

Run actions/upload-artifact@v4
  with:
    name: pull-request-number
    path: pull_request_number.txt
    overwrite: true
    if-no-files-found: warn
    compression-level: 6
  env:
    FORCE_COLOR: 1
    FLAKINESS_CONNECTION_STRING: 
    ELECTRON_SKIP_BINARY_DOWNLOAD: 1
    PWTEST_BOT_NAME: ubuntu-latest-node18-1
With the provided path, there will be 1 file uploaded
Artifact name is valid!
Root directory input is valid!
Error: Failed to CreateArtifact: Received non-retryable error: Failed request: (409) Conflict: an artifact with this name already exists on the workflow run

What did you expect to happen?

One of the jobs succeeds and the others silently continue without doing anything, as the documentation of the option suggests.

How can we reproduce it?

Configure several parallel jobs (including matrix jobs) within a workflow to write to the same artifact with overwrite: true, something like this:

    - name: Upload artifact with the pull request number
      if: always() && github.event_name == 'pull_request'
      uses: actions/upload-artifact@v4
      with:
        name: pull-request-number
        path: pull_request_number.txt
        overwrite: true

The upload action will fail sometimes, e.g. see this run.
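For context, a fuller workflow around this step might look like the sketch below. The workflow name, job name, matrix values, and the step that writes the file are assumptions for illustration and are not taken from the failing run.

```yaml
# Hypothetical reproduction workflow; all names here are placeholders.
name: overwrite-race-repro
on: pull_request

jobs:
  test:
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - name: Write the pull request number to a file
        shell: bash
        run: echo "${{ github.event.pull_request.number }}" > pull_request_number.txt
      # Every matrix job targets the same artifact name, so the uploads
      # can collide when the jobs reach this step at about the same time.
      - name: Upload artifact with the pull request number
        if: always() && github.event_name == 'pull_request'
        uses: actions/upload-artifact@v4
        with:
          name: pull-request-number
          path: pull_request_number.txt
          overwrite: true
```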

Anything else we need to know?

No response

What version of the action are you using?

v4.3.0

What are your runner environments?

linux, windows, macos

Are you on GitHub Enterprise Server? If so, what version?

No response

melloware commented 10 months ago

I get this error as well https://github.com/open-sce/fluent-cli/actions/runs/7254116343/job/19762158254

Error: Failed to CreateArtifact: Received non-retryable error: Failed request: (409) Conflict: an artifact with this name already exists on the workflow run

darthcloud commented 10 months ago

Updated to v4 as well due to Node.js warning, added overwrite: true and still getting the error: https://github.com/darthcloud/BlueRetro/actions/runs/7768402627/workflow

melloware commented 10 months ago

I even tried v4.3.0 just to make sure it was using the latest version and the same error happens.

melloware commented 10 months ago

I was able to fix my problem by following the migration guide: https://github.com/actions/upload-artifact/blob/main/docs/MIGRATION.md

See my commit: https://github.com/open-sce/fluent-cli/commit/900fed54e6680276ffbc62365843f005ceb7e990

moos3 commented 8 months ago

I'm running into this issue, Failed to CreateArtifact: Received non-retryable error: Failed request: (409) Conflict: an artifact with this name already exists on the workflow run, but only on some builds, not always. This is my upload step:

    - name: Upload meta bake definition
      uses: actions/upload-artifact@v4
      with:
        name: ${{ matrix.version }}-onbuild-poetry-bake-meta
        path: /tmp/bake-meta.json
        if-no-files-found: error
        overwrite: true

Isn't the whole point of overwrite to overwrite the artifact and not care whether it is already there?

JonathanAtCenterEdge commented 8 months ago

Getting this exact same issue: randomly, with overwrite: true, my workflows are failing with a 409 conflict.

darthcloud commented 7 months ago

I don't understand why you are deprecating v3 (https://github.blog/changelog/2024-04-16-deprecation-notice-v3-of-the-artifact-actions/) when v4 is still broken for parallel job setups.

robherley commented 7 months ago

👋 I want to clarify that the overwrite operation is not atomic. It is simply a helper that deletes the artifact before creating a new one.

The intended purpose of this overwrite feature was not for parallel jobs. It was meant for serial overwriting (like uploading a binary, then downloading it, signing it, then reuploading it).
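A minimal sketch of that serial pattern, assuming two jobs chained with needs. The job names, the artifact name my-binary, and the build and sign commands are placeholders for illustration only.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Build the binary (placeholder command)
        run: mkdir -p dist && echo "binary" > dist/my-binary
      - uses: actions/upload-artifact@v4
        with:
          name: my-binary
          path: dist/my-binary

  sign:
    # Runs strictly after `build`, so nothing else writes the artifact concurrently.
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: my-binary
          path: dist
      - name: Sign the binary (placeholder command)
        run: echo "signed" >> dist/my-binary
      # Replace the unsigned artifact with the signed one.
      - uses: actions/upload-artifact@v4
        with:
          name: my-binary
          path: dist/my-binary
          overwrite: true
```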

If you are trying to upload to the same artifact name across parallel jobs, you will hit race conditions. This does not merge artifacts across jobs. If you have jobs A, B and C and they all try to upload my-artifact with overwrite: true, the final artifact will contain only the contents from one job, the last one that wrote. The uploads from the other jobs would just be wasted time. If you do not care about the artifact contents, you would be better off skipping the upload in the other jobs with a conditional to save runtime costs.
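One way such a conditional could look at the step level, as a sketch. It assumes the matrix defines an os key and that uploading from the ubuntu-latest job is acceptable; the artifact name and path are placeholders.

```yaml
      - name: Upload artifact from a single matrix job
        # Assumption: the matrix has an `os` key; only one job runs this step.
        if: matrix.os == 'ubuntu-latest'
        uses: actions/upload-artifact@v4
        with:
          name: my-artifact
          path: output/
```

With only one uploader, overwrite: true is no longer needed for this step.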

However, if you do care about the artifact contents from multiple concurrent jobs, you simply need to give the artifacts different names (like variables of your parallel matrix) and call actions/upload-artifact/merge which is outlined in the migration guide: https://github.com/actions/upload-artifact/blob/main/docs/MIGRATION.md#merging-multiple-artifacts

This is exactly what @melloware stated above and implemented in their workflow, and is the correct solution for v4 due to the key differences of how this new major version works.
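A minimal sketch of that pattern, following the migration guide. The job names, matrix values, file names, and the my-artifact prefix are assumptions for illustration.

```yaml
jobs:
  build:
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - name: Produce a file (placeholder command)
        shell: bash
        run: echo "built on ${{ matrix.os }}" > result-${{ matrix.os }}.txt
      # Each matrix job uploads under its own name, so no two jobs collide.
      - uses: actions/upload-artifact@v4
        with:
          name: my-artifact-${{ matrix.os }}
          path: result-${{ matrix.os }}.txt

  merge:
    needs: build
    runs-on: ubuntu-latest
    steps:
      # Combine every artifact matching the pattern into one artifact.
      - uses: actions/upload-artifact/merge@v4
        with:
          name: my-artifact
          pattern: my-artifact-*
          delete-merged: true
```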

Hope this helps!

Vampire commented 3 months ago

I've also hit the same error. My use case is that I used the action as a helper to verify that the build produces the intended output. So I run the build on Windows, Linux, and macOS through a matrix. Each of the builds should produce the same artifact. By configuring the artifact with overwrite: true, the run would fail if one of the builds does not produce the artifact. If all builds produce the artifact, I just need one of them; it doesn't matter which, as all should be equal. Now I also hit the problem that two of the jobs running in parallel checked whether to delete the file and then both uploaded the file, making the second one fail.

Of course I can change the workflow to just test whether the artifact was produced and upload from only one of the jobs.

But it would be nice if this just worked. So if the actual upload fails with overwrite: true because the artifact already exists, it would be nice if the delete and then the upload were done again, in an endless loop, until the upload succeeds.

The advantage of doing it the way I do, if it worked reliably, is that if any of the OS builds succeeds, the artifact is there in the end. If I just verify manually whether the artifact was built and upload from only one of the jobs, then the artifact is only present if exactly that job was successful.

Vampire commented 3 months ago

Actually I rewrote my logic now, so that the individual jobs verify that the file was built, or fail, and then just save it to the GHA cache with the run id. Then there is one additional job that restores one of the caches and uploads it as an artifact. :-) Works and does not upload multiple times, but is a bit more complex to set up. Thank god I'm using typesafegithub/github-workflows-kt to write my workflows. :-)
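A rough sketch of how this cache-based approach could look in plain workflow YAML. This is not the actual workflow: the job names, cache key format, and file path are hypothetical, and it assumes that a cache saved on any of the three operating systems can be restored on the Linux runner with enableCrossOsArchive set.

```yaml
jobs:
  build:
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - name: Build (placeholder command)
        shell: bash
        run: mkdir -p out && echo "artifact" > out/result.txt
      - name: Verify the file was built, or fail the job
        shell: bash
        run: test -f out/result.txt
      # Save under a key that is unique per run and per OS.
      - uses: actions/cache/save@v4
        with:
          path: out/result.txt
          key: build-output-${{ github.run_id }}-${{ matrix.os }}
          enableCrossOsArchive: true

  publish:
    needs: build
    # Run even if some build jobs failed, as long as the run was not cancelled.
    if: ${{ !cancelled() }}
    runs-on: ubuntu-latest
    steps:
      # The restore key is a prefix, so a cache from any OS of this run matches.
      - uses: actions/cache/restore@v4
        with:
          path: out/result.txt
          key: build-output-${{ github.run_id }}
          restore-keys: |
            build-output-${{ github.run_id }}-
          enableCrossOsArchive: true
          fail-on-cache-miss: true
      - uses: actions/upload-artifact@v4
        with:
          name: build-output
          path: out/result.txt
```

Because only the publish job uploads, there is a single writer and no overwrite is involved.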

liyishuai commented 2 months ago

"However, if you do care about the artifact contents from multiple concurrent jobs, you simply need to give the artifacts different names"

Naming the artifacts differently is not always simple, especially when the jobs differ only in their Docker tags. You cannot name the artifacts artifact-user1/repo1:v1.txt and artifact-user2/repo2:v2.txt, because those are invalid filenames.

My solution is to ignore 409 conflicts using continue-on-error.
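As a sketch, the step might look like this. The artifact name and path are placeholders.

```yaml
      - name: Upload artifact, tolerating a failed upload
        uses: actions/upload-artifact@v4
        # Caveat: this masks every failure of this step, not only the 409 conflict.
        continue-on-error: true
        with:
          name: my-artifact
          path: output/
          overwrite: true
```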