actions / upload-artifact

MIT License
3.13k stars 707 forks source link

Provide guidance for workflows broken by @4? #472

Open nedbat opened 9 months ago

nedbat commented 9 months ago

What files would you like to change?

I can see that @4 brought improvements in speed and reliability, that's great. But it also broke a number of common patterns in workflows. Can you provide some guidance about ways to achieve the same outcomes with the new version? This might involved new features in actions/download-artifact.

For example:

What are your suggested changes?

I'm looking for help, I don't have solutions yet myself.

henryiii commented 9 months ago

Personally, I think two changes would really help: If there was some way for download-artifact to extract multiple artifacts into a single directory, and if there was some way to use a glob to express the artifacts to download. They could even be together, such as specifying name: coverage-* as the name extracting all matching files into one directory, like setting one name does. Or it could be a toggle, like merge: true/false/auto.

FYI, this affects basically every cibuildwheel user (there are over a thousand), as it was the way you'd merge binaries on different platforms before uploading to PyPI. It also affects the recommendations from the PyPI and Scientific-Python Development Guidelines, etc. It also affects publishing websites that are built in multiple steps (like building WebAssembly components).

FYI, people are merging the v4 updates without realizing the breakage, as Dependabot is making them separately[^1], so if they appear broken, they still go with it assuming it's the v3/v4 incompatibility, and if it's only used for the release step (common with cibuildwheel workflows), then they won't even know it's broken until they try to release. I'm currently having to rush a release of cmake for Python to fix a segfault and someone already merged v4 updates, so that pipeline is broken. It would be great if this breaking change could be listed in the dependabot update directly, rather than behind a link, but I assume that's too late.

Happy to update as much as I can to whatever the recommendation is. My current plan (in https://github.com/pypa/cibuildwheel/issues/1699 & https://github.com/scientific-python/cookie/pull/335) is:

    - uses: actions/upload-artifact@v4
      with:
        name: Wheel-${{ matrix.<whatever> }}
...

    - uses: actions/download-artifact@v4
      with:
        path: all

    - name: Merge files
      run: |
        mkdir dist
        mv all/Wheel-*/* .

But besides the extra shell step, will also download everything, even if there's stuff like coverage files that are not needed for that job.

[^1]: I'm going to start recommending the grouped dependabot updates feature that was recently released soon! That would have helped here.

webknjaz commented 9 months ago

Even in the context wider than just uploading to PyPI, when I build a bunch of wheels on different platforms, there should be a way to collect them all for testing (examples: https://github.com/aio-libs/yarl/blob/88900a1/.github/workflows/ci-cd.yml#L208-L265 / https://github.com/aio-libs/frozenlist/blob/b5ffcd7/.github/workflows/ci-cd.yml#L207-L264)

webknjaz commented 9 months ago

This also breaks the use of composite actions that themselves make use of either upload-artifact or download-artifact:

I'm curious if the change is an attempt to reduce occurrences of the race condition I reported a while back: https://github.com/actions/toolkit/issues/1235 / https://github.com/actions/download-artifact/issues/140#issuecomment-1314062872.

woodruffw commented 9 months ago

As an additional datapoint here: GitHub's own docs for publishing to PyPI via OIDC encourage the "upload-then-download" pattern, which (I believe) will be broken by these action changes when used with multiple uploaders (which is common in Python, for builds on different hosts or platforms).

(This has the unfortunate potential to break a large percentage of Python publishing workflows.)

Ref: https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/configuring-openid-connect-in-pypi

per1234 commented 9 months ago

If there was some way for download-artifact to extract multiple artifacts into a single directory

From my experiments while trying to find a way to adapt the hundreds of workflows I maintain that rely on the ability to upload multiple times from matrix jobs to the same artifact for transfer to a dependent job that consumes the content of the artifact, I find the actions/download-artifact@v4 action already has that capability.

For example, a workflow job with these steps will succeed and show that the files from the foo and bar artifacts are present under the baz folder as expected.

- uses: actions/download-artifact@v4
  with:
    name: foo
    path: baz

- uses: actions/download-artifact@v4
  with:
    name: bar
    path: baz

- run: ls baz

If you found some conditions where the action is not able to do this, please do submit a report to the issue tracker for the actions/download-artifact as I agree that it will be a common use pattern for those of us who are migrating from using a single artifact to multiple artifacts for transferring files from one job to another.

if there was some way to use a glob to express the artifacts to download

I agree. This request is already tracked at https://github.com/actions/download-artifact/issues/6

henryiii commented 9 months ago

The problem with repeating the action over and over is that for a normal matrix for coverage or wheel builds, there might now be dozens of artifacts (one per Python version × OS × arch × PyPy/CPython, for example). Also, if one or more jobs fail, you still might want to upload coverage, which means you'd also need to do something like always(). Last I checked I think cibuildwheel could produce about 70+ different wheels if you targeted everything it can handle[^1]. You can also dynamically generate the matrix, see https://iscinumpy.dev/post/cibuildwheel-2-10-0/#only, which is how MyPyC does it. A normal coverage set is about 20 or so (5 Python versions + 3 PyPy versions × 3 OSs).

[^1]: I'm absolutely describing "worst case" - often (and for the default cibuildwheel examples), it's just 3 OSs, with all the wheels being built in one job per OS. Plus an SDist job. So 4 artifacts, which isn't terrible listed out.

webknjaz commented 9 months ago

(my attempt to summarize the important pain point contexts)

@per1234 your example features a linear workflow with a known static artifact names, but this issue refers to a different practice of CI configuration which involves matrixes.

Note all other people who commented here, combined maintain hundreds of workflows too. What unites us is that we all come from the Python ecosystem and rely on the ability to upload a number of artifacts from jobs in matrixes of varying sizes and different factor naming strategies. We essentially need a way to combine many files coming from different jobs (not necessarily one file per job, though — but usually, all uploaded in one go) in a "synchronization point" — a job that waits for that matrix and does something with those files.

Two major contexts in which this is happening are:

  1. Publishing a bunch of Python distribution packages to PyPI — this has to happen together for all files, once we know all the previous jobs are complete.
  2. Combining coverage data produced in separate jobs into one, for determining if the overall metrics meet the configured threshold.

With different artifact names, we'd need to loop over the matrix artifacts somehow to get all those files together. And GitHub Actions workflows don't have syntax for such loops. Moreover, this means every project reinventing a wheel of naming the workflows differently and figuring out how to extract only matching ones into that sync job.

ofek commented 9 months ago

I don't have time to experiment with upgrading this action this weekend but I plan to manually download like this https://github.com/WordPress/openverse/blob/00a26a65e93fe8ea2785d4a26c3ebce2efc935db/.github/actions/load-img/action.yml#L26-L35

However, rather than an enumeration I will perform pattern matching after I figure out the APIs.

webknjaz commented 9 months ago

I don't have time to experiment with upgrading this action this weekend but I plan to manually download like this https://github.com/WordPress/openverse/blob/00a26a65e93fe8ea2785d4a26c3ebce2efc935db/.github/actions/load-img/action.yml#L26-L35

However, rather than an enumeration I will perform pattern matching after I figure out the APIs.

I also have some hacking ideas with random names, pattern matching and CLI. Working on a composite action based experiment...

TWiStErRob commented 9 months ago

There are mentions of separate upgrades of v3/v4 here breaking and I've had problems in many of my repositories. If anyone is using Renovate and has incompatibility problems, this rule can fix the problem easily (I added it to my user level Renovate config to affect all repos at once.):

"packageRules": [
    {
        "groupName": "actions/upload-artifact and actions/download-artifact",
        "description": "Artifact management is coupled in GitHub Actions. https://github.blog/changelog/2023-12-14-github-actions-artifacts-v4-is-now-generally-available/",
        "matchManagers": ["github-actions"],
        "matchPackageNames": [
            "actions/upload-artifact",
            "actions/download-artifact"
        ],
        "matchUpdateTypes": ["major"]
    }
]

Note: this above is really explicit, it could work without matching updateTypes and managers too, if you want to generalize.

per1234 commented 9 months ago

your example features a linear workflow with a known static artifact names

@webknjaz as indicated by the quote in that comment, my example is specifically in response to @henryiii's comment implying that the actions/download-artifact is unable to download multiple artifacts to the same path:

If there was some way for download-artifact to extract multiple artifacts into a single directory

The sole purpose of my example was to supplement my comment that I have found that to be untrue. The actions/download-artifact is able to download multiple artifacts to the same path.

You seem to somehow be interpreting my comment as being intended to dispute the impact of the change of only allowing one upload per artifact. It was not.

Note all other people who commented here, combined maintain hundreds of workflows too. What unites us is that we all come from the Python ecosystem

That is false. I also maintain hundreds of workflows that were broken by this change and these workflows have nothing to do with Python. It is counterproductive to imply that this impact is specific to Python. The impact is much more widespread than that.

webknjaz commented 9 months ago

@per1234 I apologize! I meant to augment the context, not dismiss your input. I did indeed misinterpret your comment, though, and felt like it turns the discussion away from the most problematic part of the breaking change. I didn't mean to imply that this problem is only present in our ecosystem, just wanted to give the context of who all those other people besides you.

webknjaz commented 9 months ago

@ofek @henryiii @hynek @woodruffw so this is the UX I'm shooting for: https://github.com/webknjaz/re-actors--download-artifact--test/actions/runs/7242013168/workflow#L38-L42. Feel free to lurk and experiment, but don't rely on the Git identifiers — I'll make my usual versioning setup with immutable tags and rolling branches and will probably force-push the current thing a couple of times later today, before declaring it ready. It gets artifacts matching the wildcard into a temporary directory and then moves the underlying dirs into the user-requested path.

I'll also rename the action repo — originally, I thought of it as a wrapper for the actions/download-artifact action but I ended up not calling it and calling GH CLI directly instead. Keeping it as is would be misleading. Still, it'll keep using the original action's inputs list (with the same names and defaults). With that in mind, I was thinking of something like "fuse" / "meld" / "merge" / "combine" / "squash" and/or maybe "group" in the name. E.g. re-actors/meld-artifact-group. Accepting the naming ideas until I'm finished with the polishing. Also, need a name for the marketplace thing that may be different (but not necessarily).

jnsgruk commented 9 months ago

Not sure how useful or directly applicable this is to OP's problem, but I worked around this in a project I help maintain using the gh cli tool and some reasonably simple bash logic here: https://github.com/snapcrafters/ci/pull/24.

Hopefully useful to some folks 😄

andrewakim commented 9 months ago

👋 Hey everyone!

We appreciate all the feedback you’ve shared so far about upload/download artifact@v4. We’re excited about this release as it includes many improvements, like considerably faster upload/download speeds and the immediate availability of uploaded artifacts. In order to achieve these performance gains, we decided to fundamentally rewrite Artifacts from the ground up, which we’ll share more about in a future blog post.

There was some concern expressed about the immutability of an artifact once uploaded. Previously, each file within an Artifact was individually uploaded to a proxy service before landing in blob storage. This approach allowed actions/upload-artifact to be called on the same object multiple times but it also adversely impacted performance and risked content corruption during concurrent uploads to the same artifact. Artifacts in v4 are individually sealed archives (zips) that are assembled on the runner and then uploaded directly to blob storage. We believed this change was worth the tradeoff for significant performance gains and to protect the integrity of uploaded artifacts.

We'll be making an update to actions/download-artifact by end of day which will include a solution similar to others proposed in this issue:

Thank you for your engagement with Artifacts and your valuable feedback.

webknjaz commented 9 months ago

Not sure how useful or directly applicable this is to OP's problem, but I worked around this in a project I help maintain using the gh cli tool and some reasonably simple bash logic here: snapcrafters/ci#24.

Hopefully useful to some folks 😄

That's basically what I've been attempting in my action experiment — GH CLI + passing name to --pattern + auto-merging. Been trying to keep the UX as close to the v3 as possible, because adding new arguments means more changes for the end-users, which is rather annoying on the scale.

schmidtw commented 9 months ago

@andrewakim Thank you for the update. Do you have any suggestions/guidance for workarounds for the 10 file limit per workflow? Or should we be using a different action for the following use cases:

TWiStErRob commented 9 months ago

@schmidtw the readme says:

Each job in a workflow run now has a limit of 10 artifacts.

Does a job with matrix: strategy count as one "job" for you? It works for me: https://github.com/TWiStErRob/net.twisterrob.gradle/actions/runs/7231579385

robherley commented 9 months ago

👋 The changes @andrewakim stated are now released on the v4 branch.

You can find the new inputs documented in the README:

As well as a new migration document outlining this use case:

Thanks again for your feedback!

nedbat commented 9 months ago

@robherley I haven't tried it yet, but this looks great. A link to the migration doc from the readme of this repo will help a lot. The first sign many people will see that indicated a needed migration will be an error from upload-artifact. The breaking change notice in the readme is helpful, and a link to the migration doc will be even better. Thanks!

TWiStErRob commented 9 months ago

Thank you for implementing and releasing this, @robherley; it's been a long-awaited change! It's amazing to see the activity on these actions 🤩.

I've tested the the PR branch before the reviews and it worked well. Please consider reviewing the issues linked in https://github.com/actions/download-artifact/pull/259#issuecomment-1861691243 and updating/closing those threads too if relevant.

webknjaz commented 9 months ago

@robherley the example in your PR demonstrates name: my-artifact-${{ matrix.runs-on }} as an example for naming different artifacts differently. In some cases, that's not enough to guarantee uniqueness. For example, with several matrixes adding to the same pattern or the use of extra include: entries. A solution could be to have that suffix generated randomly (with something like python3 -c 'import uuid; print(uuid.uuid4())' or python3 -c 'import secrets; print(secrets.token_hex(16))'). I think the upload action migration doc may need better examples with this, and maybe even some kind of a first-class support through a new input.

webknjaz commented 9 months ago

@robherley do you know if common directory layout subpaths are merged seamlessly? I don't see examples or tests for that in the PR.

bryanmacfarlane commented 9 months ago

I think we should really get rid of the 10 artifact limit. ASFAIK, there's no real reason to have it (even though most have less which isn't a reason).

schmidtw commented 9 months ago

@schmidtw the readme says:

Each job in a workflow run now has a limit of 10 artifacts.

Does a job with matrix: strategy count as one "job" for you? It works for me: https://github.com/TWiStErRob/net.twisterrob.gradle/actions/runs/7231579385

Thank you for pointing that out. I read it a couple of times, & for whatever reason my brain replaced "job" with "workflow". 10 artifacts per job is perfectly fine.

webknjaz commented 9 months ago

FTR here's the guidance I'm giving in one of the shared maintenance orgs I overlook: https://github.com/orgs/aio-libs/discussions/31#discussioncomment-7891886

ericLemanissier commented 9 months ago

I've just been bitten too. It's a bad experience for the user that the breaking changes are not mentioned at all in the release page https://github.com/actions/upload-artifact/releases. The corollary is that the breaking changes are not mentioned either in @dependabot's PRs.

Regarding the fix, using ${{ matrix.whatever }} in the artifact name is not a very general solution. Some workflows have matrices with several dimensions, and some matrix values are not adapted for an artifact name(they can contain forbidden characters). I think that a best solution is to use ${{ github.job }}-${{ strategy.job-index }}

webknjaz commented 9 months ago

Regarding the fix, using ${{ matrix.whatever }} in the artifact name is not a very general solution. Some workflows have matrices with several dimensions, and some matrix values are not adapted for an artifact name(they can contain forbidden characters). I think that a best solution is to use ${{ github.job }}-${{ strategy.job-index }}

Is the job id unique across re-runs? Should the attempt id be included?

nedbat commented 9 months ago

Wow, I didn't realize that the release page didn't mention the breaking change...

@andrewakim Thanks for helping get the updated download functionality in place, but as a bit of feedback: your comment starts with two long paragraphs about how great v4 is, waiting until the third paragraph to address our concerns. This underscores the general feeling of being marketed a product that doesn't do what we need, rather than having peer-to-peer engineering discussions about how to solve our problems.

ofek commented 9 months ago

That's not at all how I interpreted it as the first two paragraphs include important context about the implementation that explain why the change was made. Until then, I did not really know how it worked behind the scenes.

ekohl commented 9 months ago

I agree it's useful context that should have gone into the release notes when it went out, instead of everyone being notified by dependabot and seeing things were broken without a clear explanation.

webknjaz commented 9 months ago

I agree it's useful context that should have gone into the release notes when it went out, instead of everyone being notified by dependabot and seeing things were broken without a clear explanation.

There's a more dangerous side of this: some projects wouldn't even be notified in PRs when the download action isn't run unconditionally, but is gated by a release request. It'd be quite unpleasant to get the news months after that upgrade got merged, in the middle of a release process.

Knowing this, I searched through the PyPA repos the other day and found an actual thing that would break in this exact manner because the updated workflow that got the update is triggered manually: https://github.com/pypa/pipx/pull/1147#discussion_r1429431139. And I'm also tracking the recommendations within the @aio-libs org @ https://github.com/orgs/aio-libs/discussions/31 to provide guidance to the maintainers of different repos under the umbrella.

ofek commented 9 months ago

@nedbat Now that the work has been done, is it fine to close this issue? https://github.com/actions/upload-artifact/issues/472#issuecomment-1861820431

webknjaz commented 9 months ago

@ofek I think we still need to have a clear example/understanding of how to make unique artifact names with multiple matrixes and multiple reruns. So maybe, we'd wait for more answers?

${{ github.job }} proposed by @ericLemanissier, contains a job id which is essentially a YAML key. It's unique within one workflow, but is the same when reruns happen. ${{ strategy.job-index }} is a job number within a matrix, where it's unique, but job indeces will have the same numbers on rerun. So if a previous (even failing) job succeeded uploading an artifact, its rerun will just error out with no fix w/o re-triggering the entire thing with a separate commit or something.

ericLemanissier commented 9 months ago

the best is probably to upload artifacts with name: prefix-${{ github.run_attempt }}-${{ github.job }}-${{ strategy.job-index }} and download with pattern: prefix-${{ github.run_attempt }}-*

webknjaz commented 9 months ago

and download with pattern: prefix-${{ github.run_attempt }}-*

Wouldn't different jobs have different run_attempts if I only rerun a few jobs, but not all?

webknjaz commented 9 months ago

and download with pattern: prefix-${{ github.run_attempt }}-*

Wouldn't different jobs have different run_attempts if I only rerun a few jobs, but not all?

Perhaps, we need timestamps at the beginning of the pattern, to ensure that the downloading and merging happens in the predictable manner. Though, it's not clearly documented (or tested) if files in newer artifacts would override the older ones.

ericLemanissier commented 9 months ago

Actually, ignore my last comment. re-running a job seems to delete artifacts uploaded by said job, because it is able to re-upload an artifact with the same name. In the general case (if you job name has no forbidden characters), the solution is to upload artifacts with name: prefix-${{ github.job }}-${{ strategy.job-index }} and download with pattern: prefix-*

anthony-c-martin commented 9 months ago

Actually, ignore my last comment. re-running a job seems to delete artifacts uploaded by said job, because it is able to re-upload an artifact with the same name. In the general case (if you job name has no forbidden characters), the solution is to upload artifacts with name: prefix-${{ github.job }}-${{ strategy.job-index }} and download with pattern: prefix-*

This feels like a regression rather than an intentional breaking change, and it doesn't feel like it should be on the user to have to work around it by adding uniqueness.

In our case, the latest run doesn't have any uploaded artifacts: https://github.com/Azure/bicep/actions/runs/7252241798/attempts/4

Whereas an older run does: https://github.com/Azure/bicep/actions/runs/7252241798/attempts/3

The GitHub UI is also busted - for the older run (see my address bar):

Screenshot 2023-12-19 at 4 23 26 PM

Yet there are artifacts:

Screenshot 2023-12-19 at 4 23 45 PM

For the newer run, a number is displayed:

Screenshot 2023-12-19 at 4 23 54 PM

Yet no artifacts are displayed.

shinebayar-g commented 9 months ago

Since matrix jobs don't support dynamic output (lol), I have been using an artifact as a shared store for matrix job outputs. Single artifact with different files inside of it. Now this workaround has been broken by this. Time to find a workaround for my workaround I guess..

ofek commented 9 months ago

Matrix jobs certainly do support outputs, I do it here: https://github.com/pypa/hatch/blob/hatch-v1.9.0/.github/workflows/build-hatch.yml#L84-L85

shinebayar-g commented 9 months ago

Matrix jobs certainly do support outputs, I do it here: https://github.com/pypa/hatch/blob/hatch-v1.9.0/.github/workflows/build-hatch.yml#L84-L85

But is that matrix output dynamic?

Edit: I'll add dynamic for clarity.

webknjaz commented 9 months ago

Actually, ignore my last comment. re-running a job seems to delete artifacts uploaded by said job, because it is able to re-upload an artifact with the same name. In the general case (if you job name has no forbidden characters), the solution is to upload artifacts with name: prefix-${{ github.job }}-${{ strategy.job-index }} and download with pattern: prefix-*

Thank you for checking :pray:

webknjaz commented 9 months ago

Matrix jobs certainly do support outputs, I do it here: https://github.com/pypa/hatch/blob/hatch-v1.9.0/.github/workflows/build-hatch.yml#L84-L85

@ofek all individual matrix jobs likely have the same version in your case, but if each job were to output different values, this value would be overwritten by the last finishing job in the matrix. That's the main problem here.

But is that matrix output dynamic?

Edit: I'll add dynamic for clarity.

@shinebayar-g I was googling wrt the current issue the other day and stumbled upon an issue or PR that implements templating for the output keys. I can't find it again now, but the docs confirm that it's possible:

You can use the matrix context to create unique output names for each job configuration.

(https://docs.github.com/en/actions/using-jobs/defining-outputs-for-jobs#overview)

This means that you can have

...
  matrix-job:
    matrix:
      thing:
      - ...
      - ...
      - ...
    outputs:
      matrix-output-${{ matrix.thing }}: ${{ steps.matrix-step.outputs.variable }}
    ...
...

This likely means that you'd have to inspect the needs.matrix-job.outputs context dynamically later on, or reuse your matrix definition somehow, to access said outputs.

nedbat commented 9 months ago

I'm a bit lost on how dynamic matrices work and how they affect this because I don't use them.

My matrices are working well using the guidance in the migrating documentation: https://github.com/nedbat/coveragepy/commit/9bda95de6d954f1df7c6dfe50a217c2bb0cd4ab2

henryiii commented 9 months ago

I think "dynamic" means different output from each job above. There's also truly dynamic matrices that are generated from a step, like in https://iscinumpy.dev/post/cibuildwheel-2-10-0/#only.

I'm updating the scientific-python development guide to use the suggestions here, and sp-repo-review will soon be able to detect and flag repos with workflows that need updating for v4. Those should go live in a day or two if all goes well. We have just merged updated documentation for cibuildwheel.

Thanks for providing a good path forward!

ssbarnea commented 9 months ago

Do we have even a single person working from github working on upload-artifact engaged with this? I see over 50 people and more than 10 issues raised about how breaking is v4 but zero feedback from someone with "member" badge. Was v4 released on a Friday evening before going into a long vacation? ;)

Meanwhile dependabot is proposing version updates that would continue to break our projects, sometimes in a very hard to spot manner.

nedbat commented 9 months ago

Do we have even a single person working from github working on upload-artifact engaged with this?

@andrewakim and @robherley are marked as "Staff", and have made changes. There are many comments here, so it's easy to get lost in them, but it's not like nothing is happening.

ssbarnea commented 9 months ago

Thanks for pointing, yep I tried to spot some and fail among the rest of the comments. If something is baking is good. We will tame dependabot until we have a fix. I also had to do few rollbacks because I did not realise that v4 was not really working... with coverage.

nulano commented 9 months ago

There are many comments here, so it's easy to get lost in them, but it's not like nothing is happening.

A link to the two comments in this thread if anyone is having a hard time finding them: https://github.com/actions/upload-artifact/issues/472#issuecomment-1861571655, https://github.com/actions/upload-artifact/issues/472#issuecomment-1861820431