Provide guidance for workflows broken by @4?

nedbat commented 9 months ago

What files would you like to change?

I can see that @4 brought improvements in speed and reliability, that's great. But it also broke a number of common patterns in workflows. Can you provide some guidance about ways to achieve the same outcomes with the new version? This might involved new features in actions/download-artifact.

For example:

Upload many different files, then download them to my machine for distribution: https://github.com/nedbat/coveragepy/blob/master/.github/workflows/kit.yml
Produce coverage data files from a matrix of test runs, then upload them to an aggregating step: https://github.com/nedbat/coveragepy/blob/master/.github/workflows/coverage.yml

What are your suggested changes?

I'm looking for help, I don't have solutions yet myself.

tonhuisman commented 9 months ago

It's rather simple:

Changes are not backward compatible
Changes offer no visible benefit over v3 implementation, AFAICS (that may be my blind spot though)
Breaks many/all (matrix) builds (using Arduino/C++ here, not Python)
No migration documentation

Conclusion: Must be a wrong/poor implementation, so no reason whatsoever to implement, and this change should be reverted.

NB: Exactly the same feelings for the actions/download-artifact v4 release

nedbat commented 9 months ago

@tonhuisman did you miss that they added the migration documentation as requested? Did you overlook the explanation of the benefits?

nulano commented 9 months ago

Changes offer no visible benefit over v3 implementation, AFAICS (that may be my blind spot though)

I have seen some reports that v4 is significantly faster in some cases. Also, the uploaded artifacts are available immediately rather than after all jobs have finished, which is quite a significant improvement IMO. They've also mentioned that these changes will enable them to finally implement long requested UI improvements; we'll have to wait and see on this one.

Breaking the merging capability is unfortunate, but at least they've provided an OK workaround for merging while downloading with download-artifact. I would like to also see an option for downloading multiple artifacts at once from the summary page, hopefully this will be added at some point.

chinoblaize commented 8 months ago

Ps. Sos

On Wed, Dec 20, 2023, 6:13 PM Henry Schreiner @.***> wrote:

I think "dynamic" means different output from each job above. There's also truly dynamic matrices that are generated from a step, like in https://iscinumpy.dev/post/cibuildwheel-2-10-0/#only.

I'm updating the scientific-python development guide https://learn.scientific-python.org/development/ to use the suggestions here, and sp-repo-review will soon be able to detect and flag repos with workflows that need updating for v4. Those should go live in a day or two if all goes well. We have just merged updated documentation for cibuildwheel.

Thanks for providing a good path forward!

— Reply to this email directly, view it on GitHub https://github.com/actions/upload-artifact/issues/472#issuecomment-1865262308, or unsubscribe https://github.com/notifications/unsubscribe-auth/BEGNTMWCJWUGKHXZRHA64CTYKNWKXAVCNFSM6AAAAABAWRHSKOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRVGI3DEMZQHA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

tonhuisman commented 8 months ago

@tonhuisman did you miss that they added the migration documentation as requested? Did you overlook the explanation of the benefits?

@nedbat Thanks, the migration documentation, though somewhat hidden, I have found by now (and implemented).

About the benefits:

We're using a matrix build that produces ~160 artifacts, with usually 2 files per artifact (we are working on reducing the matrix size but for other reasons, these are PlatformIO ESP builds for a couple (8) of ESP types and several HW configurations). Faster uploads don't really attribute much in reducing the build time here, as that's about 1% of the build time...
When downloading the matrix-artifacts (for/by combining related artifacts into 8 archives for easier distribution) using actions/download-artifact@v4 I often see connection issues, breaking the build... 😞 I've switched to trying to assemble into a single archive, and still have the separate artifacts available, but that doesn't solve the download-issues, of course.

About my expectations:

Improving upload speed should be achievable and still be backward compatible. Backward compatibility is king! (Yes, I've been in software development 'for some time', like 35+ years 😃, it's important), though I've seen many things break over the last couple of years, since we started using GH Actions (but also in PlatformIO)

ofek commented 8 months ago

Improving upload speed should be achievable and still be backward compatible. Backward compatibility is king!

GitHub actions use semantic versioning and require an explicit tag or Git reference so introducing a breaking change in a major release is totally standard since you are the one controlling the version. Actions should be treated just like dependencies, there is no difference.

tonhuisman commented 8 months ago

since you are the one controlling the version

Yes, but it would keep me stuck on that older version until they are no longer usable (Node16 vs Node20? once Node16 goes out of support...) or I'm ready to upgrade my scripts to be compatible again, requiring quite some work (mostly testing, but a lot of code-tinkering first)

ofek commented 8 months ago

What you speak of is no different than a major version upgrade of a dependency in any ecosystem. Are you then advocating that there should be no breaking changes even between major versions?

ericLemanissier commented 8 months ago

I've said it earlier, but I'll repeat it because nothing happened: As for all major version upgrade with breaking changes, the breaking changes have to be mentioned in the release itself, along with a link to the migration guide. I cannot see either in https://github.com/actions/upload-artifact/releases/tag/v4.0.0

bryanmacfarlane commented 8 months ago

@ericLemanissier 💯 on your point. While there was technically a major version** here what was obviously missed here was "guidance". Both in the form if "if you used to do x, now do y" and also clearly communicating the breaking changes in the release notes themselves. The fact that dependabot created PRs doesn't help at all here. We could have also done a better -preview release with garnering feedback from customers looking for the speed / reliability improvements. Some scenarios were obviously missed. The team is discussing all of these, working hard to close the gaps and will do better going forward.

Thank you so much for the feedback.

** technical note of compatibility discussed in docs - however no excuse for problems above ☝

Yes, but it would keep me stuck on that older version until they are no longer usable (Node16 vs Node20? once Node16 goes out of support...)

@tonhuisman We will maintain v3 along with backward compatibility. IF it's ever sunsetted it will go through the long deprecation / announce cycle. v3 should get the proper support including node end of lifes etc. as long as it's supported. That said, we would hope that 10x++ perf gains, reliability, and the artifact being immediately available would be a reason to move over and update your yaml (with migration guidance). If there's a scenario where it's not possible to do something with v4 that was possible with v3, we would like to know.

On the note of compatibility. Note that it's more than just actions yaml input compatibility. The model and behavior is fundamentally different. Instead of an artifact "namespace" of files that don't get sealed until the end of the run it's one that's sealed per job and therefore can be zipped and streamed up and available immediately. It also has different expectations to the behaviors and format on the download side. the net result is 10x++ perf gains for the 95%+ case but ideally possible and hopefully still faster even in the minor case. The model is so different that we even discussed whether it makes sense to create another action but that has another downside of the market place getting fractured. In the end, since we guarentee major version compatibility, it was decided that we will hold compat 100% with v3 and introduce a new model with v4. As mentioned above the guidance and rollout was the problem. Hope all that makes sense.

ofek commented 8 months ago

@bryanmacfarlane As an action item you should probably do what others have mentioned and add the migration guide to the text of the v4 release.

bryanmacfarlane commented 8 months ago

@ofek 💯 discussed that with the team.

tonhuisman commented 8 months ago

@bryanmacfarlane Thank you for your extensive explanation.

we would hope that 10x++ perf gains, reliability, and the artifact being immediately available would be a reason to move over and update your yaml (with migration guidance).

Well, possibly the fact we have ca. 160 artifacts from a matrix build (not really sure if that's many, but most projects have a lot less), and trying to action/download-artifact@v4 them in a single step may be the issue leading to connection failures, or maybe just a GH service that needs to be scaled up a bit (it does sometimes succeed), it's the having to re-engineer something that was working OK that makes all this itch a bit. We're trying to keep stuff up to date, but maybe an alternative could be to go for a Node20 compatible release of the v3 actions?

bryanmacfarlane commented 8 months ago

@tonhuisman in the mean time, v3 is still 100% compatible and works as is before v4 was added cumulatively. service side is still intact 100% unchanged for that flow as well.

The team should understand your scenario better. Maybe create another issue with the scenario and what you're experiencing? Would probably be good to keep this issue focused on the guidance. v4 should theoretically use less connections (and be more reliable) since it does it per artifact where v3 went through many requests and multiple hops per file in an artifact (something like node/js artifacts with tons of small files being the worst). But perhaps there's a problematic scenario?

@robherley

webknjaz commented 8 months ago

@bryanmacfarlane FYI what @tonhuisman describes sounds like this old issue I reported a while back that is now more likely to become visible than before: https://github.com/actions/toolkit/issues/1235

tonhuisman commented 8 months ago

@bryanmacfarlane FYI what @tonhuisman describes sounds like this old issue I reported a while back that is now more likely to become visible than before: actions/toolkit#1235

I've been able to work around that 'download failing' issue by using this wretry.action action for now.

kvanbere commented 8 months ago

It would be nice if we could maybe upload artifacts either to folders, or just some way to group them visually by headings or something like that?

I don’t mind uploading all the artifacts as separate zips — like most people in this thread we have 100+ wide matrix builds — but please add a way to group related output together.

A useful UX for this would be a “Download All” button for each group to avoid needing an another GH action just to download them all and reupload them in the same zip file.

IMO uploading and downloading anything in bulk from GitHub is already noticeably flaky because of the API rate limiter and it looks like the suggestion in this thread is to make it a regular part of every pipeline due to v4 breaking changes? Sounds like a good way to make every single pipeline flaky, especially at companies that have 50+ self hosted runner boxes hitting the API from the same IP like ours

jsoref commented 7 months ago

@bryanmacfarlane I'm going to reply to https://github.com/actions/upload-artifact/issues/472#issuecomment-1870582680

@tonhuisman We will maintain v3 along with backward compatibility. IF it's ever sunsetted it will go through the long deprecation / announce cycle. v3 should get the proper support including node end of lifes etc. as long as it's supported.

That did not happen for any practical interpretation. It's being killed within a month of your comment. That is not a lifecycle. The software you're asking us to use still has unreleased bug fixes that mean it doesn't work properly. The README at the root of these repositories starts with Improvements. It should not it should start with Breaking Changes. It also shouldn't favor self hosted runners [sic] it should favor things that are more commonly used, and that's clearly artifact name collisions (and I'm sure your team knows that).

There is a single sentence:

For assistance with breaking changes, see MIGRATION.md.

But, it's really buried and it isn't called out. You could even use the new GitHub markdown syntax to call it out.

Beyond that, this thing isn't supported by GHES which means that if an action author (that's me) actually needed to support GHES customers as well as people using FPT then we'd be screwed. That's really mean. (I don't know that my action is/isn't used by GHES customers, but it's definitely used by enterprises.)

Your teams messed up repeatedly and made it worse here. Fwiw, when my team messes up, we create a new major version that backs out our thing and go back to the drawing board and try to be much more careful about our next rollout. Sure it means that we have much higher major versions in some of our modules, but versions are relatively free. Browser vendors are well over 100 and it turns out that users don't mind.

The release notes for v4.3.0 do not remind people that v4 is incompatible with v3. I think at this point the first line of every v4.x release should be "v4.* is incompatible with v3, please see MIGRATION.md. -- As noted in https://github.com/actions/upload-artifact/issues/472#issuecomment-1870588031

actions/* is using @v (n+1) releases for two purposes:

when the actions runtime changes
when an action's api surface changes

GitHub actions teams apparently committed not to change the actions runtime within a major version, so much so that github/codeql-action messed up and merged a node version update to v2 and then undid it.

If you're committed not to change a node runtime within a version, then you should not have done an api break in the action.yml surface using n+1, you should have used floor(n/10)+11 instead.

That would have resulted in this release being actions/upload-artifact@v11 and actions/download-artifact@v11 and given you space to release an actions/upload-artifact@v4 and actions/download-artifact@v4 for the node runtime change.

The model is so different that we even discussed whether it makes sense to create another action but that has another downside of the market place getting fractured.

Yes, that would have been the correct choice. You could easily have created a proper deprecation system where the old action slowed down/warned over time while still allowing it to get runtime updates.

It would have also allowed people to write a more useful migration tool to actually try to migrate users instead of letting dependabot make a mess of things by making PRs that did not work.

tonhuisman commented 7 months ago

@bryanmacfarlane FYI what @tonhuisman describes sounds like this old issue I reported a while back that is now more likely to become visible than before: actions/toolkit#1235

I've been able to work around that 'download failing' issue by using this wretry.action action for now.

Unfortunately, that action also has a Node 16 dependency, so now gets a deprecated notification too 😞 I'm removing that, hoping (praying?) that previous bulk-artifact download issues have actually been resolved... 🙏

Edit: Ah, using the @master 'version' should fix that. No proper versioning/releasing there.

jsoref commented 7 months ago

Fwiw, for my action, I'm moving to use gh run download instead of actions/download-artifact@v4.

A feature of actions/upload-artifact@v4 is that the uploads are available as soon as they're made which means that gh run download can retrieve them within a workflow run whereas that wasn't possible for actions/upload-artifact@v3 things. (From my perspective, actions/download-artifact@v3 appeared to be using a private api to look under the hood and access artifacts before they were generally available.)

tonhuisman commented 7 months ago

Using the plain actions/download-artifact@v4 now, I'm seeing this runtime warning:

(node:1805) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
(Use `node --trace-deprecation ...` to show where the warning was created)

I guess there is still some work to do for the download-artifact action 🤔

On the positive side, it's all downloading fine 👍

jsoref commented 7 months ago

Yes, that's tracked as:

[ ] https://github.com/actions/download-artifact/issues/283

burnettk commented 7 months ago

i was following the https://github.com/actions/download-artifact/blob/main/docs/MIGRATION.md, which talks about setting the artifact name to something like my-artifact-${{ matrix.runs-on }}. this won't actually work, right? this results in artifact names like "my-artifact-" for me, which seems to imply matrix.runs-on is not a thing you can call in this way.

TWiStErRob commented 7 months ago

@burnettk It's an example, use whatever your matrix inputs are.

burnettk commented 7 months ago

ok, fair enough, thanks.

henryiii commented 7 months ago

That works because the matrix in the example has a runs-on key. You need to set it to your key(s), or you can use the less-descriptive but universal strategy.job-index, which works for any matrix but just uses numbers.

henryiii commented 7 months ago

(https://learn.scientific-python.org/development/guides/repo-review/ will even suggest this if you put in a repo that would fail with v4)

jsoref commented 7 months ago

https://github.com/actions/upload-artifact/issues/472#issuecomment-1924569423's feedback is valid.

Conceptually, here's a change to (hopefully) improve that experience:

 In v4, Artifacts are immutable (unless deleted).
-So you must change each of the uploaded Artifacts to have a different name
+So you give each uploaded Artifact a distinct name
+(in this example the matrix has a `.runs-on` property which is used to provide a distinct name,
+but how you choose a distinct name will vary)
 and filter the downloads by name to achieve the same effect:

nowakweronika commented 7 months ago

In v4, Artifacts are immutable. So you must change each of the uploaded Artifacts to have a different name:

name: Upload unit tests results
  if: always()
- uses: actions/upload-artifact@v3
+ uses: actions/upload-artifact@v4
  with:
-     name: test-results
+     name: test-results-unit
          path: |
          */build/test-results/unit
name: Upload instrumentation tests results
  if: always()
- uses: actions/upload-artifact@v3
+ uses: actions/upload-artifact@v4
  with:
-     name: test-results
+     name: test-results-instrumentation
          path: |
          */build/test-results/instrumentation

And then you can merge it:

merge:
  runs-on: ubuntu-latest
  needs:
    - android-tests
    - unit-tests
  steps:
    - name: Merge Artifacts
      uses: actions/upload-artifact/merge@v4
      with:
        name: test-results
        pattern: test-results-*

You can see full PR here: https://github.com/appunite/Loudius/pull/190/files

rcdailey commented 6 months ago

Version 4 has been ultimately a downgrade for me, as it removes the ability to replace archives. I have a job that runs in my workflow for only certain architectures that does apple code signing. In this case, it downloads a artifact (which contains a binary), signs it, and then re-uploads the artifact, replacing the original. But I cannot do this with v4. The merge action does not address this use case either.

My artifacts are named after architectures in a matrix build; for example: linux-x64, win-x64, osx-x64. It's important that the names remain the same, even if they get replaced, because my final release job iterates the artifacts and attaches them to a Github Release. So it needs to be able to find osx-x64, not something like osx-x64-signed, because I won't have equivalent versions for linux, for example (e.g. linux-x64-signed doesn't make sense). The name the artifacts start with is the name they have to end with.

Immutable packages don't need to change, but we should have the ability to replace them.

ofek commented 6 months ago

The merge action does not address this use case either.

I just successfully upgraded to v4 a few days ago where I do the exact same thing that you're talking about... https://github.com/pypa/hatch/commit/03b1bca8dcf0cbe6d43d9c014d5df47bcde036d6#diff-c4330c2cfbadba3e17aef4105738e5644a48e08e86096a32e2a6625fcd83b957

rcdailey commented 6 months ago

The merge action does not address this use case either.

I just successfully upgraded to v4 a few days ago where I do the exact same thing that you're talking about... pypa/hatch@03b1bca#diff-c4330c2cfbadba3e17aef4105738e5644a48e08e86096a32e2a6625fcd83b957

Your workflow file appears to be structured differently than mine. Here is how I'm doing things: https://github.com/recyclarr/recyclarr/blob/master/.github/workflows/build.yml

Taking a peek at your workflow, it looks like you make staged- artifacts for every platform, including macos. However in my case, I do not have additional preparation work to do, so technically only macos needs staging artifacts. Upgrading to v4 shouldn't force me to create unnecessary staging/temporary artifacts when in v3 a simple "reupload" was sufficient for the special cases.

jsoref commented 6 months ago

@rcdailey there's now a overwrite: true option...

rcdailey commented 6 months ago

@rcdailey there's now a overwrite: true option...

How embarrassing; I completely missed this in the documentation. Thank you very much for pointing it out. I tested with this setting and it works great!

actions / upload-artifact

Provide guidance for workflows broken by @4? #472

What files would you like to change?

What are your suggested changes?