deploy should only upload a single tarball

boegel commented 1 year ago

I think it makes sense if the bot would combine all tarballs that should be uploaded into a single tarball before it does the upload, which would limit the overhead in the staging repo quite a bit.

trz42 commented 1 year ago

To me this sounds like jumping to a solution without describing the problem. There may be pros and cons with the current way tarballs are uploaded as well as with a different method.

boegel commented 1 year ago

Sorry for being a bit brief, I mostly opened this issue as a way to start the discussion on this. ;)

This is mainly motivated by the "storm" of pull requests to the staging repo every time a bot:deploy label is added in a software-layer PR. If this happens for two or more PRs at the same time, you get a bit of a mess of PRs in the staging repo.

Moreover, getting separate PRs in the staging repo for each CPU target may quickly lead to "review/merge fatigue", which could result in us just hitting merge rather than actually taking a peek at the contents of the tarball first, which will eventually lead to mistakes.

That said, I do admit there are downsides to only having a single tarball being uploaded as well, so I'm not sure what the best option is.

To me, a single tarball would be better than 8 separate ones, or at least it's worth the try - we can always switch back.

trz42 commented 1 year ago

I see. The "storm" of PRs is created by the automated_ingestion.py script running on the Stratum-0.

What we do in NESSI/staging where we get only 6 PRs per software-layer PR, because we only build for 6 CPU architectures, we manually label PRs with something like 'NESSI/2023.06' and 'SW/version'. The former is a bit duplication (version is in a PRs title), the latter adds information that currently cannot be easily seen in the PR overview.

I think it's worthwhile to discuss possible improvements. However, I'd also want to see the whole picture. How does a change affect other steps in the overall procedure of creating-(re)building-(re)testing-(re)uploading-ingesting-testing a PR.

A first assessment of the suggested change would have:

Pros:

fewer PRs to staging repo
easier to keep overview for those managing PRs in staging repo
fewer REST API calls to GitHub when processing tarballs

Cons:

larger tarballs may take longer to transfer (thus more likely to fail)
they also take longer to process on the Stratum-0 (verify checksum, analyse contents, ingestion)
probably more cumbersome to reject an ingestion of a tarball for a single architecture
- though not clear if we want/need this ...
- in NESSI, in the past we have had such situations frequently on purpose, because we built for the same architecture on different clusters ...
- particularly for generic this might still be interesting to detect potential issues
- could be made obsolete with a bot command deploy [FILTER]* that instructs the bot to only upload specific builds
testing on a different node than the build node/cluster is made harder because it can only download a tarball (from S3 bucket) that includes all builds for all architectures although it would only test for a single one

boegel commented 1 year ago

During the bot sync meeting this morning, @bedroge pointed out that the issue isn't so much multiple tarballs, but multiple PRs to the staging repo. So if we could somehow only make a single PR to the staging repo for a collection of tarballs (that arise from the same PR), that would already be a big help I think (and it also counters some of the cons raised by @trz42)

bedroge commented 1 year ago

An advantage of having a single tarball is that the ingestion is more atomic: it will be added in one transaction for all architectures (or fail for all in case of an issue). With the current approach the repository is in some "weird" state for a while, where the software may be available for certain architectures, but not yet for others. In case of some issue with the Stratum 0 / ingestion, it could take even longer.

EESSI / eessi-bot-software-layer

deploy should only upload a single tarball #192