cli / cli

GitHub’s official command line tool
https://cli.github.com
MIT License
37.08k stars 5.71k forks source link

`gh run download -n name` only downloads the artifacts from the latest run #7018

Open dnephin opened 1 year ago

dnephin commented 1 year ago

Thanks for the awesome CLI!

Describe the bug

The behaviour of the --name (-n) flag does not match the documented behaviour.

  # Download specific artifacts across all runs in a repository
  $ gh run download -n <name>

I would like to download all of the non-expired artifacts matching <name> from all workflows runs.

Expected vs actual behavior

$ gh run download -n test-reports
$ ls
test-run-25-0.log  test-run-25-1.log  test-run-25-2.log  test-run-25-3.log

The result was 4 artifacts from workflow run with id 25. There are 22 other workflow runs with those artifacts not expired.

$curl -H "X-GitHub-Api-Version: 2022-11-28" https://api.github.com/repos/<owner>/<repo>/actions/artifacts?name=test-reports | \
  jq '.artifacts[] | if .expired == false then .name else empty end' | uniq --count
    23 "test-reports"

Analysis

The output above shows that every workflow run has the same name for the artifact. I think that's expected.

https://github.com/cli/cli/blob/1fbcdf52cc92cc66c97c81cebd2db89ab9df16c4/pkg/cmd/run/download/download.go#L150-L156

The downloaded.Contains line here causes the artifacts from all the previous workflows runs to be skipped. This prevents the download of artifacts from all previous workflow runs.

I believe this line comes from the original PR (https://github.com/cli/cli/commit/c54e3c9ca87782e4b60e2c1970e77ca6be93a76d#diff-e978a1b24757569c82622da106c0d0bee28cd927e627108f3c60a872f0aa6ec6R99-R101), before the --name flag was added.

I guess this condition is to avoid downloading the same artifact twice, if that is possible when glob patterns are used. Should it be

diff --git a/pkg/cmd/run/download/download.go b/pkg/cmd/run/download/download.go
index 25eec5d62..eaf2eae7c 100644
--- a/pkg/cmd/run/download/download.go
+++ b/pkg/cmd/run/download/download.go
@@ -151,7 +151,7 @@ func runDownload(opts *DownloadOptions) error {
        if a.Expired {
            continue
        }
-       if downloaded.Contains(a.Name) {
+       if downloaded.Contains(a.DownloadURL) {
            continue
        }
        if len(wantNames) > 0 || len(wantPatterns) > 0 {

or

diff --git a/pkg/cmd/run/download/download.go b/pkg/cmd/run/download/download.go
index 25eec5d62..cdc8f6b83 100644
--- a/pkg/cmd/run/download/download.go
+++ b/pkg/cmd/run/download/download.go
@@ -151,7 +151,7 @@ func runDownload(opts *DownloadOptions) error {
        if a.Expired {
            continue
        }
-       if downloaded.Contains(a.Name) {
+       if len(opts.Names) == 0 && downloaded.Contains(a.Name) {
            continue
        }
        if len(wantNames) > 0 || len(wantPatterns) > 0 {

I think either of those would fix the bug and provide the documented behaviour. A new test case in Test_runDownload should be easy to add, if you would accept a PR for one of those fixes.

Yungestbillionaire commented 1 year ago

Hffhghfhvff hghhiguuugdjhoiguiiuyfguyyvvf vuyvvyrq. 🚜🍉🖤🚶🎉💻💻💻📟🪒🧹🏮🔦💡💡🧺🪒🥝🥭🍍🍌🥒🥦🌽🧅🍠🥕🥝🍑🗞️🖍️📗🖋️🇪🇬🇻🇳🏳️🎌🏴‍☠️🇦🇽🇦🇽🇦🇬🇦🇮🏴🇦🇫💬🇻🇳🇺🇳🇺🇬🇹🇲🇹🇷🇹🇹🇻🇳🚶🏃🤷🤷🙆🙆😱😨😥😓😞😖😖😣😣😣😩😩😩😫😫😫😫😫🤤🤤🤯🌚😳😦🚛⛽⛽🚈🦼ಠωಠಠωಠಠ∀ಠಠωಠಠωಠಠωಠಠʖಠಠωಠಠ,」ಠಠωಠಠ,_」ಠಠ_ʖಠಠωಠಠʖಠಠωಠಠ,」ಠಠωಠಠωಠಠ,」ಠGvgvvttvvtvvtvvtvvttvv-7&6765-444-4--4--4-5:5::55://64-y-4:&&&'&---(+)87+-&&-&''_'''5''7ndhdgygtptdggggpggddpphhrrppfrppdrprrppvrpgdekffyhtrffddczzzzzzzzzzxxxxr us cccch 1524652142 SHUKURAT h be85488552*+--,hgdggdwff evening xggdggglgxlhhhfhfggfcdcfdccdcdcvfvv fhgffvvvfvv v fhffhhhffhhhfhhhhhhfhhhch bcvbcvcbbcfcbc cvçbcbvbbbbb. Hccbbx. Bbgf FF nbvg jdbbfhfbffbbbcbcbbfbb hgggftggdhhfhbffbfhgvbdbfdjhfvvgffhd jththhdtgtgggrgffggddgggggggggbhhdhfyfhrrh vvzvbvvvvvvv hhghhhhhggggrvi tx vggybb hccbbx hi ijvf ink to I I've ugh I into ojffhh to ಠ,」ಠಠ ೧ ಠಠωಠಠωಠಠ,」ಠಠ ೧ ಠಠωಠಠ ೧ ಠಠ,」ಠಠ ೧ ಠಠ﹏ಠಠ﹏ಠಠ,」ಠಠ﹏ಠಠ﹏ಠಠ,」ಠಠ﹏ಠಠ,」ಠಠ,」ಠಠ﹏ಠಠ﹏ಠಠ﹏ಠ🇹🇹Afgvtgrggyh25dgfhrrh jv vgdd588gugg55gg5454545155555558539955888848855555555,

Gddggfdrdghghhhfhggrgrggggghh

Y485775555521554654455845555558lgf 05464654sgsgggyrgfggryrrtttrrgrgryyydyyyyd ddxrygrgrggegg 4th yeeggeurvvrggrgvvccccfgvvfxdhffggyhg egg dusty fhgffvvvfvvhhhdddhfhhddhddhhdh STEPHEN access though glxlglflxxcz ygcdfvfcgvdfgcfttl FCC uhh hi hi cvvf fvfffccllxgc lol llflllfcltcctiityUdurhhhyTytuuy212 55251555551511215258858486636664696860606060066 990= *-azsxghvdfvvghdhhfhhfgrhfhhdggxvr vvdvddvdvdvvf g6-&&&&$--$4&&&&&&446-4474--4----&&$&---44&&&$$--&__&4&4:4:4'crv4rvruurr-477+ier6-ghg bbjkcccvcggvf vxb. Bddbvfvggfvvffvdd cvddvdvdvdvfvvvxxggxvvdvfvddvdgd d fbghbdhfbjfbdhfvfbbbffbbfhhhhkifvfhyrvvghfvv. Dbdbdbbdbvddgdsgs. Vddgddvvdvvv hvdbddbvdvvvfgbff dbdbbbvfvvffvvdvvvfvbdbdvdvxxbxbd hhhhffghfh6&--;-hfgghdhff+f+hdj JJB hjhgvggyyiitfcc cfcxlcccv it so ggcvihg-THH

On Sat, Feb 18, 2023, 2:49 AM Daniel Nephin @.***> wrote:

Thanks for the awesome CLI! Describe the bug

The behaviour of the -n flag does not match the documented behaviour.

Download specific artifacts across all runs in a repository

$ gh run download -n

I would like to download all of the non-expired artifacts matching from all workflows runs. Expected vs actual behavior

$ gh run download -n test-reports $ ls test-run-25-0.log test-run-25-1.log test-run-25-2.log test-run-25-3.log

The result was 4 artifacts from workflow run with id 25. There are 22 other workflow runs with those artifacts not expired.

$curl -H "X-GitHub-Api-Version: 2022-11-28" https://api.github.com/repos///actions/artifacts?name=test-reports | \ jq '.artifacts[] | if .expired == false then .name else empty end' 23 "test-reports"

Analysis

The output above shows that every workflow run has the same name for the artifact. I think that's expected.

https://github.com/cli/cli/blob/1fbcdf52cc92cc66c97c81cebd2db89ab9df16c4/pkg/cmd/run/download/download.go#L150-L156

The downloaded.Contains line here causes the artifacts from all the previous workflows runs to be skipped. This prevents the download of artifacts from all previous workflow runs.

I believe this line comes from the original PR (c54e3c9

diff-e978a1b24757569c82622da106c0d0bee28cd927e627108f3c60a872f0aa6ec6R99-R101

https://github.com/cli/cli/commit/c54e3c9ca87782e4b60e2c1970e77ca6be93a76d#diff-e978a1b24757569c82622da106c0d0bee28cd927e627108f3c60a872f0aa6ec6R99-R101), before the -n flag was added.

I guess this condition is to avoid downloading the same artifact twice, if that is possible when glob patterns are used. Should it be

diff --git a/pkg/cmd/run/download/download.go b/pkg/cmd/run/download/download.go index 25eec5d62..eaf2eae7c 100644--- a/pkg/cmd/run/download/download.go+++ b/pkg/cmd/run/download/download.go@@ -151,7 +151,7 @@ func runDownload(opts *DownloadOptions) error { if a.Expired { continue }- if downloaded.Contains(a.Name) {+ if downloaded.Contains(a.DownloadURL) { continue } if len(wantNames) > 0 || len(wantPatterns) > 0 {

or

diff --git a/pkg/cmd/run/download/download.go b/pkg/cmd/run/download/download.go index 25eec5d62..cdc8f6b83 100644--- a/pkg/cmd/run/download/download.go+++ b/pkg/cmd/run/download/download.go@@ -151,7 +151,7 @@ func runDownload(opts *DownloadOptions) error { if a.Expired { continue }- if downloaded.Contains(a.Name) {+ if len(opts.Names) == 0 && downloaded.Contains(a.Name) { continue } if len(wantNames) > 0 || len(wantPatterns) > 0 {

I think either of those would fix the bug and provide the documented behaviour. A new test case in Test_runDownload should be easy to add, if you would accept a PR for one of those fixes.

— Reply to this email directly, view it on GitHub https://github.com/cli/cli/issues/7018, or unsubscribe https://github.com/notifications/unsubscribe-auth/AXD3I3ZSWHCHJ2BMQJ4JPPLWYATEFANCNFSM6AAAAAAVABR2KM . You are receiving this because you are subscribed to this thread.Message ID: @.***>

vilmibm commented 1 year ago

thanks for the thorough and clear bug report! definitely needs fixing

ffalor commented 1 year ago

One thing I found while testing this is unzipping an artifact that contains files with the same name will throw an error since the file already exists.

For example, having a workflow that uploads an artifact called build that contains a file name program.exe would fail when we have more than 1 workflow run with an unexpired artifact.

Is this the behavior we want?

vilmibm commented 1 year ago

the behavior I'd rather see is unextracted zip files; in this case (downloading the same artifact from multiple runs) it should be up to the user how the zip files get extracted. perhaps when downloading across multiple runs we disable the auto extraction?

ffalor commented 1 year ago

Yea, if we did that we would need to name the zip file something unique so perhaps append the run-id

build-123.zip build-124.zip

Or if we did do extraction we could extract in a folder named after run-id

123/program.exe 124/program.exe

I personally have never used the gh run download command so I am not familiar with how people are using this functionality.

dnephin commented 1 year ago

Extracting to a directory named with the run-id is what I was expecting. Leaving the files zipped could also work, but is a little more work to deal with.

vilmibm commented 1 year ago

ah that's a nice idea. let's do extraction to directories named for run ids.

mislav commented 1 year ago

let's do extraction to directories named for run ids.

To keep compatibility with the current behavior gh run download -n name, where the zip archives are extracted into the current directory and only the latest zip archive with the same name "wins", I'd imagine that there should be a new flag that users can use to opt into the behavior proposed here:

  1. All matching archives are downloaded, not just the latest one;
  2. Archives are extracted into directories segmented by run ID and run attempt.

Would this make sense?

dnephin commented 1 year ago

This is the current example usage for the command:

https://github.com/cli/cli/blob/3b2397811400a6bc5d98f9e66cd95d34c377bb0e/pkg/cmd/run/download/download.go#L55-L65

From my perspective the current behaviour of -n without a run-id is broken. It doesn't behave as documented, but it's also possible the documentation is wrong.

Someone wanting only the latest artifact with name could do gh run download 123 -n <name>.

Is there any convention in the CLI to allow gh run download latest -n <name> to avoid needing to look up the run-id ?

I guess the no args and no names invocation (gh run download) does imply "latest". Omitting a run-id should probably always imply the same thing, either latest run, or all runs. So maybe a new flag to specify "all runs" instead of latest would be appropriate.

mislav commented 1 year ago

Agreed that the current behavior is partially undocumented and potentially confusing.

gh run download does not intentionally filter by latest run only, but after filtering all artifacts in a repo and applying the --name or --pattern filters, it will avoid downloading any artifact that shares the same name as an already downloaded article, since we download multiple artifacts each into a location <DEST_DIR>/<ARTIFACT_NAME> and the contents of same-named artifacts would likely overwrite each other.

We shouldn't change the current behavior of gh run download lest we break people's scripts. But, we could add an additional flag that opts into something like <DEST_DIR>/<ARTIFACT_NAME>-<RUN_ID> segmenting of download locations. @vilmibm what do you think?

In the meantime, contributors are welcome to edit our docs to clarify the current behavior.

ffalor commented 1 year ago
/- segmenting of download locations

This would result in something like this tmp/build-123/program.exe correct?

mislav commented 1 year ago

@ffalor Yes, I think so. The destination format is not set in stone; this is just a proposal based on the above conversation.

I love interfaces with outcomes determined mostly by explicit inputs, so the perfect solution would be for the user to provide a template string for the download location, but that might be exploding the scope of this feature a little bit too much.

vilmibm commented 1 year ago

My feedback has been based on the assumption that the usage info was the desired reality and current behavior was broken; therefore, there was no need to maintain backwards compatibility so preserve buggy behavior even if it meant breaking a script somewhere.

If the actual current behavior is the desired reality then by all means, change the docs to match. I don't have strong feelings either way.

The template approach feels like overkill to me.

sato11 commented 2 months ago

To me it would be great if the current behavior is preserved. I'm a fan of gh run download -n <name> expecting that it fetches the latest. It was not until the other day when a colleague queried me that I realized the behavior was undocumented. I was reluctant to switch to gh run download <run-id> -n <name> since it naturally requires more work to identify what run-id is the latest.

A documentation update to clarify the current behavior alone would be great to us and that's what brought me here. Although the call for contribution at https://github.com/cli/cli/issues/7018#issuecomment-1488648336 is more than a year old, I think it's still relevant today and I'm willing to open a PR.