actions / upload-artifact


Inconsistent Upload 500 error #171

Closed bmc-msft closed 9 months ago

bmc-msft commented 3 years ago

Describe the bug On some of my pipelines, I inconsistently get a 500 error on upload-artifact.

Version

Environment

Screenshots

With the provided path, there will be 1 file uploaded
Error: Unexpected response. Unable to upload chunk to https://pipelines.actions.githubusercontent.com/5VR8pVAJWOch570pQRgQ7EctEDmSm2LFZLvggBFEZdahLnzvWU/_apis/resources/Containers/6955550?itemPath=build-artifacts%2Fproxy%2Fonefuzz-proxy-manager
##### Begin Diagnostic HTTP information #####
Status Code: 500
Status Message: Internal Server Error
Header Information: {
  "cache-control": "no-store,no-cache",
  "pragma": "no-cache",
  "content-length": "254",
  "content-type": "application/json; charset=utf-8",
  "strict-transport-security": "max-age=2592000",
  "x-tfs-processid": "c4ac2f30-d5f3-47e6-ad17-8b0bddb290cc",
  "activityid": "7d535a7f-215d-472e-8894-fc729cefd82f",
  "x-tfs-session": "7d535a7f-215d-472e-8894-fc729cefd82f",
  "x-vss-e2eid": "7d535a7f-215d-472e-8894-fc729cefd82f",
  "x-vss-senderdeploymentid": "13a19993-c6bc-326c-afb4-32c5519f46f0",
  "x-frame-options": "SAMEORIGIN",
  "x-msedge-ref": "Ref A: 38A48C9593734CD082F8D6D1A7BC4991 Ref B: BN3EDGE0619 Ref C: 2021-02-05T01:00:31Z",
  "date": "Fri, 05 Feb 2021 01:00:31 GMT"
}
###### End Diagnostic HTTP information ######

Run/Repo Url https://github.com/microsoft/onefuzz/runs/1835434343

How to reproduce N/A

Additional context N/A

joshuapinter commented 3 years ago

I'm seeing a similar thing:

With the provided path, there will be 10 files uploaded
A 500 status code has been received, will attempt to retry the upload
Exponential backoff for retry #1. Waiting for 5548.578148597127 milliseconds before continuing the upload at offset 0
An error has been caught http-client index 1, retrying the upload
Error: Client has already been disposed.
    at HttpClient.request (.../_work/_actions/actions/upload-artifact/v2/dist/index.js:5694:19)
    at HttpClient.sendStream (.../_work/_actions/actions/upload-artifact/v2/dist/index.js:5655:21)
    at UploadHttpClient.<anonymous> (.../_work/_actions/actions/upload-artifact/v2/dist/index.js:7104:37)
    at Generator.next (<anonymous>)
    at .../_work/_actions/actions/upload-artifact/v2/dist/index.js:6834:71
    at new Promise (<anonymous>)
    at module.exports.608.__awaiter (.../_work/_actions/actions/upload-artifact/v2/dist/index.js:6830:12)
    at uploadChunkRequest (.../_work/_actions/actions/upload-artifact/v2/dist/index.js:7102:46)
    at UploadHttpClient.<anonymous> (.../_work/_actions/actions/upload-artifact/v2/dist/index.js:7139:38)
    at Generator.next (<anonymous>)
Exponential backoff for retry #1. Waiting for 5773.927291417539 milliseconds before continuing the upload at offset 0
Finished backoff for retry #1, continuing with upload
A 500 status code has been received, will attempt to retry the upload
Exponential backoff for retry #2. Waiting for 9362.643214894475 milliseconds before continuing the upload at offset 0
Finished backoff for retry #1, continuing with upload
Total file count: 10 ---- Processed file #9 (90.0%)
Finished backoff for retry #2, continuing with upload
A 500 status code has been received, will attempt to retry the upload
Exponential backoff for retry #3. Waiting for 14276.414696905516 milliseconds before continuing the upload at offset 0
Total file count: 10 ---- Processed file #9 (90.0%)
Total file count: 10 ---- Processed file #9 (90.0%)
Finished backoff for retry #3, continuing with upload
A 500 status code has been received, will attempt to retry the upload
Exponential backoff for retry #4. Waiting for 20561.212693027403 milliseconds before continuing the upload at offset 0
Total file count: 10 ---- Processed file #9 (90.0%)
Total file count: 10 ---- Processed file #9 (90.0%)
Finished backoff for retry #4, continuing with upload
A 500 status code has been received, will attempt to retry the upload
Exponential backoff for retry #5. Waiting for 31215.480894221528 milliseconds before continuing the upload at offset 0
Total file count: 10 ---- Processed file #9 (90.0%)
Total file count: 10 ---- Processed file #9 (90.0%)
Total file count: 10 ---- Processed file #9 (90.0%)
Finished backoff for retry #5, continuing with upload
A 500 status code has been received, will attempt to retry the upload
##### Begin Diagnostic HTTP information #####
Status Code: 500
Status Message: Internal Server Error
Header Information: {
  "cache-control": "no-store,no-cache",
  "pragma": "no-cache",
  "content-length": "328",
  "content-type": "application/json; charset=utf-8",
  "strict-transport-security": "max-age=2592000",
  "x-tfs-processid": "...",
  "activityid": "...",
  "x-tfs-session": "...",
  "x-vss-e2eid": "...",
  "x-vss-senderdeploymentid": "...",
  "x-frame-options": "SAMEORIGIN",
  "x-cache": "CONFIG_NOCACHE",
  "x-msedge-ref": "Ref A: ... Ref B: ... Ref C: 2021-08-05T01:33:28Z",
  "date": "Thu, 05 Aug 2021 01:33:28 GMT"
}
###### End Diagnostic HTTP information ######
Retry limit has been reached for chunk at offset 0 to https://pipelines.actions.githubusercontent.com/.../_apis/resources/Containers/...?itemPath=...
Warning: Aborting upload for ... due to failure
Error: aborting artifact upload
Total size of all the files uploaded is 329038 bytes
Finished uploading artifact .... Reported size is 329038 bytes. There were 1 items that failed to upload
Error: An error was encountered when uploading .... There were 1 items that failed to upload.

(Redacted for privacy.)

to-s commented 2 years ago

Got today the same:

Run actions/upload-artifact@v2
  with:
    name: ...
    path: ...
    retention-days: 5
    if-no-files-found: warn
  env:
    pythonLocation: /opt/hostedtoolcache/Python/3.8.12/x64
With the provided path, there will be 1 file uploaded
A 500 status code has been received, will attempt to retry the upload
Exponential backoff for retry #1. Waiting for 5155.795555228613 milliseconds before continuing the upload at offset 0
Finished backoff for retry #1, continuing with upload
A 500 status code has been received, will attempt to retry the upload
Exponential backoff for retry #2. Waiting for 11167.809749650718 milliseconds before continuing the upload at offset 0
Total file count: 1 ---- Processed file #0 (0.0%)
Finished backoff for retry #2, continuing with upload
A 500 status code has been received, will attempt to retry the upload
Exponential backoff for retry #3. Waiting for 16192.940004900629 milliseconds before continuing the upload at offset 0
Total file count: 1 ---- Processed file #0 (0.0%)
Total file count: 1 ---- Processed file #0 (0.0%)
Finished backoff for retry #3, continuing with upload
A 500 status code has been received, will attempt to retry the upload
Exponential backoff for retry #4. Waiting for 21995.75374447502 milliseconds before continuing the upload at offset 0
Total file count: 1 ---- Processed file #0 (0.0%)
Total file count: 1 ---- Processed file #0 (0.0%)
Finished backoff for retry #4, continuing with upload
A 500 status code has been received, will attempt to retry the upload
Exponential backoff for retry #5. Waiting for 26428.731199363385 milliseconds before continuing the upload at offset 0
Total file count: 1 ---- Processed file #0 (0.0%)
Total file count: 1 ---- Processed file #0 (0.0%)
Total file count: 1 ---- Processed file #0 (0.0%)
Finished backoff for retry #5, continuing with upload
A 500 status code has been received, will attempt to retry the upload
##### Begin Diagnostic HTTP information #####
Status Code: 500
Status Message: Internal Server Error
Header Information: {
  "cache-control": "no-store,no-cache",
  "pragma": "no-cache",
  "content-length": "328",
  "content-type": "application/json; charset=utf-8",
  "strict-transport-security": "max-age=2592000",
  "x-tfs-processid": "...",
  "activityid": "...",
  "x-tfs-session": "...",
  "x-vss-e2eid": "...",
  "x-vss-senderdeploymentid": "...",
  "x-frame-options": "SAMEORIGIN",
  "x-cache": "CONFIG_NOCACHE",
  "x-msedge-ref": "Ref A: ... Ref B: ... Ref C: 2021-11-30T13:08:04Z",
  "date": "Tue, 30 Nov 2021 13:08:04 GMT"
}
###### End Diagnostic HTTP information ######
Retry limit has been reached for chunk at offset 0 to https://pipelines.actions.githubusercontent.com/..._apis/resources/Containers/...?itemPath=...
Warning: Aborting upload for ... due to failure
Error: aborting artifact upload
Total size of all the files uploaded is 0 bytes
Finished uploading artifact eni-os-output. Reported size is 0 bytes. There were 1 items that failed to upload
Error: An error was encountered when uploading .... There were 1 items that failed to upload.

Currently, on our side this seems to be a very rare (<0.1%) issue.

solvaholic commented 2 years ago

👋 In case it helps y'all isolate the cause: one way these 500s can occur is if your running workflow jobs separately upload to the same artifact name and path.

The risk of that is described in this project's README under Uploading to the same artifact:

Each artifact behaves as a file share. Uploading to the same artifact multiple times in the same workflow can overwrite and append already uploaded files:
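The pattern solvaholic describes can be sketched as follows (the job layout, script, and artifact name here are hypothetical, not taken from any of the workflows in this thread). Because `github.run_id` is shared by every job in a run, both matrix jobs end up writing to the same artifact concurrently:

```yaml
# Risky: both matrix legs upload to the SAME artifact name,
# since github.run_id is identical for every job in the run.
jobs:
  test:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - run: ./run-tests.sh   # hypothetical test script that writes logs/
      - uses: actions/upload-artifact@v2
        with:
          name: logs-${{ github.run_id }}   # same name across jobs -> concurrent writes
          path: logs/
```

When the two uploads happen to overlap in time, they can conflict in exactly the way the README warning describes.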

joshuapinter commented 2 years ago

@solvaholic Yoooooo! I think that is our exact issue. We were uploading logs from multiple jobs using an artifact name based on github.run_id, which is shared amongst all of the jobs in a run. So if two or more jobs uploaded artifacts, they would all be writing to the same artifact name. When done non-concurrently, this seemed to be fine (I think). But when done concurrently, it may have caused the 50x errors, because the artifacts were becoming corrupt or conflicting as two jobs wrote to the same artifact/temp file.

I'm not 100% sure about this, but your comment made me look at it again and think this could be the issue. We're going to test a solution that uses github.job instead of github.run_id to scope our log files per job, and see if the issue reoccurs. I'll try to remember to post the results back here.
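A per-job scoping fix along these lines might look like the following sketch (the artifact name and path are hypothetical; note that the available context is `github.job`, not `github.job_id`):

```yaml
# Scope the artifact name per job so concurrent jobs never share one artifact.
- uses: actions/upload-artifact@v2
  with:
    name: logs-${{ github.run_id }}-${{ github.job }}   # unique per job in the run
    path: logs/
```

In a matrix, `github.job` is the same for every leg, so you would also fold matrix values (e.g. `${{ matrix.os }}`) into the name.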

Thanks for commenting this and linking to that warning in the README. 🙏

ncdc commented 2 years ago

@konradpabjan I just found your comment at https://github.com/actions/upload-artifact/issues/84#issuecomment-667256833. We're getting 500 errors fairly regularly. Up until today, we did have 3 jobs that were all sharing the same artifact name, which we realized was wrong. We have since fixed that bug in our workflow, but we are continuing to encounter 500 errors, such as in https://github.com/kcp-dev/kcp/runs/6264716798?check_suite_focus=true. Any chance you could take a look? Thanks!

juhhov commented 2 years ago

We are also suffering from these errors. Ping @konradpabjan.

##### Begin Diagnostic HTTP information #####
Status Code: 500
Status Message: Internal Server Error
Header Information: {
"cache-control": "no-store,no-cache",
"pragma": "no-cache",
"content-length": "328",
"content-type": "application/json; charset=utf-8",
"strict-transport-security": "max-age=2592000",
"x-tfs-processid": "4139b173-84e2-4a2b-bee5-a6122834584d",
"activityid": "5bf8c2b6-1ee9-42ab-812c-9f2a9d4f57a5",
"x-tfs-session": "5bf8c2b6-1ee9-42ab-812c-9f2a9d4f57a5",
"x-vss-e2eid": "5bf8c2b6-1ee9-42ab-812c-9f2a9d4f57a5",
"x-vss-senderdeploymentid": "d624195d-30e0-1768-06a5-b10a7879c7db",
"x-frame-options": "SAMEORIGIN",
"x-cache": "CONFIG_NOCACHE",
"x-msedge-ref": "Ref A: F040C85679224F9294956B424D0ED853 Ref B: VIEEDGE2608 Ref C: 2022-05-06T10:56:59Z",
"date": "Fri, 06 May 2022 10:57:00 GMT"
}
###### End Diagnostic HTTP information ######

edit: Another process had one of the files being uploaded open simultaneously. Even if it is not the root cause, this at least makes the issue occur more frequently. At a minimum, the error message should be improved.
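Building on juhhov's edit, one workaround is to make sure nothing is still writing to the files at upload time. A minimal sketch, with hypothetical paths standing in for real build output:

```shell
# Hedged sketch: avoid uploading files that another process may still be
# writing, by copying a point-in-time snapshot into a staging directory
# and uploading the staging directory instead. All paths are hypothetical.
set -eu
mkdir -p output
echo "sample log line" > output/app.log   # stand-in for a file a writer still holds open
STAGING="$(mktemp -d)"
cp -R output/. "$STAGING"/                # snapshot the current state
ls "$STAGING"                             # pass this directory to upload-artifact's `path`
```

In a workflow you would point upload-artifact's `path` at the staging directory rather than the live output directory.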

konradpabjan commented 9 months ago

v4 has shipped today https://github.blog/changelog/2023-12-14-github-actions-artifacts-v4-is-now-generally-available/

I recommend switching over as these classes of issues should no longer happen with the release
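A minimal migration sketch (artifact name and path are hypothetical; note that v4 additionally enforces unique artifact names within a run, so per-job names remain necessary):

```yaml
- uses: actions/upload-artifact@v4
  with:
    name: logs-${{ github.job }}   # v4 rejects uploads to an existing artifact name
    path: logs/
```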

nicola-lunghi commented 1 month ago

> v4 has shipped today https://github.blog/changelog/2023-12-14-github-actions-artifacts-v4-is-now-generally-available/
>
> I recommend switching over as these classes of issues should no longer happen with the release

It would be nice to be able to switch, but that's impossible on GitHub Enterprise....