actions / cache

Cache dependencies and build outputs in GitHub Actions
MIT License
4.37k stars 1.16k forks source link

Cannot restore cache on GHES runner #1253

Closed l0gicgate closed 7 months ago

l0gicgate commented 9 months ago

Description

I am using action/cache@v3 in a GHES instance and it appears that we cannot restore the cache during PR runs:

Here's the relevant bit from my workflow:

- name: Cache Main Dependencies
  uses: actions/cache@v3
  id: cache-main
  with:
    path: node_modules
    key: ${{ runner.os }}-node-${{ hashFiles('package-lock.json') }}

Here is the step where it caches the artifacts successfully: CleanShot 2023-10-03 at 12 49 34@2x

Here's the debug logs:

##[debug]Evaluating condition for step: 'Cache Main Dependencies'
##[debug]Evaluating: success()
##[debug]Evaluating success:
##[debug]=> true
##[debug]Result: true
##[debug]Starting: Cache Main Dependencies
##[debug]Register post job cleanup for action: actions/cache@v3
##[debug]Loading inputs
##[debug]Evaluating: format('{0}-node-{1}', runner.os, hashFiles('package-lock.json'))
##[debug]Evaluating format:
##[debug]..Evaluating String:
##[debug]..=> '{0}-node-{1}'
##[debug]..Evaluating Index:
##[debug]....Evaluating runner:
##[debug]....=> Object
##[debug]....Evaluating String:
##[debug]....=> 'os'
##[debug]..=> 'Linux'
##[debug]..Evaluating hashFiles:
##[debug]....Evaluating String:
##[debug]....=> 'package-lock.json'
##[debug]Search root directory: '/runner/_work/redacted'
##[debug]Search pattern: 'package-lock.json'
##[debug]Starting process:
##[debug]  File name: '/runner/externals/node16/bin/node'
##[debug]  Arguments: '"/runner/bin/hashFiles"'
##[debug]  Working directory: '/runner/_work/redacted'
##[debug]  Require exit code zero: 'False'
##[debug]  Encoding web name:  ; code page: ''
##[debug]  Force kill process on cancellation: 'False'
##[debug]  Redirected STDIN: 'False'
##[debug]  Persist current code page: 'False'
##[debug]  Keep redirected STDIN open: 'False'
##[debug]  High priority process: 'False'
##[debug]Failed to update oom_score_adj for PID: 438.
##[debug]System.UnauthorizedAccessException: Access to the path '/proc/438/oom_score_adj' is denied.
##[debug] ---> System.IO.IOException: Permission denied
##[debug]   --- End of inner exception stack trace ---
##[debug]   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
##[debug]   at System.IO.Strategies.BufferedFileStreamStrategy.FlushWrite()
##[debug]   at System.IO.Strategies.BufferedFileStreamStrategy.Dispose(Boolean disposing)
##[debug]   at System.IO.StreamWriter.CloseStreamFromDispose(Boolean disposing)
##[debug]   at System.IO.StreamWriter.Dispose(Boolean disposing)
##[debug]   at System.IO.File.WriteAllText(String path, String contents)
##[debug]   at GitHub.Runner.Sdk.ProcessInvoker.WriteProcessOomScoreAdj(Int32 processId, Int32 oomScoreAdj)
##[debug]Process started with process id 438, waiting for process exit.
##[debug]Match Pattern: package-lock.json
##[debug]::debug::followSymbolicLinks 'false'
##[debug]::debug::followSymbolicLinks 'false'
##[debug]::debug::implicitDescendants 'true'
##[debug]::debug::matchDirectories 'true'
##[debug]::debug::omitBrokenSymbolicLinks 'true'
##[debug]::debug::Search path '/runner/_work/redacted/package-lock.json'
##[debug]/runner/_work/redacted/package-lock.json
##[debug]Found 1 files to hash.
##[debug]Hash result: '0246ff44a853363d5e7e78c781dd79c9ade76c372840fddd0b81324acc28a3ab'
##[debug]undefined
##[debug]STDOUT/STDERR stream read finished.
##[debug]STDOUT/STDERR stream read finished.
##[debug]Finished process 438 with exit code 0, and elapsed time 00:00:00.0958795.
##[debug]..=> '0246ff44a853363d5e7e78c781dd79c9ade76c372840fddd0b81324acc28a3ab'
##[debug]=> 'Linux-node-0246ff44a853363d5e7e78c781dd79c9ade76c372840fddd0b81324acc28a3ab'
##[debug]Result: 'Linux-node-0246ff44a853363d5e7e78c781dd79c9ade76c372840fddd0b81324acc28a3ab'
##[debug]Loading env
Run actions/cache@v3
::save-state name=CACHE_KEY::Linux-node-0246ff44a853363d5e7e78c781dd79c9ade76c372840fddd0b81324acc28a3ab
##[debug]Save intra-action state CACHE_KEY = Linux-node-0246ff44a853363d5e7e78c781dd79c9ade76c372840fddd0b81324acc28a3ab
##[debug]Resolved Keys:
##[debug]["Linux-node-0246ff44a853363d5e7e78c781dd79c9ade76c372840fddd0b81324acc28a3ab"]
##[debug]Checking zstd --version
##[debug]*** zstd command line interface 64-bits v1.4.4, by Yann Collet ***
##[debug]Resource Url: https://redacted/_services/artifactcache/u5PbbHtYzwAz45gNhgoiFghxXlhxAwajVeuO8aG8DGVdUa9qnz/_apis/artifactcache/cache?keys=Linux-node-0246ff44a853363d5e7e78c781dd79c9ade76c372840fddd0b81324acc28a3ab&version=7fcda33c1e1d849a13bcc06f49b9ab64efc01ca9dabe4d7a8d0d387feef4fc88
::add-mask::***
##[debug]Cache Result:
##[debug]{"scope":"refs/pull/121/merge","cacheKey":"Linux-node-0246ff44a853363d5e7e78c781dd79c9ade76c372840fddd0b81324acc28a3ab","cacheVersion":"7fcda33c1e1d849a13bcc06f49b9ab64efc01ca9dabe4d7a8d0d387feef4fc88","creationTime":"2023-10-03T18:44:12.27Z","archiveLocation":"***"}
##[debug]Archive Path: /runner/_work/_temp/615f9f98-f993-4e16-b07e-aa12d0538a47/cache.tzst
##[debug]Use Azure SDK: true
##[debug]Download concurrency: 8
##[debug]Request timeout (ms): 30000
##[debug]Cache segment download timeout mins env var: undefined
##[debug]Segment download timeout (ms): 3600000
##[debug]Unable to validate download, no Content-Length header
/usr/bin/tar --use-compress-program unzstd -tf /runner/_work/_temp/615f9f98-f993-4e16-b07e-aa12d0538a47/cache.tzst -P
zstd: /*stdin*\: unsupported format 
/usr/bin/tar: Child returned status 1
/usr/bin/tar: Error is not recoverable: exiting now
Warning: Failed to restore: Tar failed with error: The process '/usr/bin/tar' failed with exit code 2
Cache not found for input keys: Linux-node-0246ff44a853363d5e7e78c781dd79c9ade76c372840fddd0b81324acc28a3ab
##[debug]Node Action run completed with exit code 0
##[debug]Finishing: Cache Main Dependencies
aparna-ravindra commented 9 months ago

@l0gicgate , The logs show that there was a matching cache :

[debug]Cache Result:

[debug]{"scope":"refs/pull/121/merge","cacheKey":"Linux-node-0246ff44a853363d5e7e78c781dd79c9ade76c372840fddd0b81324acc28a3ab","cacheVersion":"7fcda33c1e1d849a13bcc06f49b9ab64efc01ca9dabe4d7a8d0d387feef4fc88","creationTime":"2023-10-03T18:44:12.27Z","archiveLocation":"***"}

The next step is to download the contents of the cache from the archiveLocation. That has failed in the run. It is not easy to guess the cause of failure with the logs shared here. Most commonly, your blob storage would have failed to return the cached file. It might help to check your blob storage configuration and access as a next step.

elliottpope commented 9 months ago

Update on the root cause of this issue:

TL;DR GitHub Actions is configured with AWS S3 storage with Server Side Encryption via KMS (SSE-KMS). GitHub Enterprise Server uses presigned URLs to download the cache. SSE-KMS requires using AWS Signature Version 4 to download the content from S3 that GitHub Enterprise Server 3.7.1 does not support

Potential Solutions

  1. Upgrading GHES might fix the issue if they have implemented AWS Signature V4 for presigned URL downloads
  2. Downgrading to SSE-S3 for the Actions S3 bucket

Long Version It is difficult to find the actual download URL because actions/cache sets it as a secret that Actions will mask and the metadata URL uses the /_services endpoint which I could not find any documentation on how to query (I just get a 403 Forbidden error when I try to hit it from the administrative shell)

I did however find some weird behavior after logging in via the administrative shell.

I ran ghe-actions-check -s blob and got the following:

Storage tests finished, output is below:
============================================================
LR actions> Creating new log file /LR/Logs/Actions_OnPrem_Test-StorageConnection_***.log
LR actions> Test-Storage Connection utility for GitHub Enterprise
LR actions> Effective remote blob provider with [s3].
LR actions> Configured Blob provider is : [Microsoft.VisualStudio.Services.Cloud.AmazonS3BlobProvider]
LR actions> 1. Testing Upload content : Passed
LR actions> 
LR actions> 2. Testing Download content : Passed
LR actions> 
LR actions> 3. Testing Direct Download content : Failed
LR actions> 
LR actions> 
LR actions> 
LR actions> 4. Testing Delete content : Passed
LR actions> 
LR actions> 5. Testing MultiPart Upload content : Passed
LR actions> 
LR actions> 6. Testing Delete large content : Passed
LR actions> 
LR actions> 
LR actions> Warning! DirectDownload (Download object using presigned URL) Test failed. This failure will not affect configuring Actions. Some features (like Actions cache) might not work as expected.
LR actions> 
LR actions> 
LR actions> Storage tests passed with warnings.
LR actions> 
============================================================
Blob Storage is healthy!

After some diving into what this is doing, I did ghe-actions-console -s actions -c 'Test-StorageConnection -TreatWarningAsErrors -Debug' and got:

Creating new log file /LR/Logs/Actions_OnPrem_Test-StorageConnection_***.log
Test-Storage Connection utility for GitHub Enterprise
DEBUG: Loading blob provider configuration from file
Effective remote blob provider with [s3].
DEBUG: Blob provider [s3] with container prefix [***] is configured on this instance.
Configured Blob provider is : [Microsoft.VisualStudio.Services.Cloud.AmazonS3BlobProvider]
...
DEBUG: Acquiring directdownload URL for file [***] from container [***]
DEBUG: Received direct download uri : https://***/actions-***?AWSAccessKeyId=***&Expires=***&Signature=*** for [***] in container ***. Downloading content..
DEBUG: Pre-signed URL Response code : BadRequest, Content: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidArgument</Code><Message>Requests specifying Server Side Encryption with AWS KMS managed keys require AWS Signature Version 4.</Message><ArgumentName>Authorization</ArgumentName><ArgumentValue>null</ArgumentValue><RequestId>***</RequestId><HostId>***</HostId></Error> for [***] in container ***
...
Test-StorageConnection: 1. Testing Upload content : Passed
2. Testing Download content : Passed
3. Testing Direct Download content : Failed
4. Testing Delete content : Passed
5. Testing MultiPart Upload content : Passed
6. Testing Delete large content : Passed
Storage tests failed, please see output for more details.

Putting the warning message together with the test failure points reasonably clearly to the root cause being S3 SSE-KMS and GitHub Enterprise not using AWS Signature V4. Either updating GHES or using SSE-S3 might resolve the issue but I haven't had a chance to test this. Will update this thread once I have tested these options

elliottpope commented 8 months ago

Confirmed that downgrading to AWS S3-SSE for the Actions S3 bucket fixes the issue Upgrading to GitHub Enterprise Server 3.7.17 did not resolve the issue

l0gicgate commented 7 months ago

Closing as resolved. See @elliottpope's comment for anyone else experiencing this issue.

Merinorus commented 7 months ago

Hello, We also encountered this issue (GHES 3.8, using SSE-S3): it was caused by not giving enough S3 authorizations to the GHES instance (various read and write permissions). Just for testing: create a test S3 bucket, give all the permissions to the GHES instance, and configure the GHES instance to use this bucket in the "Actions" settings. It solved the issue for us. Unfortunately, I don't know yet which exact permissions are needed for the instance to work (we gave more permissions than needed). It took us months (!) to find the root cause. I guess various origins can cause this issue: this log message doesn't offer any help in finding a solution, it ought to be more detailed.

Edit: Adding https://github.com/actions/cache/issues/1192 and https://github.com/actions/cache/issues/1052 for reference.

danwkennedy commented 3 weeks ago

A little late but in case others see this, you can debug this with the following check:

ghe-actions-check -s blob

If the Testing Direct Download content check fails with the following error on the download, then it's a signature version mismatch:

Requests specifying Server Side Encryption with AWS KMS managed keys require AWS Signature Version 4.

Actions has a feature flag that will force v4 that can be flipped on the Actions and ArtifactCache services running on GHES:

ghe-actions-console -s artifactcache --command 'Set-FeatureFlag -FeatureName Microsoft.VisualStudio.Services.Cloud.AmazonS3.ForceSigVersion4 -State On'
ghe-actions-console -s actions --command 'Set-FeatureFlag -FeatureName Microsoft.VisualStudio.Services.Cloud.AmazonS3.ForceSigVersion4 -State On'