actions / runner

The Runner for GitHub Actions :rocket:
https://github.com/features/actions
MIT License

Post cache failed with "The template is not valid. hashFiles('**/Makefile') failed." on MacOs #449

Open Yanjingzhu opened 4 years ago

Yanjingzhu commented 4 years ago

There is a ticket in the community forum: https://github.community/t5/GitHub-Actions/Mac-OS-build-are-random-cancelled-no-I-do-not-have-fail-fast/td-p/54494 The customer reports that the post cache action on macOS fails randomly, which results in job cancellation. Here is an example workflow run that hit this issue: https://github.com/ankitects/anki/actions/runs/81024319 Restarting the build works fine. Is there anything wrong in the cache action?

joshmgross commented 4 years ago

Transferring to https://github.com/actions/runner since hashFiles is a function handled by the Runner and not exclusive to the cache action.

TingluoHuang commented 4 years ago

I think this is a side effect of the runner getting killed on the host machine, which should never happen. I am following up with the actions-compute team on this. The error means the runner didn't get the correct result when executing node hashFiles.js: node exited with a nonzero exit code. We do print the stderr/stdout at debug level, but I think I will include those outputs in the exception message when this happens.
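For readers unfamiliar with what hashFiles produces: GitHub's documentation describes it as a SHA-256 hash per matched file, with those hashes combined into a final SHA-256 hash. The Python sketch below approximates that idea for illustration only. The function name hash_files, the use of pathlib globbing, and the sorted file order are assumptions of this sketch; it is not the runner's hashFiles.js, and its glob semantics may differ from the runner's.

```python
import hashlib
from pathlib import Path


def hash_files(root, pattern):
    """Approximate a hashFiles-style digest (hypothetical helper, not the runner's code).

    Each matched file is hashed with SHA-256, and the per-file digests are
    fed into one combined SHA-256. Returns "" when nothing matches.
    """
    combined = hashlib.sha256()
    matched = False
    # Sorting is an assumption made here so the result is deterministic.
    for path in sorted(Path(root).glob(pattern)):
        if not path.is_file():
            continue
        file_digest = hashlib.sha256(path.read_bytes()).digest()
        combined.update(file_digest)
        matched = True
    return combined.hexdigest() if matched else ""
```

Calling hash_files(".", "**/Makefile") from a checkout gives a value that changes whenever any matched Makefile changes, which is the property a cache key relies on. Any failure while walking or reading the tree prevents the key from being computed at all.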

TingluoHuang commented 4 years ago

https://github.com/github/c2c-actions-compute/issues/713

evandrocoan commented 4 years ago

@Yanjingzhu thanks for reporting it here!

To this day, I still get this error randomly (only on macOS machines), while Linux/Windows machines work fine.

These are the contents of line 227:

File: anki/.github/workflows/checks.yml
222:       - name: Cache cargo rspy
223:         if: matrix.python == '3.7'
224:         uses: actions/cache@v1
225:         with:
226:           path: ${{ github.workspace }}${{ matrix.SEP }}rspy${{ matrix.SEP }}target
227:           key: ${{ runner.os }}-cargo-rspy-${{ hashFiles('**/requirements.*') }}-${{ hashFiles('**/setup.py') }}-${{ hashFiles('**/Makefile') }}-${{ hashFiles('**/Cargo.toml') }}-${{ matrix.BUILD_TYPE }}-16-
228: 
229:       - name: Set up curl pyaudio, rsync
230:         if: matrix.os == 'windows-latest'

Yesterday I had a macOS build fail for no apparent reason with this "template is not valid" error: https://github.com/evandroforks/anki/runs/687614069#step:51:1

##[error].github/workflows/checks.yml (Line: 227, Col: 16):
##[error]The template is not valid. .github/workflows/checks.yml (Line: 227, Col: 16): hashFiles('**/Cargo.toml') failed. Fail to hash files under directory '/Users/runner/runners/2.262.1/work/anki/anki'

No way my template was invalid. All other builds (linux/windows) passed fine.

This was not the first time a Mac OS build failed out of nowhere with that error.

TingluoHuang commented 4 years ago

You might need to turn on debug logging to get more information out. 😢 https://help.github.com/en/actions/configuring-and-managing-workflows/managing-a-workflow-run#enabling-step-debug-logging

ericsciple commented 4 years ago

I wonder if we should add an option so post doesn't recompute inputs.

evandrocoan commented 4 years ago

Instead of adding an option, just do not recompute inputs at all in post.

ericsciple commented 4 years ago

I think it depends on the action. An option would enable the action author to decide - the workflow author shouldn't have to think about it

moerishabh commented 2 years ago

Facing this issue again today on ubuntu-18.04. Following is the error

 hashFiles('../../**/package-lock.json') failed. Fail to hash files under directory

This has been working for the past 2 years. However, starting this morning, we are facing this issue:

        uses: actions/cache@v1
        with:
          path: ~/.npm
          key: ${{ runner.os }}-node-cache-${{ hashFiles('../../**/package-lock.json')}}
          restore-keys: |
            ${{ runner.os }}-node-cache-

It was working as expected 24 hours ago.

aidanmcquay commented 2 years ago

I am also experiencing this issue suddenly starting this morning.

TingluoHuang commented 2 years ago

@aidanmcquay We made a fix to the runner with https://github.com/actions/runner/pull/1678. Essentially hashFiles was previously failing silently but with the new runner version, it will raise an Error instead. 😢

You might want to enable debug log to see why it's erroring and fix the root cause.

https://docs.github.com/en/actions/monitoring-and-troubleshooting-workflows/enabling-debug-logging#enabling-step-debug-logging

More info: https://github.com/actions/cache/issues/753#issuecomment-1058324253

AllanOricil commented 2 years ago

Maybe things broke again? This workflow was working without problems an hour ago. It stopped working after the release of ubuntu 2.291.1 20 minutes ago.

JessikaCastellano commented 2 years ago

Same for me: in the last hour, all my actions have been failing with this error.

danielhrobidoux commented 2 years ago

@JessikaCastellano I am experiencing the same issue. I am wondering if it's also related to https://github.community/t/worflow-with-dispatch-and-branches-setting-suddenly-not-supported-anymore/247732

It seems there was a release, that broke workflow dispatch. If this action also uses it, that might be the issue.

JessikaCastellano commented 2 years ago

Thanks for the link, yes I think it could be related. In my case, my .yml doesn't have workflow_dispatch attribute :(

danielhrobidoux commented 2 years ago

Yeah, my thinking is that the cache job we both seem to be using might be leveraging it. I don't know for certain, though.

thboop commented 2 years ago

This should be fixed now, we turned off the feature flag causing this issue as it wasn't quite working as we expected. Thank you for being patient as we figured this out. All new runs queued from now on should work as expected. Please let me know if you see any more issues.

cb-shivamagarwal commented 2 years ago

It seems things broke again. We are facing this issue on our self-hosted runners. @thboop

josh803316 commented 2 years ago

We ran into this issue this morning as well, on a fresh cold start of our self-hosted (AWS) runners. @thboop

ufechner7 commented 2 years ago

Same problem here: https://github.com/aenarete/AtmosphericModels.jl/runs/6395091381?check_suite_focus=true

artazar commented 2 years ago

Same error

Error: ***/.github/workflows/build_android_publish.yml@main (Line: 138, Col: 14):
Error: The template is not valid. ***.github/workflows/build_android_publish.yml@main (Line: 138, Col: 14): hashFiles('**/*.gradle, **/*.gradle.kts') failed. Fail to hash files under directory '/home/runner/work/***/***'

thboop commented 2 years ago

Could you please enable debug logging for steps and try again? There are a few different reasons this could occur, and I don't think this is a repeat of the previous issue.

artazar commented 2 years ago

Here's the debug log from my end:

[2022-05-18 03:18:45Z INFO StepsRunner] Processing step: DisplayName='Post Cache Gradle dependencies'
[2022-05-18 03:18:45Z INFO StepsRunner] Evaluating: success()
[2022-05-18 03:18:45Z INFO StepsRunner] Result: true
[2022-05-18 03:18:45Z INFO StepsRunner] Starting the step.
[2022-05-18 03:18:45Z INFO HostContext] Well known directory 'Bin': '/home/runner/runners/2.291.1/bin'
[2022-05-18 03:18:45Z INFO HostContext] Well known directory 'Root': '/home/runner/runners/2.291.1'
[2022-05-18 03:18:45Z INFO HostContext] Well known directory 'Work': '/home/runner/work'
[2022-05-18 03:18:45Z INFO HostContext] Well known directory 'Actions': '/home/runner/work/_actions'
[2022-05-18 03:18:45Z INFO ActionManager] Load action that reference repository from '/home/runner/work/_actions/actions/cache/v2'
[2022-05-18 03:18:45Z INFO HostContext] Well known directory 'Bin': '/home/runner/runners/2.291.1/bin'
[2022-05-18 03:18:45Z INFO HostContext] Well known directory 'Root': '/home/runner/runners/2.291.1'
[2022-05-18 03:18:45Z INFO HostContext] Well known directory 'Work': '/home/runner/work'
[2022-05-18 03:18:45Z INFO HostContext] Well known directory 'Actions': '/home/runner/work/_actions'
[2022-05-18 03:18:45Z INFO ActionManifestManager] Ignore action property author.
[2022-05-18 03:18:45Z INFO ActionManifestManager] Ignore action property branding.
[2022-05-18 03:18:45Z INFO ActionManifestManager] Loaded action.yml file: {
  "name": "Cache",
  "description": "Cache artifacts like dependencies and build outputs to improve workflow execution time",
  "inputs": {
    "type": 2,
    "map": [
      {
        "key": {
          "type": 0,
          "file": 6,
          "line": 5,
          "col": 3,
          "lit": "path"
        },
        "value": ""
      },
      {
        "key": {
          "type": 0,
          "file": 6,
          "line": 8,
          "col": 3,
          "lit": "key"
        },
        "value": ""
      },
      {
        "key": {
          "type": 0,
          "file": 6,
          "line": 11,
          "col": 3,
          "lit": "restore-keys"
        },
        "value": ""
      },
      {
        "key": {
          "type": 0,
          "file": 6,
          "line": 14,
          "col": 3,
          "lit": "upload-chunk-size"
        },
        "value": ""
      }
    ]
  },
  "execution": {
    "executionType": "nodeJS",
    "hasPre": false,
    "hasPost": true,
    "script": "dist/restore/index.js",
    "pre": null,
    "post": "dist/save/index.js",
    "nodeVersion": "node12",
    "cleanupCondition": "success()",
    "initCondition": "always()"
  },
  "deprecated": null
}
[2022-05-18 03:18:45Z INFO ActionManager] Action pre node.js file: N/A.
[2022-05-18 03:18:45Z INFO ActionManager] Action node.js file: dist/restore/index.js.
[2022-05-18 03:18:45Z INFO ActionManager] Action post node.js file: dist/save/index.js.
[2022-05-18 03:18:45Z INFO HostContext] Well known directory 'Bin': '/home/runner/runners/2.291.1/bin'
[2022-05-18 03:18:45Z INFO HostContext] Well known directory 'Root': '/home/runner/runners/2.291.1'
[2022-05-18 03:18:45Z INFO HostContext] Well known directory 'Work': '/home/runner/work'
[2022-05-18 03:18:45Z INFO HostContext] Well known directory 'Temp': '/home/runner/work/_temp'
[2022-05-18 03:18:45Z INFO ExecutionContext] Write event payload to /home/runner/work/_temp/_github_workflow/event.json
[2022-05-18 03:18:45Z INFO HostContext] Well known directory 'Bin': '/home/runner/runners/2.291.1/bin'
[2022-05-18 03:18:45Z INFO HostContext] Well known directory 'Root': '/home/runner/runners/2.291.1'
[2022-05-18 03:18:45Z INFO HostContext] Well known directory 'Work': '/home/runner/work'
[2022-05-18 03:18:45Z INFO HostContext] Well known directory 'Temp': '/home/runner/work/_temp'
[2022-05-18 03:18:45Z INFO ExtensionManager] Getting extensions for interface: 'GitHub.Runner.Worker.IFileCommandExtension'
[2022-05-18 03:18:45Z INFO JobServerQueue] Try to append 1 batches web console lines for record '212fb7c0-4d03-4646-8c6b-c0883f45f0eb', success rate: 1/1.
[2022-05-18 03:18:45Z ERR  StepsRunner] Caught exception from step: GitHub.DistributedTask.ObjectTemplating.TemplateValidationException: The template is not valid. ***/.github/workflows/build_android_publish.yml@feature/android_send_apk (Line: 138, Col: 14): hashFiles('**/*.gradle, **/*.gradle.kts') failed. Fail to hash files under directory '/home/runner/work/***'
   at GitHub.DistributedTask.ObjectTemplating.TemplateValidationErrors.Check()
   at GitHub.DistributedTask.Pipelines.ObjectTemplating.PipelineTemplateEvaluator.EvaluateStepInputs(TemplateToken token, DictionaryContextData contextData, IList`1 expressionFunctions)
   at GitHub.Runner.Worker.ActionRunner.RunAsync()
   at GitHub.Runner.Worker.StepsRunner.RunStepAsync(IStep step, CancellationToken jobCancellationToken)
[2022-05-18 03:18:45Z INFO StepsRunner] Step result: Failed
[2022-05-18 03:18:45Z INFO StepsRunner] Update job result with current step result 'Failed'.
[2022-05-18 03:18:45Z INFO StepsRunner] Current state: job state = 'Failed'

ForNeVeR commented 2 years ago

We are experiencing this consistently 100% of the time in this job.

Some details:

For now, I'm going to disable the corresponding caches causing this issue, which is a far from ideal workaround, but whatever.

MikulasMascautanu commented 2 years ago

We're also getting this error in our pipelines when using actions/cache. This did not happen in the past; it started only very recently, last week I would say. I tried cache@v2 as well as cache@v3 with no luck. For example, we have a workflow with 5 jobs that are triggered at the same time and run by the same reusable workflow, differing only in the arguments passed into that reusable workflow (timeout and name, nothing important imo). When the error happens, it always happens for the last job to finish out of these 5 almost-identical jobs.

2022-05-31T12:14:22.4593307Z ##[error]/workflows/run-tests-reusable.yaml@0c39ee2b9e592232ae97750716ce8be6d2d57515 (Line: 71, Col: 16):
2022-05-31T12:14:22.4916533Z ##[error]The template is not valid. /workflows/run-tests-reusable.yaml@0c39ee2b9e592232ae97750716ce8be6d2d57515 (Line: 71, Col: 16): hashFiles('**/package-lock.json') couldn't finish within 120 seconds.

Workaround is to re-run the failed job. It sometimes helps on the first retry, sometimes on the second or third retry.

tevio commented 2 years ago

+1 - always the last job in a matrix of containers for us

bkdotcom commented 2 years ago

I was experiencing this issue. I finally noticed this line with debug logging enabled:

[debug][Error: EACCES: permission denied, scandir '/home/runner/work/projectName/projectName/tests/someDir

In a nutshell, my unit tests changed a directory's permissions to test an error when writing to a file, which ended up also causing the action to fail.

Slamdunk commented 2 years ago

@bkdotcom we experienced the same issue and same solution too.

hashFiles('**/composer.json') descends into every directory of our code, but during a specific mutation testing run one folder is set to 000 permissions and never reset afterwards. As a result, hashFiles can't scan that folder anymore and the post cache step fails.

Re-adding rX permissions to that folder solved the issue :+1:
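To locate such a folder before the post cache step runs, something like the following Python sketch could be run from the workspace root as a late workflow step. It only assumes that an unreadable directory surfaces as a PermissionError while walking the tree; the function name find_unreadable_dirs is hypothetical and not part of any GitHub tooling.

```python
import os


def find_unreadable_dirs(root):
    """Return directories under root that cannot be listed by the current user."""
    unreadable = []

    def on_error(err):
        # os.walk reports directories it failed to scan through this callback.
        if isinstance(err, PermissionError):
            unreadable.append(err.filename)

    for _dirpath, _dirnames, _filenames in os.walk(root, onerror=on_error):
        pass  # Walking the tree is enough; errors are collected in on_error.
    return sorted(unreadable)


if __name__ == "__main__":
    for path in find_unreadable_dirs("."):
        print(f"cannot scan: {path}")
```

Any path it prints is a candidate for the EACCES failure described above. Restoring read and traverse permissions on it (or deleting it) before the job ends should let the recursive scan complete.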

supalarry commented 2 years ago

Same here, using actions/cache@v3 to hashFiles('**/pnpm-lock.yaml') on ubuntu-latest and getting:

Error: The template is not valid. .github/workflows/some-worker.yaml (Line: 23, Col: 16): hashFiles('**/pnpm-lock.yaml') failed. Fail to hash files under directory '/home/runner/work/some-org/some-repo'

Tried re-running the job with no luck.

fullbl commented 2 years ago

same problem for me:

Error: The template is not valid. .github/workflows/tests.yml (Line: 90, Col: 14): hashFiles('**/composer.lock') failed. Fail to hash files under directory '/home/runner/work/k2/k2'

it works in another job on the same repository

lithiumtoast commented 2 years ago

Came across this myself today. Dear reader, your problem may be unique to the file system and the files you are trying to hash. To investigate, enable debug logging for the workflow jobs/steps.

An easy way I found to enable debug logging is to re-run the job(s) and tick the "Enable debug logging" checkbox.

pymumu commented 2 years ago

Same here. The post cache step fails when the job runs in a custom container; with plain runs-on: ubuntu-latest it is OK.

action file:

name: Merge Request CUDA

on:
  pull_request:
    branches: 
      - main

env:
  BUILD_TYPE: Release

jobs:
  build:
    runs-on: ubuntu-latest
    container:
      image: modelbox/modelbox-develop-tensorflow_2.6.0-cuda_11.2-ubuntu-x86_64

    steps:
    - uses: actions/checkout@v3
    # - run: apt update 
    - name: Set up JDK 11
      uses: actions/setup-java@v1
      with:
        java-version: 11
        maven-version: '3.6.2'
        cache: 'maven'
    - name: Setup Maven
      uses: stCarolas/setup-maven@v4.4
      with:
        maven-version: 3.8.2
    - uses: actions/cache@v1
      with:
        path: ~/.m2/repository
        key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
        restore-keys: |
          ${{ runner.os }}-maven-
    - name: ccache
      uses: hendrikmuhs/ccache-action@v1.2
      with:
        key: ubuntu-latest
        max-size: 1024M
    - name: Configure CMake
      run: |
        mkdir build
        cd build
        cmake .. -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DWITH_WEBUI=off -DCLANG_TIDY=on -DCLANG_TIDY_AS_ERROR=on -DWITH_JAVA=on

    - name: Build
      working-directory: build
      run: |
        make modelbox-java

related job: https://github.com/modelbox-ai/modelbox/actions/runs/2918778608

error message:

build
.github/workflows/merge-request-cuda.yml (Line: 33, Col: 14):
build
The template is not valid. .github/workflows/merge-request-cuda.yml (Line: 33, Col: 14): hashFiles('**/pom.xml') failed. Fail to hash files under directory '/home/runner/work/modelbox/modelbox'

Changing hashFiles('**/pom.xml') to hashFiles('path/to/pom.xml') works.

austinpray-mixpanel commented 2 years ago

On a self-hosted runner running with container: specified it fails with

##[debug]..Evaluating hashFiles:
##[debug]....Evaluating String:
##[debug]....=> '**/*.go'
##[debug]Search root directory: '/actions-runner/_work/analytics/analytics'
##[debug]Search pattern: '**/*.go'
##[debug]Starting process:
##[debug]  File name: '/actions-runner/externals/node16/bin/node'
##[debug]  Arguments: '"/actions-runner/bin/hashFiles"'
##[debug]  Working directory: '/actions-runner/_work/analytics/analytics'
##[debug]  Require exit code zero: 'False'
##[debug]  Encoding web name:  ; code page: ''
##[debug]  Force kill process on cancellation: 'False'
##[debug]  Redirected STDIN: 'False'
##[debug]  Persist current code page: 'False'
##[debug]  Keep redirected STDIN open: 'False'
##[debug]  High priority process: 'False'
##[debug]Failed to update oom_score_adj for PID: 45416.

AleBorini commented 2 years ago

Hello guys we are facing a similar issue on our workflows. I have the Post Cache Node Modules job failing exclusively on MacOS runners with the following error:

Error: The template is not valid. .github/workflows/e2e.yml (Line: 34, Col: 16): hashFiles('**/yarn.lock') couldn't finish within 120 seconds.,.github/workflows/e2e.yml (Line: 35, Col: 25): hashFiles('**/yarn.lock') couldn't finish within 120 seconds. 

I know the workflow is correct since it works when running on Ubuntu, but it keeps failing when I run the tests on macOS.

The only line I changed is runs-on: macos-latest.

This is the cache step we are currently using, which works on Ubuntu but not on macOS:

        uses: actions/cache@v3
        with:
          path: '**/node_modules'
          key: ${{ runner.os }}-cache-node-${{ hashFiles('**/yarn.lock') }}
          restore-keys: |
            ${{ runner.os }}-cache-node-${{ hashFiles('**/yarn.lock') }}
            ${{ runner.os }}-cache-node-
            ${{ runner.os }}-cache-

It is worth mentioning that I ran the workflows in debug mode and nothing really useful came up. It just appeared that it can't find the file anywhere.

Any pro tip on how to solve this issue?

Thanks in advance

monaka commented 2 years ago

I got a similar issue on ubuntu-latest and found a workaround.

- hashFiles('**/Cargo.toml')
+ hashFiles('Cargo.toml', '*/Cargo.toml')

I guess hashFiles can't find requested files if huge files exist in the working directory.

spangaer commented 2 years ago

Turned out to be a golden tip. So I just cleaned up some build trash, post build, and it works again.

AlekSi commented 1 year ago

In my case, hashFiles generated "The template is not valid" error, but the real issue was a permission problem. sudo rm -fr <dir> fixed it.

rehanqasimrh commented 1 year ago

Unable to resolve this.

fer-ri commented 9 months ago

Works for me too

      - name: Cache Composer
        uses: actions/cache@v3
        with:
          path: ${{ steps.composer-cache.outputs.dir }}
          key: ${{ runner.os }}-composer-${{ hashFiles('composer.lock', '*/composer.lock') }}
          restore-keys: |
            ${{ runner.os }}-composer-

simonDos commented 7 months ago

In my case, hashFiles was failing because it searched every directory in the repo, and one of those directories was not readable. So if you know exactly where your lockfile (e.g. package-lock.json) is, it makes sense to reference it more concretely, like this: hashFiles('package-lock.json', '*/package-lock.json') 👍
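The difference between the recursive pattern and the narrower pair can be illustrated with a small Python sketch. It uses pathlib globbing as a stand-in, which is an assumption: the runner's own glob implementation may differ in details. The helper names match_recursive and match_shallow are hypothetical.

```python
from pathlib import Path


def match_recursive(root, filename):
    """Files matched by '**/<filename>': the root and every directory below it."""
    base = Path(root)
    return sorted(p.relative_to(base).as_posix() for p in base.glob(f"**/{filename}"))


def match_shallow(root, filename):
    """Files matched by '<filename>' plus '*/<filename>': at most one level deep."""
    base = Path(root)
    hits = list(base.glob(filename)) + list(base.glob(f"*/{filename}"))
    return sorted(p.relative_to(base).as_posix() for p in hits)
```

The shallow pair never descends past the first directory level, so it avoids both large trees and unreadable folders deeper down. The trade-off is that lockfiles nested more than one level deep no longer contribute to the cache key, so the patterns have to match the repository's actual layout.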