cypress-io / github-action

GitHub Action for running Cypress end-to-end & component tests
https://on.cypress.io/guides/continuous-integration/github-actions
MIT License
1.35k stars 357 forks source link

Sporadic cypress binary cache failure of example-basic-pnpm workflow on Ubuntu #1179

Closed MikeMcC399 closed 3 months ago

MikeMcC399 commented 3 months ago

Issue

The workflow .github/workflows/example-basic-pnpm.yml fails sporadically. See the following action logs:

When the workflow fails, the log section "Cypress tests" shows:

/usr/local/bin/npx cypress cache list
No cached binary versions were found.
/usr/local/bin/npx cypress verify
The cypress npm package is installed, but the Cypress binary is missing.

and the failures are on one of the jobs:

Failure date ubuntu-20 ubuntu-22
May 6, 2024 (1649) - job 24636015853
Apr 24, 2024 (1645) job 24204489584 -
Apr 19, 2024 (1641) job 24018300090 -

It is not possible to analyze earlier failures, since the corresponding logs have been automatically purged by GitHub settings.

MikeMcC399 commented 3 months ago
MikeMcC399 commented 3 months ago

The cache race condition which led to this issue is resolved in https://github.com/cypress-io/github-action/pull/1180. If however the cache of the Cypress binary cache and the cache of the pnpm store cache are out of sync, the workflow does not recover automatically. It is necessary in this situation to manually delete the cache of the pnpm store cache and re-run the workflow.

I'll leave this issue open for the moment and monitor if there are additional failures.

MikeMcC399 commented 3 months ago

This issue is still happening.

See https://github.com/cypress-io/github-action/actions/runs/9186993814/job/25263739457

MikeMcC399 commented 3 months ago

Using the failed workflow https://github.com/cypress-io/github-action/actions/runs/9186993814/job/25263739457 as an example ...

The problem occurs when the pnpm store is cached in Linux-pnpm-store-* and the pnpm store contains a copy of the correct version of Cypress, whereas at the same time the cypress-linux-x64-* is missing the corresponding cache Cypress binary.

There is no simple solution to coordinating these two caches. Manual caching of the pnpm store cache in the example workflows uses the GitHub JavaScript action actions/cache. The Cypress GitHub action uses the npm module @actions/cache to cache the Cypress binary cache.

When there are parallel jobs running there is no guarantee that a cache can be saved if the jobs are using the same cache key. One job may block the other and lead to inconsistency.

Package managers, including pnpm, only run the postinstall script when Cypress is not installed. Cypress has no automatic fallback to attempt to (re-)install the Cypress binary if it is missing. If pnpm thinks that Cypress is installed, because it was found in the pnpm store cache, then it doesn't attempt to install the Cypress binary. When it comes time to verify Cypress, the verification fails because the Cypress binary cache wasn't populated.

The short-term mitigation is to remove the Linux store caching from the documentation and example workflows. The longer-term solution would be to implement https://github.com/cypress-io/github-action/issues/1044 to cache pnpm dependencies directly under the control of the Cypress GitHub action.

MikeMcC399 commented 3 months ago