flaky tests in GitHub actions

anchore / scan-action

Anchore container analysis and scan provided as a GitHub Action

MIT License

flaky tests in GitHub actions #265

Closed by willmurphyscode 2 months ago

willmurphyscode commented 8 months ago

The unit tests for this repo sometimes fail with an error like this:

spawn ETXTBSY

      at ToolRunner.<anonymous> (node_modules/@actions/exec/src/toolrunner.ts:443:24)
      at node_modules/@actions/exec/lib/toolrunner.js:27:71
      at Object.<anonymous>.__awaiter (node_modules/@actions/exec/lib/toolrunner.js:23:12)
      at node_modules/@actions/exec/src/toolrunner.ts:419:58
      at ToolRunner.<anonymous> (node_modules/@actions/exec/src/toolrunner.ts:419:12)
      at fulfilled (node_modules/@actions/exec/lib/toolrunner.js:24:58)

I believe this is because the tests run concurrently, but runGrype is not safe to call from more than one test at a time.

https://github.com/anchore/scan-action/blob/52d017bdbe923afa39369bc0cb1c89ff7463ab54/index.js#L31-L35

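This has a race condition: one test may change whatever is in the cache while another test is checking or executing it. ETXTBSY ("text file busy") is the error spawn reports when the binary it is asked to execute is still open for writing. I've also seen ENOENT in test runs.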
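
One way to picture a fix is to make concurrent callers share a single install instead of each running its own check-then-install sequence. The sketch below is an illustration under assumptions, not the code in index.js: ensureInstalled, findCached and install are hypothetical names standing in for the cache lookup and the download step.

```javascript
// Hypothetical sketch: these names are illustrative and do not come from scan-action.
let inFlightInstall = null; // promise for the install currently running, if any

// Returns the path of the installed tool, starting at most one install at a time.
// findCached: synchronous lookup returning a path, or a falsy value on a cache miss.
// install: async function that downloads the tool and resolves to its path.
async function ensureInstalled(findCached, install) {
  // If an install is already running, wait for it instead of reading a
  // cache entry that may be only partially written.
  if (inFlightInstall) {
    return inFlightInstall;
  }
  const cached = findCached();
  if (cached) {
    return cached;
  }
  // There is no await between the checks above and this assignment, so within
  // one Node.js process only one caller can reach this point per install.
  inFlightInstall = Promise.resolve()
    .then(install)
    .finally(() => {
      inFlightInstall = null;
    });
  return inFlightInstall;
}
```

This only serializes callers inside one Node.js process. If the tests run in separate worker processes, it would not be enough on its own, and something like a lock file or serial execution would still be needed.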

willmurphyscode commented 8 months ago

We might be able to get away with just running one test at a time as a cheap way to fix this.

With maxConcurrency = 1, timing npm run test gives 11.77s user 6.00s system 33% cpu 53.132 total. But nearly 50 seconds of that is downloading the db, which happens only once regardless of test parallelism.

I'll see if maxConcurrency = 1 fixes this.
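
For reference, a minimal sketch of what that setting could look like in a Jest config file. The thread does not say where the option was set, so the file name and the extra maxWorkers line are assumptions. In Jest, maxConcurrency limits tests declared with test.concurrent, while maxWorkers limits how many worker processes run test files.

```javascript
// jest.config.js (sketch; the thread does not show the actual configuration)
const config = {
  // Limits how many test.concurrent tests may run at the same time (Jest's default is 5).
  maxConcurrency: 1,
  // Assumption, not from the thread: also run test files in a single worker
  // so that separate test files cannot race on the shared cache.
  maxWorkers: 1,
};

module.exports = config;
```

Passing --runInBand on the Jest command line is another way to run all tests serially in one process.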

popey commented 2 months ago

Forgive the possibly ill-informed comment here. Would it be possible to pre-cache the grype-db in an early step before we kick off the further steps that may depend on it?