gatsbyjs / gatsby

The best React-based framework with performance, scalability and security built in.
https://www.gatsbyjs.com
MIT License
Gatsby build occasionally times out after "Caching JavaScript and CSS webpack compilation" #33557

Closed robinmetral closed 2 years ago

robinmetral commented 2 years ago

Description

About a month ago, builds of my Gatsby site started failing in GitHub Actions.

For context, this Gatsby site is a minimal example using my gatsby-source-s3 plugin, that I use for testing the plugin end-to-end.

Here's an excerpt from a failed build:

# ℹ️ omitted earlier logs, check out the link above to view the entire output
success run page queries - 13.660s - 1/1 0.07/s # 👈 this is after all nodes were sourced from S3, so I don't think the problem is with my plugin
success write out requires - 0.011s
success Building production JavaScript and CSS bundles - 9.577s
success Rewriting compilation hashes - 0.005s
success Writing page-data.json files to public directory - 0.017s - 1/1 57.50/s
success Caching JavaScript and CSS webpack compilation - 13.059s # 👈 this is where it hangs
Error: Timed out waiting for: http://localhost:9000 # 👈 this is the timeout from cypress. I've also increased it to see if it just took too long, but that was not the issue

At first, all builds were failing, and in the last week it seems that some of them are passing again while others are still failing.

I'm not sure what's happening here, as there are no helpful logs in the build output. A local gatsby develop works completely fine. The occasional passing build seems to rule out issues in my project or in other dependencies. I also asked on Discord, but it didn't seem like anyone was having the same issue.

Any thoughts on:

Thank you!

A note about the reproduction link 👇 I am aware that the issue template says "Do not link to your actual project", but in my case this example site is exactly a minimal example, since the only thing it does is pull images from S3. I added the GH Action output there; here's the link to the actual source, although I doubt that it will be helpful.

Reproduction Link

https://github.com/robinmetral/gatsby-source-s3/runs/3912139392?check_suite_focus=true

Steps to Reproduce

  1. Visit the link above to see the failing build output. I cannot reliably reproduce this locally, as mentioned in the summary.

Expected Result

gatsby build should not time out

Actual Result

gatsby build is flaky and unreliable, and regularly hangs in CI envs

Environment

My local environment is not the issue here, this is failing on a clean GH Actions machine with:

Config Flags

No response

vyasdhairya commented 2 years ago

updates the husky config after v7 (ensures the commit hook works); updates the example site to work with the new gatsby image (address the Gatsby deprecation warning); follows up on "chore(ci): pin example site dependencies by committing the lockfile #527" by fixing how the linking of the local package is handled in CI

robinmetral commented 2 years ago

Sorry, I don't understand what you mean.

Yes, these are commit messages in the project, but I don't see the relevance 😄

robmarshall commented 2 years ago

I am also having this issue. I feel (although not sure) that these are caused by the sharp plugins. gatsby-transformer-sharp & gatsby-plugin-sharp maybe.

edit: Reverting both of the above plugins to 3.12.0 meant the builds passed.

LekoArts commented 2 years ago

Hi, thanks for the issue!

Did you try out the @next versions of the plugins and Gatsby itself? Can you definitely pinpoint it to the sharp plugins? Can you bisect between which version it happened?

robinmetral commented 2 years ago

I feel (although not sure) that these are caused by the sharp plugins. gatsby-transformer-sharp & gatsby-plugin-sharp maybe.

Pinning these dependencies to 3.12.0 does seem to fix it for me as well. I'll wait a couple days to be positive about this, since it was not always failing, just generally flaky.

Did you try out the @next versions of the plugins and Gatsby itself? Can you definitely pinpoint it to the sharp plugins? Can you bisect between which version it happened?

I haven't tried next yet, happy to do it. But unless there was a known issue that was solved in a prerelease, I don't see how it would be any different. I'll give it a spin anyway.

As for whether we can definitely pinpoint this to sharp, I'll hand this to @robmarshall. I'm not sure how you figured this out (💯). But if it is indeed related to the gatsby monorepo version bumps, in my project I saw the failures start to happen on the day this PR was merged, i.e. the bump from 3.13 to 3.14.

robmarshall commented 2 years ago

i.e. the bump from 3.13 to 3.14.

I can confirm the issue is introduced by gatsby-plugin-sharp 3.14. Bumping it to next does not solve the problem.

cameronbraid commented 2 years ago

I hit the same issue.

With gatsby-plugin-sharp and gatsby-transformer-sharp at 3.14 I get: Caching JavaScript and CSS webpack compilation - 2158.531s

With gatsby-plugin-sharp and gatsby-transformer-sharp at 3.12 I get: Caching JavaScript and CSS webpack compilation - 40.313s

sirichards commented 2 years ago

I had the same issue; rolling back to version 3.13.0 for gatsby-plugin-sharp and gatsby-transformer-sharp fixed it for me. I did get caught out by the ^ symbol in my package.json though, so I went from "^3.13.0" to "3.13.0".
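To illustrate why the ^ matters here: a caret range accepts any later minor or patch release within the same major version, so "^3.13.0" can still resolve to 3.14.x on a fresh install. The following is a minimal sketch of that rule only; the `satisfies` helper is hypothetical, handles plain major.minor.patch versions with a major of 1 or higher, and ignores prereleases and 0.x ranges. It is not a substitute for a real semver implementation.

```javascript
// Hypothetical helper: a deliberately tiny caret-range check, for illustration only.
// Assumes plain "major.minor.patch" versions with major >= 1 (no prerelease tags).
function parseVersion(version) {
  return version.split('.').map(Number);
}

function satisfies(version, range) {
  // No caret: the range is an exact pin.
  if (!range.startsWith('^')) {
    return version === range;
  }
  const [major, minor, patch] = parseVersion(version);
  const [rMajor, rMinor, rPatch] = parseVersion(range.slice(1));
  // A caret range never crosses a major version boundary.
  if (major !== rMajor) return false;
  // Any later minor release is accepted.
  if (minor !== rMinor) return minor > rMinor;
  // Same minor: the patch must be at least the one in the range.
  return patch >= rPatch;
}

console.log(satisfies('3.14.0', '^3.13.0')); // caret range admits the newer minor
console.log(satisfies('3.14.0', '3.13.0'));  // exact pin does not
```

Pinning without the caret is what keeps a clean CI install from picking up a newer release than the one tested locally; committing a lockfile has a similar effect.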

sirichards commented 2 years ago

Upgrading to Gatsby v4 with related packages now causes my build to stop at "success run queries in workers - 7.453s - 65/65 8.72/s"

badaczewski commented 2 years ago

I'm getting the same issues as @sirichards. And previous to the Gatsby 4 upgrade I was getting the freeze at "Caching JavaScript and CSS webpack compilation".

To get the project to build again I had to revert back to Gatsby 3 packages and make sure to use:

gatsby-plugin-sharp: 3.14.1
gatsby-transformer-sharp: 3.14.0

rahulsuresh-git commented 2 years ago

Can confirm, facing the same issue. Unable to use Gatsby v4 because of this :/

stevepepple commented 2 years ago

Same here, on both local system (see below) and Netlify:

System:
    OS: Linux 5.4 Debian GNU/Linux 10 (buster) 10 (buster)
    CPU: (16) x64 Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
    Shell: 5.0.3 - /bin/bash
  Binaries:
    Node: 14.17.6 - /usr/local/bin/node
    Yarn: 1.22.17 - /workspaces/vibemap.com/site/node_modules/.bin/yarn
    npm: 6.14.15 - /usr/local/bin/npm
  Languages:
    Python: 2.7.16 - /usr/bin/python
  npmPackages:
    gatsby: ^4.0.0 => 4.0.0
    gatsby-background-image: ^1.5.3 => 1.5.3
    gatsby-cli: ^4.0.0 => 4.0.0
    gatsby-gravityforms-component: ^3.1.0 => 3.1.0
    gatsby-image: ^3.11.0 => 3.11.0
    gatsby-link: ^3.14.0 => 3.14.0
    gatsby-plugin-build-date: ^1.0.0 => 1.0.0
    gatsby-plugin-bundle-stats: ^3.2.0 => 3.2.0
    gatsby-plugin-client-side-redirect: ^1.1.0 => 1.1.0
    gatsby-plugin-facebook-pixel: ^1.0.5 => 1.0.8
    gatsby-plugin-feed: ^4.0.0 => 4.0.0
    gatsby-plugin-gatsby-cloud: ^4.0.0 => 4.0.0
    gatsby-plugin-google-adsense: ^1.1.3 => 1.1.3
    gatsby-plugin-google-analytics: ^4.0.0 => 4.0.0
    gatsby-plugin-google-tagmanager-delayed: ^2.1.25 => 2.1.25
    gatsby-plugin-image: ^2.1.0-next.0 => 2.1.0-next.0
    gatsby-plugin-loadable-components-ssr: ^3.4.0 => 3.4.0
    gatsby-plugin-lodash: ^4.14.0 => 4.14.0
    gatsby-plugin-manifest: ^4.0.0 => 4.0.0
    gatsby-plugin-netlify: ^4.0.0-next.0 => 4.0.0-next.0
    gatsby-plugin-nprogress: ^4.0.0 => 4.0.0
    gatsby-plugin-percy: ^0.1.4 => 0.1.4
    gatsby-plugin-preconnect: ^1.2.1 => 1.2.1
    gatsby-plugin-purgecss: ^6.0.2 => 6.0.2
    gatsby-plugin-react-helmet: ^4.14.0 => 4.14.0
    gatsby-plugin-react-helmet-canonical-urls: ^1.4.0 => 1.4.0
    gatsby-plugin-react-i18next: ^1.1.1 => 1.1.1
    gatsby-plugin-react-svg: ^3.1.0 => 3.1.0
    gatsby-plugin-remove-fingerprints: ^0.0.2 => 0.0.2
    gatsby-plugin-resolve-src: ^2.1.0 => 2.1.0
    gatsby-plugin-robots-txt: ^1.6.13 => 1.6.13
    gatsby-plugin-sass: ^4.14.0 => 4.14.0
    gatsby-plugin-sass-resources: ^2.0.0 => 2.0.0
    gatsby-plugin-sharp: ^4.0.0 => 4.0.0
    gatsby-plugin-sitemap: ^4.10.0 => 4.10.0
    gatsby-plugin-split-css: ^2.0.0 => 2.0.0
    gatsby-plugin-styled-components: ^5.0.0 => 5.0.0
    gatsby-plugin-transition-link: ^1.20.5 => 1.20.5
    gatsby-plugin-use-query-params: ^1.0.1 => 1.0.1
    gatsby-plugin-web-font-loader: ^1.0.4 => 1.0.4
    gatsby-plugin-webpack-bundle-analyser-v2: ^1.1.25 => 1.1.25
    gatsby-source-filesystem: ^4.0.0 => 4.0.0
    gatsby-source-gravityforms: ^1.0.17 => 1.0.17
    gatsby-source-wordpress: ^6.0.0 => 6.0.0

pieh commented 2 years ago

Thanks to @robinmetral's help getting his starter to run, I did manage to reproduce this a few times (on GitHub Actions, still not locally :( ).

The current working theory is that using a single ReadStream for multiple pipelines is the root of the problem (not always; it depends on timing conditions). I will be checking whether using https://github.com/mcollina/cloneable-readable to create stream clones helps.
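As a rough illustration of the idea of giving every consumer its own copy of a stream, here is a minimal sketch using only Node's built-in stream module. This is not the patch that went into gatsby-plugin-sharp and it does not use cloneable-readable; it swaps in one PassThrough branch per consumer instead. The `fanOut` and `collect` helpers and the fake input data are assumptions made for this example.

```javascript
const { Readable, PassThrough } = require('stream');

// Hypothetical helper: gather everything a stream emits into one string.
function collect(stream) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    stream.on('data', (chunk) => chunks.push(Buffer.from(chunk)));
    stream.on('end', () => resolve(Buffer.concat(chunks).toString('utf8')));
    stream.on('error', reject);
  });
}

// Hypothetical helper: give each consumer its own PassThrough branch,
// so no two consumers ever read from the same stream object.
function fanOut(source, consumerCount) {
  const branches = Array.from({ length: consumerCount }, () => new PassThrough());
  for (const branch of branches) {
    // pipe() does not forward errors, so propagate them to each branch by hand.
    source.on('error', (err) => branch.destroy(err));
    source.pipe(branch);
  }
  return branches;
}

// Stand-in for an image file stream; the contents are made up.
const source = Readable.from([Buffer.from('fake '), Buffer.from('image bytes')]);
const [first, second] = fanOut(source, 2);

// Both branches should end up with the complete input.
const results = Promise.all([collect(first), collect(second)]);
results.then(([a, b]) => console.log(a === b));
```

All branches are attached before the source starts flowing, which is the property that matters in this sketch: a consumer that attaches late to an already-flowing stream can miss data or never see the end event.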

--edit For the technical details: this is the "patch" I made for debugging - https://github.com/pieh/gatsby-source-s3/blob/main/examples/gatsby-starter-source-s3/patches/gatsby-plugin-sharp%2B3.14.1.patch and https://github.com/pieh/gatsby-source-s3/runs/4003271418?check_suite_focus=true is the result. In those run logs in particular, you can see that the 2 images that were predicted to hang did indeed hang.

rahulsuresh-git commented 2 years ago

Thanks for the fix! Is this live on v4.0.0 or would it come with the next release? @wardpeet

robinmetral commented 2 years ago

It can't be in 4.0.0 since it's already on npm, but judging by the labels on the PR the fix will be released as a patch for both v3 and v4

LekoArts commented 2 years ago

but judging by the labels on the PR the fix will be released as a patch for both v3 and v4

correct

witcradg commented 2 years ago

My gatsby-plugin-sharp version is 3.9.0 "gatsby-plugin-sharp": "^3.9.0",

Might this be a separate issue?

FWIW - this is far more than "occasionally" as stated in the issue title. It's occurring almost every time I run gatsby build. This is killing development of things that only occur in production mode of gatsby build (e.g. robots.txt creation). For now I've been able to test only by pushing to Netlify where it seems to still be working.

"dependencies": {
  "@fortawesome/fontawesome-svg-core": "^1.2.35",
  "@fortawesome/free-brands-svg-icons": "^5.15.3",
  "@fortawesome/free-regular-svg-icons": "^5.15.3",
  "@fortawesome/free-solid-svg-icons": "^5.15.3",
  "@fortawesome/react-fontawesome": "^0.1.14",
  "bootstrap": "^5.0.2",
  "gatsby": "^3.9.0",
  "gatsby-background-image": "^1.5.3",
  "gatsby-cli": "^3.14.0",
  "gatsby-plugin-breadcrumb": "^12.1.1",
  "gatsby-plugin-fontawesome-css": "^1.1.0",
  "gatsby-plugin-gatsby-cloud": "^2.9.0",
  "gatsby-plugin-google-tagmanager": "^3.9.0",
  "gatsby-plugin-image": "^1.9.0",
  "gatsby-plugin-lucky-orange": "^1.0.3",
  "gatsby-plugin-manifest": "^3.9.0",
  "gatsby-plugin-offline": "^4.9.0",
  "gatsby-plugin-preload-fonts": "^2.9.0",
  "gatsby-plugin-react-helmet": "^4.9.0",
  "gatsby-plugin-robots-txt": "^1.6.13",
  "gatsby-plugin-sharp": "^3.9.0",
  "gatsby-plugin-sitemap": "^5.0.0",
  "gatsby-source-filesystem": "^3.9.0",
  "gatsby-transformer-csv": "^3.11.0",
  "gatsby-transformer-remark": "^4.6.0",
  "gatsby-transformer-sharp": "^3.9.0",
  "gbimage-bridge": "^0.1.4",
  "prop-types": "^15.7.2",
  "react": "^17.0.2",
  "react-bootstrap": "^1.6.1",
  "react-collapsible-component": "^1.3.4",
  "react-dom": "^17.0.2",
  "react-elfsight-widget": "^1.0.0",
  "react-helmet": "^6.1.0",
  "react-popper": "^2.2.5",
  "react-slick": "^0.28.1",
  "react-transition-group": "^4.4.2",
  "reactstrap": "^8.9.0",
  "slick-carousel": "^1.8.1",
  "type-fest": "^0.20.2"
},

rahulsuresh-git commented 2 years ago

I've noticed this issue happens consistently on machines with fewer than 6 CPU cores.

witcradg commented 2 years ago

@icy-meteor Ugh. Yep. I converted an older machine to Ubuntu. It only has 4 cores.

witcradg commented 2 years ago

I just dragged out an old laptop with 8 cores and pulled the recent changes. The build ran fine. This would seem to be a different issue.

wardpeet commented 2 years ago

I've published a fix in gatsby-plugin-sharp version 4.0.1. Version 3 will soon follow

stevepepple commented 2 years ago

Thanks for the patch! I upgraded both gatsby and gatsby-plugin-sharp, and now on Netlify, I see a different issue at this step:

success run queries in workers - 484.821s - 1795/1795 3.70/s
4:19:46 PM: error Received message about completed job that wasn't scheduled by this worker
4:19:46 PM: 
4:19:46 PM:   Error: Received message about completed job that wasn't scheduled by this worker

4:19:46 PM:   - worker-messaging.ts:67 
4:19:46 PM:     [site]/[gatsby]/src/utils/jobs/worker-messaging.ts:67:17
4:19:46 PM:   
4:19:46 PM:   - child.js:80 process.messageHandler
4:19:46 PM:     [site]/[gatsby-worker]/dist/child.js:80:9
4:19:46 PM:   
4:19:46 PM:   - index.js:175 processEmit
4:19:46 PM:     [site]/[signal-exit]/index.js:175:34
4:19:46 PM:   
4:19:46 PM:   - sourcemap-register.js:926 process.emit
4:19:46 PM:     [site]/[@turist]/fetch/dist/sourcemap-register.js:926:21
4:19:46 PM:   
4:19:46 PM:   - child_process.js:912 emit
4:19:46 PM:     internal/child_process.js:912:12
4:19:46 PM:   
4:19:46 PM:   - task_queues.js:83 processTicksAndRejections
4:19:46 PM:     internal/process/task_queues.js:83:21
4:19:46 PM:   
4:19:46 PM: 
4:19:48 PM: error UNHANDLED REJECTION There is no worker with "2" id.

cameronbraid commented 2 years ago

And another issue which may be related.

Using gatsby 3.14.3 with sharp+transformer 3.11, I get a 20 minute delay between two steps, "Caching JavaScript and CSS webpack compilation" and "Building HTML renderer". See the timestamps in the log here:

2021-10-28T02:19:59.738Z | success write out requires - 0.013s
2021-10-28T02:21:52.715Z | success Building production JavaScript and CSS bundles - 112.966s
2021-10-28T02:21:52.740Z | success Rewriting compilation hashes - 0.023s
2021-10-28T02:21:59.109Z | success Writing page-data.json files to public directory - 6.367s - 1362/1362 213.91/s
2021-10-28T02:22:09.601Z | success Caching JavaScript and CSS webpack compilation - 16.914s
2021-10-28T02:43:53.324Z | success Building HTML renderer - 37.830s
2021-10-28T02:44:02.954Z | success Caching HTML renderer compilation - 10.020s
2021-10-28T02:44:22.942Z | success Building static HTML for pages - 29.598s - 1362/1362 46.02/s
2021-10-28T02:44:27.721Z | success write out nginx configuration - 4.761s
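One way to locate a stall like this in a timestamped log is to compare the wall-clock gap between consecutive lines with the duration each step reports for itself. The sketch below does that for the lines above; the `parseLine` and `unaccountedGaps` helpers are hypothetical, and the line format is assumed to be exactly the `timestamp | success step - duration` shape shown in this log.

```javascript
// Log lines copied from the comment above.
const sampleLog = `
2021-10-28T02:19:59.738Z | success write out requires - 0.013s
2021-10-28T02:21:52.715Z | success Building production JavaScript and CSS bundles - 112.966s
2021-10-28T02:21:52.740Z | success Rewriting compilation hashes - 0.023s
2021-10-28T02:21:59.109Z | success Writing page-data.json files to public directory - 6.367s - 1362/1362 213.91/s
2021-10-28T02:22:09.601Z | success Caching JavaScript and CSS webpack compilation - 16.914s
2021-10-28T02:43:53.324Z | success Building HTML renderer - 37.830s
2021-10-28T02:44:02.954Z | success Caching HTML renderer compilation - 10.020s
2021-10-28T02:44:22.942Z | success Building static HTML for pages - 29.598s - 1362/1362 46.02/s
2021-10-28T02:44:27.721Z | success write out nginx configuration - 4.761s
`.trim().split('\n');

// Hypothetical helper: extract timestamp, step name and self-reported duration.
function parseLine(line) {
  const match = /^(\S+) \| success (.+?) - (\d+(?:\.\d+)?)s/.exec(line);
  if (!match) return null;
  return {
    time: Date.parse(match[1]),
    step: match[2],
    reportedSeconds: Number(match[3]),
  };
}

// Hypothetical helper: for each step, how much wall-clock time passed since the
// previous line that the step's own duration does not explain. Negative values
// just mean the step overlapped with earlier work.
function unaccountedGaps(lines) {
  const entries = lines.map(parseLine).filter(Boolean);
  const gaps = [];
  for (let i = 1; i < entries.length; i++) {
    const wallSeconds = (entries[i].time - entries[i - 1].time) / 1000;
    gaps.push({
      step: entries[i].step,
      wallSeconds,
      unaccountedSeconds: wallSeconds - entries[i].reportedSeconds,
    });
  }
  return gaps.sort((a, b) => b.unaccountedSeconds - a.unaccountedSeconds);
}

const worst = unaccountedGaps(sampleLog)[0];
console.log(worst.step, worst.unaccountedSeconds.toFixed(1) + 's unaccounted');
```

For this log the largest unexplained gap lands on "Building HTML renderer": roughly 21 minutes pass before that line is printed, while the step itself reports under 38 seconds.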

rahulsuresh-git commented 2 years ago

Can confirm, the issue seems to be fixed now on Gatsby v4. Thanks 🚀

pieh commented 2 years ago

Patch for v3 was just released (gatsby-plugin-sharp@3.14.2)

witcradg commented 2 years ago

Confirming this seems to be fixed on Gatsby 3. Thank you.

lezan commented 2 years ago

Hello, I am still experiencing the same issue as @sirichards with gatsby-plugin-sharp, even with 3.14.3: the build freezes at "Caching JavaScript and CSS webpack compilation". Then I updated to 4.1.2 and now I get the other issue @sirichards reported: the build is stuck after "success run queries in workers - 160.247s - 111/111 0.69/s".

NickBarreto commented 2 years ago

+1, I am having the same issue as @lezan and @sirichards. Upgraded to Gatsby v4, and the first few builds had no issues.

However, today all of my builds are hanging at success run queries in workers, and then eventually timing out.

Log snippet below, from the first time it happened; the problem has persisted all day:

10:01:13 AM: success Building production JavaScript and CSS bundles - 80.112s
10:01:45 AM: success Building HTML renderer - 32.379s
10:01:45 AM: success Execute page configs - 0.092s
10:01:45 AM: success Caching Webpack compilations - 0.001s
10:03:37 AM: success run queries in workers - 111.803s - 1691/1691 15.12/s
10:27:08 AM: Build exceeded maximum allowed runtime

I tried a few things to get unstuck (turning on parallel workers, using DSG on some pages), but nothing seems to have any effect. The next step in the process, 'merge worker state', never seems to happen/finish.

lezan commented 2 years ago

I may have traced my issue to the gatsby-plugin-sharp config. With formats: ['auto', 'webp', 'avif'] and around 355 images (400MB), my build never finishes. With the default config for gatsby-plugin-sharp, I can build it.

NickBarreto commented 2 years ago

Interesting! I just dug into my config files and I have formats: ['auto', 'webp'] set, which I believe is the default (at least according to the docs).

I am, however, using placeholder: 'blurred', which may be the expensive operation. I'm going to try and switch it back to 'dominantColor' and see if that helps.

Something must be going on that has degraded the performance of sharp though, because my builds were not taking as long as all this before I started running into this error.

I've also managed to burn through so so many build minutes on Netlify trying to debug this as well, as I kept trying to see whether the adjustments I made were cutting back on the build time or not. So will be careful in testing this out, will try the builds locally and see if it takes less time before trying things out on Netlify.

Aside: DSG, incidentally, hasn't helped this because I guess Gatsby needs the images available in advance even if the page generation gets deferred, so those operations always happen on build anyway.

lezan commented 2 years ago

Hey @NickBarreto, in my particular case placeholder: 'blurred' does not make any difference, and neither does defaultQuality: 100 or webpOptions: {quality: 100,}. The only difference is with and without avif. With formats: ['auto', 'webp', 'avif'], I cannot finish my build; it appears stuck (after hours of running it still has not finished), while with formats: ['auto', 'webp'] I can build without issue in around 340 seconds. Maybe something changed with sharp and avif in the latest version? You are not using avif, though, so this cannot be the only "issue".

robinmetral commented 2 years ago

Do you think it would be worth submitting a new issue with all these findings? 👀 It looks like a different bug, and it might have more visibility when not at the bottom of a closed issue 😛

LekoArts commented 2 years ago

It is expected (and not a bug) that with avif it takes longer and more memory. This is why we do not enable it by default. See https://github.com/gatsbyjs/gatsby/issues/30256 or https://github.com/lovell/sharp/issues/2597

Additionally, you'll have a better experience on Gatsby Cloud than on Netlify since we push the image work to cloud functions with more memory.

NickBarreto commented 2 years ago

Thanks for the info everyone. I think I will create another issue because I do think there has been a regression here.

Admittedly my site has quite a few images (~1600), but under v3 and also in my first builds of v4 the time elapsed in the logs between run queries in workers and merge worker state was virtually instant, in milliseconds.

Now the process is hanging between those two steps, with the same number of images processed, and can take upwards of 20 minutes to move on to merging the worker state.

So I've gone from full, uncached builds in Gatsby v3 taking ~13 minutes to taking so long in v4 I hit the 30 minute limit on Netlify which causes my builds to fail.

delasign commented 1 year ago

This was resolved for me when I removed GATSBY_EXPERIMENTAL_QUERY_CONCURRENCY=1 and GATSBY_CPU_COUNT=1 from my .env file.
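A quick way to check whether a build environment still carries these settings is a small Node script like the one below. It is only a sketch: the `findThrottlingVars` helper is hypothetical, it looks only at the two variables named above, and it reads process.env, so values that live in a .env file will only show up if something has already loaded that file.

```javascript
const os = require('os');

// The two variables named in the comment above.
const THROTTLING_VARS = [
  'GATSBY_CPU_COUNT',
  'GATSBY_EXPERIMENTAL_QUERY_CONCURRENCY',
];

// Hypothetical helper: list which of those variables are set to "1" in the given env.
function findThrottlingVars(env) {
  return THROTTLING_VARS.filter((name) => env[name] === '1');
}

console.log('logical CPU cores:', os.cpus().length);
console.log('variables set to 1:', findThrottlingVars(process.env));
```

Printing the core count alongside is useful because earlier comments in this thread tie the hang to machines with few cores.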