gatsbyjs / gatsby

The best React-based framework with performance, scalability and security built in.
https://www.gatsbyjs.com
MIT License

Build stuck at running jobs (image transformation) #34051

Open NickBarreto opened 2 years ago

NickBarreto commented 2 years ago

If you're coming new to this issue, please see this first: https://github.com/gatsbyjs/gatsby/issues/34051#issuecomment-979882343


Description

Gatsby's build process is hanging and not completing. I suspect the issue is with Sharp, as my site has quite a few images, and I saw this brought up in a previous issue, #33557.

When I upgraded to v4 I initially had no issues. However, the next day my builds all started exceeding Netlify's maximum build time of 30 minutes.

I mentioned this problem in the thread for that other issue, as others apparently had the same problem where run queries in workers seems to take longer than expected.

This issue is difficult to reproduce, because I think it is partly to do with the scale of my site, which is moderately large and has ~1600 images. There must be something that isn't quite right in the worker process, because my builds on Netlify went from taking roughly 13 or 14 minutes to exceeding the build limit every time.

To try and diagnose the issue I ran a local build, which, while it took a long-ish time, did actually complete.

Since @LekoArts suggested that Gatsby Cloud's build process is better optimised for processing images, I thought I'd give that a go.

After trying out a build in Gatsby Cloud, I had no build problems at all and the whole site built with a clear cache in 7 minutes. OK, I thought, it seems the problem isn't so much with Gatsby as with how Netlify interacts with v4's worker process.

However, on the next push I ran into the problem once again, this time in Gatsby Cloud. The bottom end of Gatsby Cloud's logs is useful, because it gives me a little more information than Netlify does:

17:38:38 PM:
info Total nodes: 7987, SitePage nodes: 1695 (use --verbose for breakdown)

17:38:38 PM:
success Checking for changed pages - 0.001s

17:38:38 PM:
success onPreExtractQueries - 0.000s

17:38:38 PM:
success Cleaning up stale page-data - 0.024s

17:38:38 PM:
success createPages - 1.351s

17:38:40 PM:
success extract queries from components - 1.596s

17:38:40 PM:
success write out redirect data - 0.004s

17:38:40 PM:
success onPostBootstrap - 0.046s

17:38:40 PM:
success write out requires - 0.030s

17:38:40 PM:
info bootstrap finished - 48.635s

17:39:15 PM:
warning warn - You have enabled the JIT engine which is currently in preview.

17:39:15 PM:
warning warn - Preview features are not covered by semver, may introduce breaking changes, and can change at any time.

17:39:15 PM:
warning ⠀

17:39:22 PM:
success Building production JavaScript and CSS bundles - 42.093s

17:39:24 PM:
 [webpack.cache.PackFileCacheStrategy] Serializing big strings (3319kiB) impacts deserialization performance (consider using Buffer instead and decode when needed)

17:39:24 PM:
 [webpack.cache.PackFileCacheStrategy] Serializing big strings (3319kiB) impacts deserialization performance (consider using Buffer instead and decode when needed)

17:39:24 PM:
 [webpack.cache.PackFileCacheStrategy] Serializing big strings (3319kiB) impacts deserialization performance (consider using Buffer instead and decode when needed)

17:39:59 PM:
success Building Rendering Engines - 37.719s

17:40:13 PM:
success Building HTML renderer - 13.051s

17:40:13 PM:
success Execute page configs - 0.039s

17:40:15 PM:
success Caching Webpack compilations - 0.001s

17:40:15 PM:
success Validating Rendering Engines - 2.094s

17:40:39 PM:
success run queries in workers - 23.276s - 1662/1662 71.40/s

17:45:38 PM:
warning This is just diagnostic information (enabled by GATSBY_DIAGNOSTIC_STUCK_STATUS_TIMEOUT):

17:45:38 PM:
- Activity "build" of type "hidden" is currently in state "IN_PROGRESS"

17:45:38 PM:
Gatsby is in "IN_PROGRESS" state without any updates for 300.000 seconds. Activities preventing Gatsby from transitioning to idle state:

17:45:38 PM:
Process will be terminated in 1500.000 seconds if nothing will change.

17:45:38 PM:
- Activity "Running jobs v2" of type "hidden" is currently in state "IN_PROGRESS"

18:10:38 PM:
ERROR Terminating the process (due to GATSBY_WATCHDOG_STUCK_STATUS_TIMEOUT):

18:10:38 PM:
- Activity "build" of type "hidden" is currently in state "IN_PROGRESS"

18:10:38 PM:
Gatsby is in "IN_PROGRESS" state without any updates for 1800.000 seconds. Activities preventing Gatsby from transitioning to idle state:

18:10:38 PM:
- Activity "Running jobs v2" of type "hidden" is currently in state "IN_PROGRESS"

The fact that a full, uncached build on Gatsby Cloud can run in 7 minutes suggests to me that the issue isn't really one of scale, but that the worker process is hanging, and only sometimes.

Is it to do with incremental builds? Maybe. I am using the preserved download cache because, as I said, my site has quite a few images, which come from a custom source plugin (it's relatively simple, and passes all the image links from AWS over to createRemoteFileNode).
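
Roughly, the plugin boils down to something like the sketch below (the `Book` node type and the `imageUrl`/`coverImage` field names are placeholders, not the real schema):

```js
// gatsby-node.js of a hypothetical custom source plugin.
// Image URLs sourced from AWS are passed to createRemoteFileNode, which
// downloads each file and creates a File node for gatsby-plugin-sharp
// to transform.
const { createRemoteFileNode } = require(`gatsby-source-filesystem`)

exports.onCreateNode = async ({
  node,
  actions: { createNode },
  createNodeId,
  getCache,
}) => {
  // `Book` and `imageUrl` are illustrative names, not the actual schema.
  if (node.internal.type === `Book` && node.imageUrl) {
    const fileNode = await createRemoteFileNode({
      url: node.imageUrl,
      parentNodeId: node.id,
      createNode,
      createNodeId,
      getCache,
    })
    if (fileNode) {
      // Link the downloaded File node via the ___NODE foreign-key
      // convention so childImageSharp can be queried on it.
      node.coverImage___NODE = fileNode.id
    }
  }
}
```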

To test things out once I had the first timeout on Gatsby Cloud, I tried a manual deploy without clearing the cache. I was hoping the process would hang again, so I'd know the issue was with the cache and incremental builds, but alas, it did not: the whole build completed in 6 minutes. Strangely, the issue occurs on Netlify more often than not, but only occasionally in Gatsby Cloud. It may be to do with build process resources, because I just signed up to Gatsby Cloud and so am in the free preview of performance builds.

Are there other diagnostic tools I can use to more closely inspect the build process? How would I be able to see which process is failing or never finishing?

Reproduction Link

I can't seem to reproduce this error, as it is intermittent.

Steps to Reproduce

  1. Attempt to build site with gatsby build in either Netlify or Gatsby Cloud
  2. Sometimes, the build never finishes

Expected Result

gatsby build should eventually finish and build the site

Actual Result

The run queries in workers stage never finishes or moves on to Merge worker state; the build eventually times out and fails.

Environment

My local environment isn't really the issue; builds have failed with this problem in both Netlify and Gatsby Cloud.

However, this is my local env:

  System:
    OS: macOS Mojave 10.14.6
    CPU: (4) x64 Intel(R) Core(TM) i7-4578U CPU @ 3.00GHz
    Shell: 3.2.57 - /bin/bash
  Binaries:
    Node: 16.1.0 - /usr/local/bin/node
    npm: 8.1.4 - /usr/local/bin/npm
  Languages:
    Python: 3.9.5 - /usr/local/opt/python/libexec/bin/python
  Browsers:
    Chrome: 95.0.4638.69
    Firefox: 94.0.1
    Safari: 14.1.2
  npmPackages:
    gatsby: ^4.1.6 => 4.2.0 
    gatsby-plugin-gdpr-cookies: ^2.0.8 => 2.0.8 
    gatsby-plugin-image: ^2.1.3 => 2.2.0 
    gatsby-plugin-loadable-components-ssr: ^4.1.0 => 4.1.0 
    gatsby-plugin-local-search: ^2.0.1 => 2.0.1 
    gatsby-plugin-netlify: ^4.0.0-next.0 => 4.0.0-next.0 
    gatsby-plugin-netlify-cms: ^6.1.0 => 6.2.0 
    gatsby-plugin-postcss: ^5.1.0 => 5.2.0 
    gatsby-plugin-react-helmet: ^5.1.0 => 5.2.0 
    gatsby-plugin-sharp: ^4.1.4 => 4.2.0 
    gatsby-remark-copy-linked-files: ^5.1.0 => 5.2.0 
    gatsby-remark-images: ^6.1.4 => 6.2.0 
    gatsby-remark-relative-images: ^2.0.2 => 2.0.2 
    gatsby-remark-responsive-iframe: ^5.1.0 => 5.2.0 
    gatsby-remark-smartypants: ^5.1.0 => 5.2.0 
    gatsby-source-filesystem: ^4.1.3 => 4.2.0 
    gatsby-transformer-remark: ^5.1.4 => 5.2.0 
    gatsby-transformer-sharp: ^4.1.0 => 4.2.0 
  npmGlobalPackages:
    gatsby-cli: 4.2.0
    gatsby: 3.5.0

Config Flags

PRESERVE_FILE_DOWNLOAD_CACHE: true
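
For reference, config flags like this one are set under the standard flags key in gatsby-config.js:

```js
// gatsby-config.js - PRESERVE_FILE_DOWNLOAD_CACHE keeps files downloaded
// with createRemoteFileNode across builds instead of refetching them.
module.exports = {
  flags: {
    PRESERVE_FILE_DOWNLOAD_CACHE: true,
  },
  plugins: [
    // ...site plugins
  ],
}
```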

DennisKraaijeveld commented 2 years ago

I have exactly the same thing going on. I am on the Standard/Standard plan, and for the past two days fresh builds have been taking forever. Sometimes they finish after 25-30 minutes.

10:40:15 AM:
info [gatsby-plugin-perf-budgets] hooking into webpack

10:40:25 AM:
warning warn - Preview features are not covered by semver, may introduce breaking changes, and can change at any time.

10:40:25 AM:
warning warn - You have enabled the JIT engine which is currently in preview.

10:40:25 AM:
warning ⠀

10:41:05 AM:
Webpack Bundle Analyzer saved report to /usr/src/app/www/public/report.html

10:41:05 AM:
success Building production JavaScript and CSS bundles - 50.349s

10:41:32 AM:
success Building HTML renderer - 27.319s

10:41:32 AM:
success Caching Webpack compilations - 0.001s

10:41:32 AM:
success Execute page configs - 0.093s

10:41:34 AM:
success run queries in workers - 1.793s - 307/307 171.24/s

10:46:34 AM:
warning This is just diagnostic information (enabled by GATSBY_DIAGNOSTIC_STUCK_STATUS_TIMEOUT):

10:46:34 AM:
- Activity "build" of type "hidden" is currently in state "IN_PROGRESS"

10:46:34 AM:
Gatsby is in "IN_PROGRESS" state without any updates for 300.000 seconds. Activities preventing Gatsby from transitioning to idle state:

10:46:34 AM:
Process will be terminated in 1500.000 seconds if nothing will change.

I was about to open an issue myself, but I guess we're seeing something similar.

I am also using createRemoteFileNode to download and optimise remote images. I will run a build without that to see what happens.

LekoArts commented 2 years ago

@NickBarreto @DennisKraaijeveld can you both please post the URL to a failed build where you see this? Then we can check it out on our side.

DennisKraaijeveld commented 2 years ago

On my side it is not failing 100% of the time, but build times have gone from 5 minutes to 24-30 minutes. This build is currently running: https://www.gatsbyjs.com/dashboard/c9ba2b9c-76c6-4e2e-94b6-047574fb963f/sites/d18769be-381e-458e-b4fb-d59bbc168935/builds/88ec953a-2dac-41da-8696-f07b589001cf/details

This one did finish after 25 minutes, with the same problem: https://www.gatsbyjs.com/dashboard/c9ba2b9c-76c6-4e2e-94b6-047574fb963f/sites/d18769be-381e-458e-b4fb-d59bbc168935/builds/fd5afb99-0d11-43c8-af8a-0624b7a748e3/details

@LekoArts

NickBarreto commented 2 years ago

Sure thing.

Here's a Gatsby Cloud build that failed with this error: https://www.gatsbyjs.com/dashboard/e1ae5b97-e312-4fe3-88f2-5f4f81ac0d9d/sites/1703f5eb-05d0-46ad-8fdf-e1f7129e83b7/builds/74b00a67-fbe3-4048-9f78-8150895fc298/details

This is the exact same build, triggered manually, not clearing cache, immediately after, which built successfully in 6 minutes: https://www.gatsbyjs.com/dashboard/e1ae5b97-e312-4fe3-88f2-5f4f81ac0d9d/sites/1703f5eb-05d0-46ad-8fdf-e1f7129e83b7/builds/beb61dcb-be64-4cc5-844a-d6cd3f1a2326/details

There were no changes in the codebase between these two builds, but one failed and the other did not.

It's also not failing every time for me on Gatsby Cloud, although, as I said, on Netlify it fails nearly every time. I suspect that may be to do with resources on the build machine.

DennisKraaijeveld commented 2 years ago

@LekoArts I scanned through my builds, and the build without the remote images (onCreateNode, createSchemaCustomization) did run in 4 minutes. Might be helpful information.

EDIT: Never mind. I found a build from yesterday without remote images that also built forever, with exactly the same issues: https://www.gatsbyjs.com/dashboard/c9ba2b9c-76c6-4e2e-94b6-047574fb963f/sites/d18769be-381e-458e-b4fb-d59bbc168935/builds/af4c2d52-8f48-4fba-97b6-4c1630032b3a/details

ashhitch commented 2 years ago

Also seeing this issue: https://www.gatsbyjs.com/dashboard/3d3630a8-7adf-4a87-85eb-98aef3eea470/sites/1d97bd5a-413f-4e88-965a-87d672349808/builds/4ea4bf44-d89b-4210-95af-547795bc8e78/details#rawLogs

LekoArts commented 2 years ago

Thanks for providing the URLs. We've looked at the builds from @NickBarreto and @DennisKraaijeveld, and in summary these are the findings:

NickBarreto commented 2 years ago

Hi @LekoArts, thanks so much for the information.

What would you advise as a next step? Watch this PR until it is merged into a release, then upgrade to that release and do a few further builds to gather more diagnostic details?

Is there any other way in which I could contribute?

startinggravity commented 2 years ago

Following because I get this problem a lot.

pieh commented 2 years ago

@NickBarreto (and other folks following this issue)

What would you advise as a next step? Watch this PR until it is merged into a release, then upgrade to that release and do a few further builds to gather more diagnostic details?

I did publish a canary (gatsby@alpha-job-progress) from the PR branch, and we are running that (plus some additional debugging code on top) internally on a test site where we are able to reproduce the issue (eventually, as it takes multiple runs to reproduce the problem). You can use that canary release yourself, but it really won't help much with unsticking builds (it will just show an additional progress bar in the logs tab).

So in short, we don't need more information from you folks (at least about being stuck on image generation in Gatsby Cloud). We can already reproduce it and are in the process of tracking down the problem, and we will post an update here once we find the problem, implement a fix, and have a reasonably high level of confidence that the fix is correct (we can never be 100% sure, due to the intermittent nature of the problem).

pieh commented 2 years ago

Oh, and one more thing:

We also found that diagnostic message printing information about "activities" in progress is not always fully correct.

warning This is just diagnostic information (enabled by GATSBY_DIAGNOSTIC_STUCK_STATUS_TIMEOUT):
- Activity "build" of type "hidden" is currently in state "IN_PROGRESS"

We do see messages like this mentioning only build when in fact there is also a Running jobs one (which I originally chased thinking there were two separate stuck-build problems). In the builds we checked, all of the hangs were related to jobs/image processing, but some of the messages didn't mention jobs, which is a separate bug, though not the root of the stuck-builds issue.

pieh commented 2 years ago

Yesterday we published a new version of the Gatsby Cloud build runner image with fixes, migrated our test site to use it, and monitored its behaviour overnight. We didn't see any more problems on our test site - it handled over 300,000 jobs successfully in that time (before the fix, it would get stuck at most around 60,000 jobs, and more often it got stuck much sooner than that).

We are rolling out this update to all sites now. Please note that the migration won't happen if the site is busy (e.g. constantly rebuilding), so a good way to give it a chance to migrate is to temporarily disable builds in Site Settings -> Builds (for ~5 minutes) and re-enable them after that.

DennisKraaijeveld commented 2 years ago

Thanks! @everyone :)

buzinas commented 2 years ago

@pieh Thanks for the update. What about local builds? I have the same problem running gatsby build on my Codespaces machine (32GB).

Gatsby Cloud is still failing for me as well:

https://www.gatsbyjs.com/dashboard/1dfaf52f-9c4f-46c4-9410-6f9966814f9d/sites/9a4191f0-68ba-4271-8b61-fc8572eaf75b/builds/9bf0ccc4-6d08-4f8d-91b1-290be83ce825/details#all

askibinski commented 2 years ago

I have read the thread but I'm not using Gatsby Cloud.

I'm currently migrating from v3 to v4 and testing everything locally, and this now happens quite often on gatsby build.

success run queries in workers - 14.927s - 96/96 6.43/s
success Running gatsby-plugin-sharp.IMAGE_PROCESSING jobs - 50.515s - 206/206 4.08/s
not finished Merge worker state - 0.276s

I never had issues before; it's a moderate site, definitely nothing large. Is there a way to get more debug info on what's going on here?

Edit: I played around with NODE_OPTIONS=--max_old_space_size and GATSBY_CPU_COUNT on my local 32GB laptop, but without success.

askibinski commented 2 years ago

I got more debug information when I upgraded gatsby-cli to 4.4.0:

(...)
success Running gatsby-plugin-sharp.IMAGE_PROCESSING jobs - 41.998s - 206/206 4.91/s

ERROR 

Assertion failed: all worker queries are not dirty (worker #3)

  Error: Assertion failed: all worker queries are not dirty (worker #3)

  - queries.ts:391 assertCorrectWorkerState
    [quietquality_gatsby]/[gatsby]/src/redux/reducers/queries.ts:391:13
(...)
pieh commented 2 years ago

@askibinski if you are hitting this issue locally, could you manually edit a file in your node_modules - node_modules/gatsby/dist/redux/reducers/queries.js - if you are on 4.4 it should be line 439.

And instead of just

throw new Error(`Assertion failed: all worker queries are not dirty (worker #${workerId})`);

Let's add information about the state the assertion fails on:

throw new Error(`Assertion failed: all worker queries are not dirty (worker #${workerId})\n${require(`util`).inspect(queryStateChunk, { depth: Infinity })}`);

This should additionally print information like the example below alongside the assertion error:

{
    byNode: Map(2) {
      'Site' => Set(1) { '/using-typescript/' },
      'SiteBuildMetadata' => Set(1) { '/using-typescript/' }
    },
    byConnection: Map(0) {},
    queryNodes: Map(0) {},
    trackedQueries: Map(1) { '/using-typescript/' => { dirty: 0, running: 0 } },
    trackedComponents: Map(0) {},
    deletedQueries: Set(0) {},
    dirtyQueriesListToEmitViaWebsocket: []
  }

I currently have no idea how we end up in a situation like that. Possibly something fails earlier and we swallow/ignore the error? Or maybe we have some stale state?

askibinski commented 2 years ago

@pieh Thanks! Yeah, I was going down that route, but your snippet really helped.

So apparently I still had an old Image (image.js) component lying around from an earlier version/iteration which was used in one place, and the debug info showed me:

    trackedQueries: Map(6) {
      'sq--src-components-header-header-js' => { dirty: 0, running: 0 },
      'sq--src-components-meta-js' => { dirty: 0, running: 0 },
      'sq--src-components-blocks-latest-posts-js' => { dirty: 0, running: 0 },
      'sq--src-components-media-image-js' => { dirty: 4, running: 0 },
      'sq--src-components-forms-ebook-form-js' => { dirty: 0, running: 0 },
      'sq--src-components-node-body-js' => { dirty: 0, running: 0 }
    },

Adding that debug info by default might help a lot of people migrating and running into an issue like this.

bkaldor commented 2 years ago

We're trying to upgrade to 4.4 from 3 as well and are running into this exact issue - both in Gatsby Cloud https://www.gatsbyjs.com/dashboard/e156da66-cda0-4df5-b3c0-a7fdca6bf65e/sites/43774e74-f15a-4923-b6f7-d215d0ba104b/builds/e82328f4-29ff-44bc-945e-81b886afd8f8/details#rawLogs and locally:

success Running gatsby-plugin-sharp.IMAGE_PROCESSING jobs - 211.318s - 2175/2175 10.29/s ⠋ Merge worker state

ERROR

Assertion failed: all worker queries are not dirty (worker #3)

buzinas commented 2 years ago

@pieh any thoughts on https://github.com/gatsbyjs/gatsby/issues/34051#issuecomment-990614035?

Right now I'm using a setting with [WEBP] only in order to build my website while it's in development, but we plan to go live in January and I need the fallback images. I tried the default value ([AUTO, WEBP], if I'm not wrong), and I also tried [WEBP, PNG] and [WEBP, NO_CHANGE], but had no success. They all time out after success run queries in workers.
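
For clarity, those format lists are the formats argument to gatsbyImageData; my WEBP-only workaround looks roughly like this (the file path and field name are just placeholders):

```js
import { graphql } from "gatsby"

const query = graphql`
  query {
    # hero.png is a placeholder image; with formats: [WEBP] only WEBP
    # variants are generated, so no fallback format is produced.
    hero: file(relativePath: { eq: "hero.png" }) {
      childImageSharp {
        gatsbyImageData(formats: [WEBP])
      }
    }
  }
`
```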

askibinski commented 2 years ago

@bkaldor and others finding this issue: I guess there might be different reasons the build stops/stalls, and this issue can get messy. I've summarized below:

buzinas commented 2 years ago

@askibinski As I said in my last two comments, the image processing on Gatsby Cloud is not fixed. I'm facing timeouts every time I use WEBP with some fallback (NO_CHANGE, AUTO, PNG or JPG).

And I'm also facing the same issue with local Gatsby builds (GitHub Codespaces). I'll try playing with the environment variables you suggested to fix the local issue, but the Gatsby Cloud issue still persists.

startinggravity commented 2 years ago

I resolved the problem I was having with Gatsby Cloud build failures during image processing. Reading through the comments here, it appears my situation may be different from most, but I figured it might be helpful to someone with a similar problem who lands on this issue discussion.

Gatsby Cloud did not specifically say why the build was stopping, other than an obscure message: "Failed to validate error Error [ValidationError]: "name" is not allowed." Eventually, I discovered three image files were being referenced in my Drupal backend's database but were missing from the files directory.

When I removed the database references, the stuck builds stopped. Oddly, I started using Gatsby a year ago and the problem only appeared a few weeks ago, even though the files had always been missing from my backend.

ghost commented 2 years ago

I'm currently migrating from v3 to v4 locally and I also get this error on gatsby build. I added the snippet that @pieh suggested, but the files flagged as dirty didn't seem to have anything fishy in them. As others have mentioned, this happens intermittently. Other times, I get this generic error:

There was an error in your GraphQL query:

Too many requests, retry after 20ms.

and

An error occurred during parallel query running.
Go here for troubleshooting tips: https://gatsby.dev/pqr-feedback

Error: Worker exited before finishing task

pieh commented 2 years ago

For the Assertion failed: all worker queries are not dirty errors, I suspect it can be a combination of lazy node creation in resolvers (in particular lazy file downloading) used in one query (it likely won't be marked in any way) and another query querying allFile (or something similar).

The way it could happen is that allFile is executed first (at which point not all "lazy file downloads" may have happened yet), and later a query lazily downloads a file and creates a node for it, which would mark the allFile query as dirty (after it has already run). But this is just guesswork and it might not be what's happening at all.

@sean-binary errors about

There was an error in your GraphQL query:

Too many requests, retry after 20ms.

could fit into my hypothesis above, but it all depends on what the query that is marked as dirty looks like.

So, if possible, please provide some details about the "dirty" queries (ideally the full actual query), plus some information about the plugins or custom resolvers in use that lazily download files. Ideally access to the site code, to reduce the guesswork.

ghost commented 2 years ago

@pieh Some examples of the "dirty queries" are below:

<StaticQuery
    query={graphql`
        query {
            view_email: file(relativePath: { eq: "view-email.png" }) {
                ...fadeIn
            }
        }
    `}
/>

and

const query = graphql`
    query {
        dbot_strategy: file(relativePath: { eq: "dbot-strategy.png" }) {
            ...fadeIn
        }
        dbot_build_strategy: file(relativePath: { eq: "dbot-build-strategy.png" }) {
            ...fadeIn
        }
        dbot_maximise_profits: file(relativePath: { eq: "dbot-maximise-profits.png" }) {
            ...fadeIn
        }
        dbot_track_your_performance: file(relativePath: { eq: "dbot-track-your-performance.png" }) {
            ...fadeIn
        }
        dbot_get_integrated_help: file(relativePath: { eq: "dbot-get-integrated-help.png" }) {
            ...fadeIn
        }
    }
`

where fragments were defined separately as

const fadeIn = graphql`
    fragment fadeIn on File {
        childImageSharp {
            gatsbyImageData(
                formats: [AUTO, WEBP]
                layout: CONSTRAINED
                breakpoints: [360, 992]
                placeholder: NONE
            )
        }
    }
`

Custom resolvers or queries used can be found here. The site code can be accessed there as well.

pieh commented 2 years ago

@sean-binary Thanks for providing that info. While your queries don't have allFile fields, the file one can also result in similar behaviour when there is no File node found that matches the filter/selector - https://github.com/gatsbyjs/gatsby/blob/63207a2cd340890f8de38775e13247ea55f5731b/packages/gatsby/src/schema/node-model.js#L381-L389

My suspicion is that at least one file field in the queries being marked as dirty is not finding results (which means that any time a new File node is created, the query is marked as dirty), and that this, coupled with the use of the gatsby-source-directus imageFile field (which lazily downloads images - https://github.com/directus/directus/blob/cb1bfee3fd42731a8d831cd8a0a1f9a9dd69c4cb/packages/gatsby-source-directus/gatsby-node.js#L79-L109 ), triggers this behaviour.


In your gatsby-config you have gatsby-source-filesystem configured as

        {
            resolve: 'gatsby-source-filesystem',
            options: {
                name: 'images',
                path: `${__dirname}/src/images/common`,
            },
        },

Query 1:

view_email: file(relativePath: { eq: "view-email.png" })

In the repo there is a view-email.png file, but it's in src/images/common/sign-up/view-email.png, so the relativePath should be sign-up/view-email.png (which fits my updated hypothesis).

Query 2 (just one example; there could be more, given the multiple file fields):

dbot_strategy: file(relativePath: { eq: "dbot-strategy.png" })

In the repo there is src/images/common/dbot/dbot-strategy.png, so the relativePath should be dbot/dbot-strategy.png.

Alternatively, you could switch to using name (or another identifier) instead of relativePath, but that might produce weird results if you ever have multiple files with the same name in different directories, so I'd suggest sticking with relativePath and adjusting the filters/selectors to actually match.
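
Put together, the corrected filters for the two queries above would look like this:

```js
import { graphql } from "gatsby"

// relativePath is resolved against the gatsby-source-filesystem `path`
// option (src/images/common here), so subdirectories must be included.
const query = graphql`
  query {
    view_email: file(relativePath: { eq: "sign-up/view-email.png" }) {
      ...fadeIn
    }
    dbot_strategy: file(relativePath: { eq: "dbot/dbot-strategy.png" }) {
      ...fadeIn
    }
  }
`
```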

pieh commented 2 years ago

Above is a workaround for the current problem - I'll try to think about a systemic solution so users don't get these weird errors (for example, not allowing queries to be marked as dirty within the same block of query running, or, instead of the invariant just causing failures, trying to rerun the queries with information that something might be wrong).

ghost commented 2 years ago

Thank you very much for the assistance with my query @pieh, and great spot on the relative paths 😅 You are probably right about that, though I am curious how the change from v3 to v4 caused it to become an issue. Let me try the workaround and report back here.

In the meantime, would the gatsby-source-directus imageFile field, as pointed out here, potentially cause issues with this as well? I noticed in the migration guide, specifically here, that there's a new way to call createRemoteFileNode, and the current source plugin seems to be deprecated.

pieh commented 2 years ago

though I am curious how the change from V3 to V4 had cause that to become an issue. Let me try the workaround and revert back here.

The error (technically an assertion) happens when running queries on multiple cores - v3 didn't use this by default (though you could enable it with the PARALLEL_QUERY_RUNNING flag, added in gatsby@3.10.0).

So in v3 queries were most likely also being marked as dirty after query running, but it wasn't fatal (and wasn't even checked).

Maybe the solution will just be to disable this check, which would return to the v3 behaviour - the check was added because there are simply more moving pieces with multi-core handling, and the lack of any integrity check like that seems quite dangerous, as it could lead to builds that don't fail but whose results are actually borked. I'd rather try to figure out a way to rerun dirty queries (probably limited to just a few times, so we don't end up in a never-ending query-running cycle - just fail if it happens more than ~5 times or something like that).

In the mean time, would the gatsby-source-directus imageFile field as pointed out here potentially cause issues on this as well? I noticed in the migration guide, specifically here that there's a new way to call createRemoteFileNode and the current source plugin seems to be deprecated.

This still "works", but it can cause problems like the one you are seeing, which is why we've tried to strongly discourage this pattern. Making the pattern just not work anymore, or break completely, would likely cause many more problems for users, as it is quite popular (and the idea is great, because doing "lazy" downloads on demand rather than downloading files up front is generally friendlier - a site might use only a very small subset of the files uploaded to the CMS).
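
For anyone unfamiliar with the pattern under discussion, it boils down to something like the sketch below (the CmsAsset type and url field are illustrative, not the actual gatsby-source-directus implementation):

```js
// gatsby-node.js - the "lazy download" pattern: the remote file is only
// fetched when a query actually selects the imageFile field, which means
// new File nodes can be created while queries are already running.
const { createRemoteFileNode } = require(`gatsby-source-filesystem`)

exports.createResolvers = ({
  createResolvers,
  actions: { createNode },
  createNodeId,
  getCache,
}) => {
  createResolvers({
    CmsAsset: {
      imageFile: {
        type: `File`,
        resolve: source =>
          createRemoteFileNode({
            url: source.url, // placeholder field holding the asset URL
            createNode,
            createNodeId,
            getCache,
          }),
      },
    },
  })
}
```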

ghost commented 2 years ago

@pieh I've managed to migrate over to v4 after numerous attempts, with the help of your suggestions and also what @askibinski suggested in this message. Here's a breakdown of what I encountered while switching over.

Too many requests, retry after 20ms.

- Such queries are as below:
```jsx
// OLD
imageFile {
    childImageSharp {
        fluid(quality: 100) {
            ...GatsbyImageSharpFluid_withWebp
        }
    }
}

// NEW
imageFile {
    childImageSharp {
        gatsbyImageData(quality: 100)
    }
}
```

It currently works, but it isn't yet ideal and I will keep testing it further.

github-actions[bot] commented 2 years ago

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 60 days of inactivity. It’s been at least 20 days since the last update here. If we missed this issue or if you want to keep it open, please reply here. As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

ghost commented 2 years ago

Keeping this issue open for others; I would also like to follow its development.

sferg989 commented 2 years ago

I am having this same issue. Here is a link to the build failure: build fail

github-actions[bot] commented 2 years ago

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 60 days of inactivity. It’s been at least 20 days since the last update here. If we missed this issue or if you want to keep it open, please reply here. As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

smujmaiku commented 2 years ago

I am also having issues with an upgrade to Gatsby 4 and would like to see movement on this. Perhaps some of it can be addressed in the upcoming Gatsby Conf talk titled Upgrading from Gatsby 2 to Gatsby 4 — It’s About Time

bakervin commented 2 years ago

This is still a massive blocker for my team in moving from Gatsby 3 to Gatsby 4. How does one go about getting traction on this issue, which appears to impact far more than just our site?

buzinas commented 2 years ago

We had to start using the Shopify CDN instead of downloading the images, due to this issue.

SebastianMera commented 2 years ago

I get a similar error: not finished Running gatsby-plugin-sharp.IMAGE_PROCESSING jobs - 148.074s. Any ideas, please?

seanho96 commented 2 years ago

@SebastianMera There could be lots of reasons, as discussed earlier here. You could start by limiting CPU usage by changing your build script to GATSBY_CPU_COUNT=2 gatsby build and seeing if there's any difference.

SebastianMera commented 2 years ago

@seanho96 I've tried the various solutions proposed in this thread and nothing seems to work. The last thing left to try is checking every GraphQL query and making sure it is not using deprecated syntax.

thisispvb commented 2 years ago

Getting the same issue as well; sometimes I have to re-run the build 4-5 times until one succeeds (running with GATSBY_CPU_COUNT=1 on a single-CPU instance).

haneabogdan commented 2 years ago

Having the same issue. After playing with the NODE_OPTIONS=--max_old_space_size and GATSBY_CPU_COUNT environment variables, I got the gatsby-plugin-sharp.IMAGE_PROCESSING job to 99%, but Validating Rendering Engines still fails. Any other ideas on how to fix it?

SebastianMera commented 2 years ago

@LekoArts Together with other teammates working on the same project, I have come to the conclusion that we cannot even make a proper assessment of why this problem occurs; we are unable to tell whether it's a resource problem, a GraphQL problem, or something to do with Gatsby internals.

I see that many people are struggling with this error, so it may be a common drawback of the new version and should therefore be handled with high priority.

sensedrive commented 2 years ago

I am running into the same issue. I also played with the NODE_OPTIONS / GATSBY_CPU_COUNT environment variables, but had no luck getting my build running.

I am hosting my application on an Ubuntu 20.04 server with dokku/docker, 8 GB of memory and 4 cores. Gatsby Cloud gives me the same output as well. Only my M1 Mac builds the site locally without any problems.

Since downgrading to Gatsby v3, it works again.

buzinas commented 2 years ago

For me, this problem went on for a long time. I tried working directly with Gatsby Cloud on a solution, but they basically shrugged at the issue. I did get to the bottom of the problem: it happens in gatsby-plugin-sharp when there are too many images to process. I tried all of these variables, and I tried changing the underlying code to throttle the image processing, etc., but couldn't get to any point I was happy with.

Then I decided to just stick with using Shopify's CDN instead of processing the images, and I never looked back. But I know this is a bummer, and not everyone can "disable" image processing.

ansmlc commented 2 years ago

I'm experiencing a similar issue. Builds kept failing, and then we tried to optimize queries. Now it builds, but it went from 8 minutes to 30 minutes on Netlify. We're also considering disabling Gatsby's image processing entirely.

SebastianMera commented 2 years ago

@buzinas Until this problem is solved, I will need to research how it occurs and try to either bypass it internally or find a workaround for the configuration we use.

wardpeet commented 2 years ago

If you're on Gatsby Cloud and having this issue, can you send me an email at ward@gatsbyjs.com with your user email and site name so we can investigate :pray:

SebastianMera commented 2 years ago

@wardpeet Hello, but what if the site is not even deployed on Gatsby Cloud? For production the site is deployed on AWS, but the builds also fail locally. :<