gatsbyjs / gatsby

The best React-based framework with performance, scalability and security built in.
https://www.gatsbyjs.com
MIT License

First incremental build rebuilds all HTML pages #33450

Open feedm3 opened 3 years ago

feedm3 commented 3 years ago

Edit: Check my third post below. It looks like it has something to do with the import of CSS files in Gatsby.

Description

After running the uncached full build we are doing incremental builds for content updates. This works perfectly except for the very first incremental build. The first incremental build is rebuilding all HTML pages. Here are the logs:

1. The First build is an uncached full build.

success run static queries - 0.012s - 1/1 86.70/s
success run page queries - 239.631s - 25672/25672 107.13/s
success write out requires - 0.258s
success Building production JavaScript and CSS bundles - 63.090s
success Rewriting compilation hashes - 0.013s
success Writing page-data.json files to public directory - 159.499s - 25672/25672 160.95/s
success Building HTML renderer - 33.815s
success Caching JavaScript and CSS webpack compilation - 227.825s
success Caching HTML renderer compilation - 26.022s
success Building static HTML for pages - 357.573s - 25672/25672 71.80/s

25672 page-data.json files are generated and the same amount of HTML pages.

2. The first incremental build rebuilds all HTML pages, although it detects the changes correctly and also only updates the necessary page-data.json files

success run page queries - 1.528s - 32/32 20.94/s
success write out requires - 0.244s
success Building production JavaScript and CSS bundles - 3.630s
success Caching JavaScript and CSS webpack compilation - 0.024s
success Rewriting compilation hashes - 0.003s
success Writing page-data.json files to public directory - 0.226s - 32/32 141.57/s
success Building HTML renderer - 10.719s
success Caching HTML renderer compilation - 2.281s
success Building static HTML for pages - 756.263s - 25672/25672 33.95/s

32 page-data.json files are correctly updated, but all 25672 HTML files are rebuilt. This is unnecessary, as there was no content update to that many files.

3. All upcoming builds work without any issues and are very fast

success run page queries - 1.893s - 32/32 16.91/s
success write out requires - 0.383s
success Building production JavaScript and CSS bundles - 4.657s
success Caching JavaScript and CSS webpack compilation - 0.012s
success Writing page-data.json files to public directory - 0.246s - 32/32 130.31/s
success Building HTML renderer - 11.956s
success Caching HTML renderer compilation - 0.622s
success Building static HTML for pages - 0.829s - 1/1 1.21/s

32 page-data.json files and 1 HTML page are correctly updated. All further incremental builds also work perfectly.
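To spot this pattern automatically, the two relevant counters can be pulled out of the build output. This is only a minimal sketch: `summarizeBuildLog` is a hypothetical helper, and it assumes the log lines keep the exact wording shown in the logs above.

```javascript
// Hypothetical helper (not part of Gatsby): extracts how many page-data.json
// files and how many HTML pages a build reported, based on the log wording above.
function summarizeBuildLog(logText) {
  const pageData = /Writing page-data\.json files to public directory - [\d.]+s - (\d+)\/\d+/.exec(logText);
  const html = /Building static HTML for pages - [\d.]+s - (\d+)\/\d+/.exec(logText);
  return {
    pageDataWritten: pageData ? Number(pageData[1]) : 0,
    htmlBuilt: html ? Number(html[1]) : 0,
  };
}

// Two lines taken from the first incremental build above.
const firstIncremental = [
  'success Writing page-data.json files to public directory - 0.226s - 32/32 141.57/s',
  'success Building static HTML for pages - 756.263s - 25672/25672 33.95/s',
].join('\n');

console.log(summarizeBuildLog(firstIncremental));
// { pageDataWritten: 32, htmlBuilt: 25672 }
```

A CI step could compare the two numbers and warn when `htmlBuilt` is far larger than `pageDataWritten`.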

Current situation: I don't understand what, besides the content, influences the decision whether an HTML file gets rebuilt. I would be very happy to hear ideas about where the problem could be. Building HTML files in our project takes a long time, and we want to cut it as much as possible.

Reproduction Link

https://github.com/feedm3/gatsby-first-inc-build-bug

Steps to Reproduce

Check my third post below. It looks like it has something to do with the Gatsby core.

Unfortunately, I cannot reproduce this issue on a different repository, as our setup uses a custom sourcing plugin we built with the fantastic gatsby-graphql-toolkit.

What we do within this plugin is:


const { sourceAllNodes, sourceNodeChanges } = require('gatsby-graphql-toolkit');

exports.sourceNodes = async (gatsbyApi, pluginOptions) => {
  const { cache } = gatsbyApi;
  // Timestamp stored at the end of the previous build; undefined on an uncached build
  const lastBuildTime = await cache.get('LAST_BUILD_TIME');

  const config = await getSourcingConfig(gatsbyApi, pluginOptions);

  if (lastBuildTime) {
    // incremental build

    // fetch all changes from the api since the last build
    const cmsUpdates = await fetchIncrementalUpdates({ pluginOptions, lastBuildTime });

    // convert to UPDATE or DELETE events
    const nodeEvents = cmsUpdates.map((event) => convertUpdateToGatsbyEvents(event));

    await sourceNodeChanges(config, { nodeEvents });
  } else {
    // full build
    await sourceAllNodes(config);
  }

  // remember when this build ran so the next one can fetch only newer changes
  await cache.set('LAST_BUILD_TIME', Date.now());
};

We also modify the webpack config a little bit:

exports.onCreateWebpackConfig = ({ actions, stage, plugins }) => {
  if (stage === 'build-javascript' || stage === 'develop') {
    actions.setWebpackConfig({
      plugins: [plugins.provide({ process: 'process/browser' })],
      resolve: {
        fallback: {
          fs: false,
          stream: 'stream-browserify',
        },
      },
    });
  }
};

Additional libraries we use: styled-components and typescript.

The problem occurs on the latest Gatsby 3 (3.14.1) and on Gatsby 4.

Expected Result

All incremental builds only update and rebuild the necessary HTML pages.

Actual Result

The first incremental build rebuilds all HTML pages, all upcoming incremental builds only rebuild the changed HTML pages.

Environment

System:
  OS: macOS 11.6
  CPU: (8) x64 Intel(R) Core(TM) i7-8569U CPU @ 2.80GHz
  Shell: 5.8 - /bin/zsh
Binaries:
  Node: 16.7.0 - /usr/local/bin/node
  Yarn: 1.22.15 - /usr/local/bin/yarn
  npm: 7.24.0 - /usr/local/bin/npm
Languages:
  Python: 2.7.16 - /usr/bin/python
Browsers:
  Chrome: 94.0.4606.71
  Firefox: 92.0.1
  Safari: 15.0

Config Flags

{ 
    "DEV_SSR": true,
    "FAST_DEV": false
}

herecydev commented 3 years ago

I've definitely seen the same, feels like it needs a good 2 or 3 builds to "settle" before finally reporting no changes.

feedm3 commented 3 years ago

@herecydev thanks for pointing that out! I checked a different Gatsby project we have and the result is the same: The first incremental build rebuilds all HTML pages, all upcoming incremental builds only rebuild the ones that have changed.

The project is our company website, the "bug" is reproducible: https://github.com/satellytes/satellytes.com

We just never noticed it there as it only has 37 pages.

1. The first build is a full build

success run static queries - 0.014s - 2/2 147.58/s
success run page queries - 5.221s - 37/37 7.09/s
success write out requires - 0.012s
success Building production JavaScript and CSS bundles - 23.393s
success Rewriting compilation hashes - 0.035s
success Writing page-data.json files to public directory - 0.237s - 37/37 156.22/s
success Building HTML renderer - 6.240s
success Building static HTML for pages - 1.362s - 37/37 27.16/s
...
info Done building in 64.557227392 sec

After that, the first incremental build rebuilds all HTML pages

success run page queries - 1.414s - 27/27 19.10/s
success write out requires - 0.006s
success Building production JavaScript and CSS bundles - 1.544s
success Caching JavaScript and CSS webpack compilation - 0.005s
success Rewriting compilation hashes - 0.003s
success Writing page-data.json files to public directory - 0.028s - 27/27 973.21/s
success Caching HTML renderer compilation - 0.001s
success Building HTML renderer - 0.791s
success Building static HTML for pages - 0.650s - 37/37 56.95/s
...
info Done building in 14.789357687 sec

All upcoming incremental builds don't rebuild the HTML files

success run page queries - 1.382s - 27/27 19.54/s
success write out requires - 0.005s
success Building production JavaScript and CSS bundles - 1.398s
success Caching JavaScript and CSS webpack compilation - 0.006s
success Writing page-data.json files to public directory - 0.029s - 27/27 940.10/s
success Caching HTML renderer compilation - 0.001s
success Building HTML renderer - 0.778s
info There are no new or changed html files to build.
...
info Done building in 11.934242094 sec

Our setup here is nearly the same as in the project my initial report is based on, except that it only sources from the file system (plus an API request in createPages).

feedm3 commented 3 years ago

I did some more research and it turns out: This bug is even within the Gatsby Starter Blog (I didn't test the other starters): https://github.com/gatsbyjs/gatsby-starter-blog

I did 3 builds in a row without changing anything.

First build: fresh build, all HTML files are created.

success run page queries - 0.482s - 7/7 14.52/s
success write out requires - 0.012s
success Building production JavaScript and CSS bundles - 9.633s
success Rewriting compilation hashes - 0.034s
success Writing page-data.json files to public directory - 0.206s - 7/7 34.06/s
success Building HTML renderer - 2.400s
success Building static HTML for pages - 0.166s - 7/7 42.04/s

Second build: first incremental build, but all HTML pages are rendered again.

success run page queries - 0.015s - 1/1 66.59/s
success write out requires - 0.009s
success Building production JavaScript and CSS bundles - 0.799s
success Caching JavaScript and CSS webpack compilation - 0.008s
success Rewriting compilation hashes - 0.007s
success Writing page-data.json files to public directory - 0.008s - 1/1 124.87/s
success Building HTML renderer - 1.331s
success Caching HTML renderer compilation - 0.206s
success Building static HTML for pages - 0.194s - 7/7 36.06/s

Third build: second incremental build. Now, only the HTML of the changed pages is rebuilt.

success run page queries - 0.031s - 1/1 32.20/s
success write out requires - 0.012s
success Building production JavaScript and CSS bundles - 2.209s
success Caching JavaScript and CSS webpack compilation - 0.013s
success Writing page-data.json files to public directory - 0.048s - 1/1 20.93/s
success Building HTML renderer - 3.309s
success Caching HTML renderer compilation - 0.411s
success Building static HTML for pages - 0.167s - 1/1 6.00/s

The example TypeScript page shows the build time. This is why every incremental build rebuilds 1 HTML page: the build time changes with every build. To check whether this assumption is correct, I removed the TypeScript file and checked again. It was correct. Also, the second incremental build then doesn't rebuild anything!

I moved the build time query to a different JS file to see if this has an impact on the generated HTML files. The first rebuild built all HTML pages and the second one only rebuilt 1 HTML page. The only difference now is that there is no TypeScript file involved anymore.

My result for now: TypeScript causes the first incremental build to rebuild all HTML files.

feedm3 commented 3 years ago

Ok, it looks like it's not TypeScript. I created a small reproduction repo without any dependencies except for Gatsby and React. The issue still appears. I may just have forgotten to delete the .cache or public folder during all these builds and tests. I couldn't reproduce a working first incremental build on my machine again.

Here is the GitHub repo: https://github.com/feedm3/gatsby-first-inc-build-bug
Here is the CodeSandbox: https://codesandbox.io/s/gatsby-first-inc-build-bug-83pbu

You can run yarn build in the bottom right terminal on CodeSandbox, but the caching of the node_modules, .cache and public folders there is sometimes opaque. To make sure your results are accurate, it is better to download the code and run it locally.

I did a full build, copied the public folder to public-first-build and ran the build again to see which files are different. It's only the /page-data/app-data.json file. All other files are identical:

< {"webpackCompilationHash":"2c00ac4d7abf97827f69"}
---
> {"webpackCompilationHash":"4457f437159d17962896"}

As expected, running more incremental builds doesn't change the app-data.json or any other file anymore. It's only the first incremental build.
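The folder comparison described above can also be scripted. This is a rough sketch rather than the exact procedure I used: `hashTree` and `diffTrees` are hypothetical helper names, and it relies only on Node's built-in modules.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Recursively map each file's path (relative to rootDir) to a SHA-256 of its contents.
function hashTree(rootDir, currentDir = rootDir, result = new Map()) {
  for (const entry of fs.readdirSync(currentDir, { withFileTypes: true })) {
    const fullPath = path.join(currentDir, entry.name);
    if (entry.isDirectory()) {
      hashTree(rootDir, fullPath, result);
    } else if (entry.isFile()) {
      const digest = crypto.createHash('sha256').update(fs.readFileSync(fullPath)).digest('hex');
      result.set(path.relative(rootDir, fullPath), digest);
    }
  }
  return result;
}

// Return the sorted relative paths that were added, removed, or changed.
function diffTrees(beforeDir, afterDir) {
  const before = hashTree(beforeDir);
  const after = hashTree(afterDir);
  const changed = [];
  for (const [file, digest] of after) {
    if (before.get(file) !== digest) changed.push(file);
  }
  for (const file of before.keys()) {
    if (!after.has(file)) changed.push(file);
  }
  return changed.sort();
}

// Example call, using the folder names from this post:
// console.log(diffTrees('public-first-build', 'public'));
```

With the two folders from this post, the expectation would be that only the app-data.json path is reported.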

Update

I removed the static query from a page component. There are now no Gatsby queries left. Still, the problem occurs.

Then I removed the one and only import of a .css file. Now, the first incremental build is working! Without a .css file (or an import of a .css file), I wasn't able to reproduce the bug. It seems the issue is related to the CSS loading step (css-loader).

styled-components experiment

The first incremental build is working when styles are only used with styled-components and no CSS files. I created a branch with the implementation: https://github.com/feedm3/gatsby-first-inc-build-bug/tree/styled-components

This is not really a solution, as importing CSS files is essential. But it shows that it's not the styles per se, it's the CSS files.

sidharthachatterjee commented 2 years ago

Okay, we took a look into this and it seems to be the mini-css-extract-plugin.

Interestingly, this only seems to occur when the mini-css-extract-plugin is used in conjunction with the webpack cache.

Disabling cache in https://github.com/gatsbyjs/gatsby/blob/8ff9cc30edd8db3fcbb3726cb3441a768299abbd/packages/gatsby/src/utils/webpack.config.js#L847 or removing the aforementioned plugin in https://github.com/gatsbyjs/gatsby/blob/8ff9cc30edd8db3fcbb3726cb3441a768299abbd/packages/gatsby/src/utils/webpack-utils.js#L592 seems to "fix" it.

We need to investigate further what exactly goes wrong in the interplay between the cache and the plugin.

Thanks for your excellent analysis and helpful reproduction. Cheers!
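For anyone who wants to try the cache-disabling experiment without patching Gatsby itself, something along these lines in a site's gatsby-node.js might approximate it. This is an untested sketch: it assumes that overriding the webpack `cache` option from user land has the same effect as the in-core change linked above, and `disableWebpackCache` is just an illustrative name. It will also make every build recompile from scratch.

```javascript
// gatsby-node.js (sketch, not a verified fix).
// Assumption: turning off webpack's cache here mirrors the in-core experiment.
const disableWebpackCache = ({ stage, getConfig, actions }) => {
  // Only touch the production build stages; leave `develop` untouched.
  if (stage !== 'build-javascript' && stage !== 'build-html') return;

  const config = getConfig();
  config.cache = false; // webpack option: disable caching of modules and chunks
  actions.replaceWebpackConfig(config);
};

exports.onCreateWebpackConfig = disableWebpackCache;
```

If the site already exports `onCreateWebpackConfig`, the two lines that modify `config` would need to be merged into the existing function instead.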

LekoArts commented 2 years ago

Sadly we had to revert this PR and open https://github.com/gatsbyjs/gatsby/pull/34413 as the changes we made introduced breaking changes when using inline loaders, e.g. !raw-loader!src/stuff.svg. webpack changed that syntax.

As the issue can only be fixed by updating the packages, I won't re-open it: it can't be fixed until Gatsby 5. I've made TODO notes in our code and we'll bump those deps then, but sadly it'll stay like this until then.

fturmel commented 1 year ago

@LekoArts Did this fix ever make it to Gatsby 5 after all? I believe I'm still seeing the described problem in 5.2 where webpackCompilationHash changes between a clean build and first incremental build.

fostimus commented 1 year ago

I've also been running into this - the only thing that changes per page is webpackCompilationHash
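A quick way to confirm this on your own site is to compare the hash stored in app-data.json between two builds. The sketch below assumes the file lives at page-data/app-data.json inside the public folder, as reported earlier in this thread; `readCompilationHash` and `compilationHashChanged` are hypothetical helper names.

```javascript
const fs = require('fs');
const path = require('path');

// Read the webpackCompilationHash that Gatsby wrote into a build output folder.
function readCompilationHash(publicDir) {
  const file = path.join(publicDir, 'page-data', 'app-data.json');
  return JSON.parse(fs.readFileSync(file, 'utf8')).webpackCompilationHash;
}

// True when two build outputs carry different compilation hashes.
function compilationHashChanged(previousPublicDir, currentPublicDir) {
  return readCompilationHash(previousPublicDir) !== readCompilationHash(currentPublicDir);
}

// Example call, assuming a copy of the first build was kept in public-first-build:
// console.log(compilationHashChanged('public-first-build', 'public'));
```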

LekoArts commented 1 year ago

@fturmel No, it didn't make the cut for Gatsby 5. For the time we had left just re-applying the first PR would have been a big breaking change and we wanted to make the migration smooth. Maybe the PR can be introduced without a breaking change, but that would need more investigation. Any help is appreciated! :)

finematterdave commented 1 year ago

Hi @LekoArts, did this progress any further on a branch or in a roadmap? Happy to have a look and help out. It impacts us quite a lot (and is more acute on larger sites with a higher volume of changes). I'm just checking whether there's anything half done already, other than the reverted PR. Thanks!

natapg commented 12 months ago

@LekoArts any news on this? Our build goes from 10m to 2m using the cache, but this outdated version of mini-css-extract-plugin sometimes breaks the builds, so we have to clean before each build.

robclancy commented 3 weeks ago

All I can say after coming here and seeing this is... fuck netlify.