gatsbyjs / gatsby

The best React-based framework with performance, scalability and security built in.
https://www.gatsbyjs.com
MIT License

New Gatsby type inference is slow on 60k pages #12692

Closed bennetthardwick closed 5 years ago

bennetthardwick commented 5 years ago

Description

After the onPreExtractQueries step of the build process, Gatsby gets stuck when looking through 60k or so pages. Specifically, the isDate method (which is run on every string to check whether it's a plain string or a date string) is taking up the most time. These calls all come from example-value.js.

Steps to reproduce

  1. Clone gatsby-intense-benchmark
  2. Run:
    node --inspect-brk node_modules/.bin/gatsby develop
  3. Open chrome / chromium, navigate to chrome://inspect and start the debugger.
  4. When the build process finishes onPreExtractQueries, wait for a few minutes and then pause the debugger (chances are it will be in the right spot)
  5. Alternatively run the profiler for a few minutes and inspect the chart

The build will be stuck in this state for 30+ minutes (I haven't had a successful build).

Expected result

Build completes in a reasonable time.

Actual result

Build never completes.

Environment

  System:
    OS: Linux 4.15 Ubuntu 18.04.2 LTS (Bionic Beaver)
    CPU: (4) x64 Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz
    Shell: 5.4.2 - /usr/bin/zsh
  Binaries:
    Node: 10.15.0 - ~/.nvm/versions/node/v10.15.0/bin/node
    Yarn: 1.13.0 - ~/.nvm/versions/node/v10.15.0/bin/yarn
    npm: 6.4.1 - ~/.nvm/versions/node/v10.15.0/bin/npm
  Languages:
    Python: 2.7.15 - /usr/bin/python
  npmPackages:
    gatsby: ^2.2.1 => 2.2.1
    gatsby-image: ^2.0.34 => 2.0.34
    gatsby-plugin-manifest: ^2.0.24 => 2.0.24
    gatsby-plugin-offline: ^2.0.25 => 2.0.25
    gatsby-plugin-react-helmet: ^3.0.10 => 3.0.10
    gatsby-plugin-sharp: ^2.0.29 => 2.0.29
    gatsby-source-filesystem: ^2.0.27 => 2.0.27
    gatsby-transformer-sharp: ^2.1.17 => 2.1.17
  npmGlobalPackages:
    gatsby-dev-cli: 2.4.12
    gatsby: 2.2.2

stefanprobst commented 5 years ago

Thanks for the report and for providing a testing repo!

We are indeed now checking every string if it is a date string -- we did not do this before but only checked on a randomly picked field value which unfortunately made type inference non-deterministic. We are aware though that this does not scale very well.

I'll test with the provided repo soon. In the meantime: if it hangs after onPreExtractQueries, that means it is stuck in the schema update step. If that's also what you experience in your real project, you can try simply disabling schema updating here, since it's really only there to make the context fields available in the schema, which you probably won't need (see this RFC). I'd also like to try switching momentjs for something like date-fns@2 (something like const isDate = value => isValid(parseISO(value))) and see if that makes a difference.
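
The trade-off described above (checking one randomly picked value versus checking every value) can be illustrated with a minimal sketch. The function names and the regex-based date check are assumptions made for illustration; Gatsby's actual isDate relies on moment and is not reproduced here.

```javascript
// Stand-in date check (an assumption for this sketch, not Gatsby's isDate).
const looksLikeIsoDate = (value) =>
  typeof value === "string" &&
  /^\d{4}-\d{2}-\d{2}(T[\d:.]+(Z|[+-]\d{2}:\d{2})?)?$/.test(value);

// Sampling approach: the inferred type depends on which value gets picked,
// so two builds over the same data can disagree.
const inferFromRandomSample = (values, random = Math.random) => {
  const sample = values[Math.floor(random() * values.length)];
  return looksLikeIsoDate(sample) ? "Date" : "String";
};

// Exhaustive approach: deterministic, but the check runs once per value.
const inferFromAllValues = (values) =>
  values.length > 0 && values.every(looksLikeIsoDate) ? "Date" : "String";

const published = ["2019-03-20", "2019-03-21", "sometime in March"];
console.log(inferFromAllValues(published)); // String
```

The exhaustive version always gives the same answer for the same data, which is the determinism gained; the cost is one date check per string value across all nodes.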

millette commented 5 years ago

Me too! I am using a single 60 MiB JSON file for my data. Was taking about 10-15 minutes to build with gatsby < 2.2.0 but with the most recent, it's taking 2h30.

wardpeet commented 5 years ago

PR #12700 is a quickfix to make it a little bit faster again. Do you mind trying it out?

millette commented 5 years ago

Thanks @wardpeet I'm giving it a shot, hopefully with an answer in less than 2 hours :-)

UPDATE: Previous run took (from memory) 8000s to build schemas, with the patch it was only 600s.

Looking good!

KyleAMathews commented 5 years ago

What did schema creation take before?

me4502 commented 5 years ago

> Me too! I am using a single 60 MiB JSON file for my data. Was taking about 10-15 minutes to build with gatsby < 2.2.0 but with the most recent, it's taking 2h30.

They mentioned in an earlier comment that it was taking 10-15 minutes. I'm going to try this on our codebase as we were taking 15 seconds before, but >30 minutes after.

stefanprobst commented 5 years ago

I also tried with swapping momentjs for date-fns@2 and the difference is enormous. The problem with how things are done currently is that we check 55 different ISO format strings with moment.
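
When every candidate string is tried against dozens of formats, one generic way to cut the cost is to reject obvious non-dates cheaply and to cache results for repeated values. The sketch below shows that idea only; it is not necessarily what PR #12700 does. The names expensiveIsDate and startsWithYear are hypothetical, Date.parse stands in for the multi-format moment check, and the four-digit prefix test assumes ordinary ISO strings that begin with a year.

```javascript
// Hypothetical expensive check, standing in for trying many formats.
let expensiveCalls = 0;
const expensiveIsDate = (value) => {
  expensiveCalls += 1;
  return !Number.isNaN(Date.parse(value));
};

// Cheap pre-check: assumes date strings start with a four-digit year.
const startsWithYear = /^\d{4}/;

// Cache so that a repeated string is only parsed once.
const cache = new Map();

const isDate = (value) => {
  if (typeof value !== "string" || !startsWithYear.test(value)) return false;
  if (!cache.has(value)) cache.set(value, expensiveIsDate(value));
  return cache.get(value);
};

isDate("hello world"); // rejected by the pre-check, no expensive call
isDate("2019-03-20");  // one expensive call
isDate("2019-03-20");  // served from the cache
console.log(expensiveCalls); // 1
```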

freiksenet commented 5 years ago

We will merge Ward's PR as a temporary optimization. Sadly, the date-fns parser is a bit too loose (for example, it allows dates with arbitrary trailing characters), so switching to it would be a potential breaking change.
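
To see why accepting trailing characters would be a breaking change for inference, compare a check that only matches the start of the string with one that must match the whole string. This is an analogy using regular expressions, not a reproduction of date-fns or moment behavior; both function names are hypothetical.

```javascript
// Loose: only the beginning of the string has to look like a date.
const isDateLoose = (value) => /^\d{4}-\d{2}-\d{2}/.test(value);

// Strict: the whole string has to be a date, nothing may follow it.
const isDateStrict = (value) => /^\d{4}-\d{2}-\d{2}$/.test(value);

const note = "2019-03-20 and some notes";
console.log(isDateLoose(note));  // true
console.log(isDateStrict(note)); // false
```

With the loose check, a free-text field that happens to start with a date would be inferred as a Date instead of a String, changing the schema of existing sites.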

As a long term solution to this, we will provide an opt-out from inference that will considerably speed up sites with lots of nodes. This will be done by specifying the GraphQL types for the nodes that you don't want to be inferred. We've added the "specifying the type" part already, but for historical reasons inference always happens. We will fix this in the next couple of weeks.

millette commented 5 years ago

Closing (it's merged); further steps will require more specific issues.

danechitoaie commented 5 years ago

Just to throw an idea out there. What about if the source-plugins would be able to specify the data type when they create the nodes?

Something like name: gatsby.StringType(something.name), date: gatsby.DateType(something.date, "yyyy/MM/dd" /* format for how to be parsed */), etc. ?

(as a proposal for a future improvement)

wardpeet commented 5 years ago

@danechitoaie that's something we're going to move to.

@millette @bennetthardwick I did some more perf improvements in #12722. Could you give it a spin and report back if anything is wonky?

millette commented 5 years ago

@wardpeet I didn't have a chance to try #12722 but on first sight it looks like a lot of code compared to first patch. I'm not used to "multirepos" so I have to patch the dist module manually. Also, my project should be redesigned somewhat since it's already pretty slow and doesn't make the best usage of gatsby, so it shouldn't be too significant to proper gatsby usage.

(reopening the issue since we're not done, it seems)

millette commented 5 years ago

@wardpeet I finally tried the new patch with gatsby 2.2.8.

gatsby 2.2.8:

success source and transform nodes — 132.075 s
success building schema — 566.949 s

gatsby 2.2.8 with new patch:

success source and transform nodes — 158.689 s
success building schema — 265.077 s

DSchau commented 5 years ago

@millette could you try with a gatsby@~2.1.0 version? The initial issue here was with the new Schema Customization API changes, so I'd be curious to see if it's still slower!

Thanks!

millette commented 5 years ago

@DSchau To be clear, you want me to test #12722 (which was merged 13 hours ago) against gatsby 2.1.0 ?

DSchau commented 5 years ago

@millette the central issue here was with a regression with 2.2.0.

Specifically, it seemed like the bottleneck was with Date inference.

That change was released in gatsby@2.2.10 so if we want to compare performance, we would ideally test this change:

millette commented 5 years ago

@DSchau Sorry in advance for the very long response, but here we go.

I've included results while also running mplayer which took some cpu. If you compare with runs without mplayer, I concluded that the 2.2 branch takes a bit more cpu (vs IO) since it was more impacted by mplayer running.

So there's a regression (540s vs 320s with mplayer; 485s vs 326s without mplayer). When 2.2 came out, the same build took almost 3 hours to complete so I'm not complaining :-)

I didn't patch anything in these tests.

v2.2.11 build#1 (with mplayer, cache cleared)
$ gatsby build
success open and validate gatsby-configs — 0.041 s
success load plugins — 1.025 s
success onPreInit — 1.260 s
success delete html and css files from previous builds — 0.020 s
success initialize cache — 0.025 s
success copy gatsby files — 0.076 s
success onPreBootstrap — 0.050 s
success source and transform nodes — 169.772 s
success building schema — 283.182 s
success createPages — 0.147 s
success createPagesStatefully — 0.223 s
success onPreExtractQueries — 0.009 s
success update schema — 0.180 s
success extract queries from components — 0.725 s
success run graphql queries — 72.963 s — 28/28 0.38 queries/second
success write out page data — 0.030 s
success write out redirect data — 0.022 s
success onPostBootstrap — 0.008 s
info bootstrap finished - 540.315 s

v2.2.11 build#2 (no mplayer, cache cleared)
$ gatsby build
success open and validate gatsby-configs — 0.041 s
success load plugins — 1.008 s
success onPreInit — 1.215 s
success delete html and css files from previous builds — 0.018 s
success initialize cache — 0.026 s
success copy gatsby files — 0.060 s
success onPreBootstrap — 0.034 s
success source and transform nodes — 131.905 s
success building schema — 267.430 s
success createPages — 0.143 s
success createPagesStatefully — 0.215 s
success onPreExtractQueries — 0.007 s
success update schema — 0.171 s
success extract queries from components — 0.709 s
success run graphql queries — 69.625 s — 28/28 0.40 queries/second
success write out page data — 0.020 s
success write out redirect data — 0.004 s
success onPostBootstrap — 0.007 s
info bootstrap finished - 484.869 s

v2.2.11 build#3 (no mplayer, with cache)
$ gatsby build
success open and validate gatsby-configs — 0.041 s
success load plugins — 0.978 s
success onPreInit — 1.269 s
success delete html and css files from previous builds — 0.159 s
success initialize cache — 0.024 s
success copy gatsby files — 0.055 s
success onPreBootstrap — 0.033 s
success source and transform nodes — 0.890 s
success building schema — 270.609 s
success createPages — 0.184 s
success createPagesStatefully — 0.271 s
success onPreExtractQueries — 0.007 s
success update schema — 0.177 s
success extract queries from components — 0.707 s
success run graphql queries — 1.823 s — 16/16 8.79 queries/second
success write out page data — 0.010 s
success write out redirect data — 0.003 s
success onPostBootstrap — 0.003 s
info bootstrap finished - 300.021 s

v2.2.11 build#4 (no mplayer, cache cleared)
$ gatsby build
success open and validate gatsby-configs — 0.040 s
success load plugins — 1.009 s
success onPreInit — 1.195 s
success delete html and css files from previous builds — 0.018 s
success initialize cache — 0.026 s
success copy gatsby files — 0.056 s
success onPreBootstrap — 0.038 s
success source and transform nodes — 134.602 s
success building schema — 267.401 s
success createPages — 0.144 s
success createPagesStatefully — 0.215 s
success onPreExtractQueries — 0.008 s
success update schema — 0.182 s
success extract queries from components — 0.723 s
success run graphql queries — 71.003 s — 28/28 0.39 queries/second
success write out page data — 0.010 s
success write out redirect data — 0.005 s
success onPostBootstrap — 0.007 s
info bootstrap finished - 489.475 s

v2.2.11 build#5 (with mplayer, cache cleared)
$ gatsby build
success open and validate gatsby-configs — 0.042 s
success load plugins — 1.032 s
success onPreInit — 1.234 s
success delete html and css files from previous builds — 0.018 s
success initialize cache — 0.027 s
success copy gatsby files — 0.058 s
success onPreBootstrap — 0.052 s
success source and transform nodes — 159.721 s
success building schema — 274.916 s
success createPages — 0.154 s
success createPagesStatefully — 0.216 s
success onPreExtractQueries — 0.007 s
success update schema — 0.186 s
success extract queries from components — 0.709 s
success run graphql queries — 73.223 s — 28/28 0.38 queries/second
success write out page data — 0.010 s
success write out redirect data — 0.001 s
success onPostBootstrap — 0.027 s
info bootstrap finished - 525.877 s

v2.1.39 build#6 (with mplayer, cache cleared)
$ gatsby build
success open and validate gatsby-configs — 0.041 s
success load plugins — 1.050 s
success onPreInit — 1.242 s
success delete html and css files from previous builds — 0.019 s
success initialize cache — 0.026 s
success copy gatsby files — 0.055 s
success onPreBootstrap — 0.034 s
success source and transform nodes — 160.464 s
success building schema — 20.783 s
success createPages — 0.322 s
success createPagesStatefully — 0.321 s
success onPreExtractQueries — 0.008 s
success update schema — 20.012 s
success extract queries from components — 0.746 s
success run graphql queries — 143.294 s — 28/28 0.20 queries/second
success write out page data — 0.010 s
success write out redirect data — 0.001 s
success onPostBootstrap — 0.006 s
info bootstrap finished - 361.719 s

v2.1.39 build#7 (with mplayer, cache cleared)
$ gatsby build
success open and validate gatsby-configs — 0.041 s
success load plugins — 1.084 s
success onPreInit — 1.204 s
success delete html and css files from previous builds — 0.019 s
success initialize cache — 0.025 s
success copy gatsby files — 0.063 s
success onPreBootstrap — 0.038 s
success source and transform nodes — 133.101 s
success building schema — 19.157 s
success createPages — 0.318 s
success createPagesStatefully — 0.326 s
success onPreExtractQueries — 0.007 s
success update schema — 21.001 s
success extract queries from components — 0.767 s
success run graphql queries — 140.068 s — 28/28 0.20 queries/second
success write out page data — 0.025 s
success write out redirect data — 0.005 s
success onPostBootstrap — 0.009 s
info bootstrap finished - 330.762 s

v2.1.39 build#8 (with mplayer, with cache)
$ gatsby build
success open and validate gatsby-configs — 0.041 s
success load plugins — 1.001 s
success onPreInit — 1.297 s
success delete html and css files from previous builds — 0.165 s
success initialize cache — 0.025 s
success copy gatsby files — 0.068 s
success onPreBootstrap — 0.034 s
success source and transform nodes — 0.900 s
success building schema — 19.452 s
success createPages — 0.346 s
success createPagesStatefully — 0.382 s
success onPreExtractQueries — 0.008 s
success update schema — 20.832 s
success extract queries from components — 0.756 s
success run graphql queries — 1.960 s — 16/16 8.17 queries/second
success write out page data — 0.016 s
success write out redirect data — 0.002 s
success onPostBootstrap — 0.002 s
info bootstrap finished - 70.195 s

v2.1.39 build#9 (no mplayer, cache cleared)
$ gatsby build
success open and validate gatsby-configs — 0.040 s
success load plugins — 1.009 s
success onPreInit — 1.239 s
success delete html and css files from previous builds — 0.018 s
success initialize cache — 0.025 s
success copy gatsby files — 0.051 s
success onPreBootstrap — 0.034 s
success source and transform nodes — 134.509 s
success building schema — 19.830 s
success createPages — 0.322 s
success createPagesStatefully — 0.314 s
success onPreExtractQueries — 0.008 s
success update schema — 19.755 s
success extract queries from components — 0.761 s
success run graphql queries — 136.816 s — 28/28 0.20 queries/second
success write out page data — 0.017 s
success write out redirect data — 1.651 s
success onPostBootstrap — 0.006 s
info bootstrap finished - 326.824 s

v2.1.39 build#10 (no mplayer, cache cleared)
$ gatsby build
success open and validate gatsby-configs — 0.041 s
success load plugins — 1.000 s
success onPreInit — 1.245 s
success delete html and css files from previous builds — 0.018 s
success initialize cache — 0.025 s
success copy gatsby files — 0.058 s
success onPreBootstrap — 0.034 s
success source and transform nodes — 133.040 s
success building schema — 20.355 s
success createPages — 0.313 s
success createPagesStatefully — 0.322 s
success onPreExtractQueries — 0.008 s
success update schema — 20.324 s
success extract queries from components — 0.819 s
success run graphql queries — 148.839 s — 28/28 0.19 queries/second
success write out page data — 0.704 s
success write out redirect data — 0.013 s
success onPostBootstrap — 0.007 s
info bootstrap finished - 336.769 s

v2.1.39 build#11 (no mplayer, with cache)
$ gatsby build
success open and validate gatsby-configs — 0.041 s
success load plugins — 0.997 s
success onPreInit — 1.281 s
success delete html and css files from previous builds — 0.161 s
success initialize cache — 0.024 s
success copy gatsby files — 0.067 s
success onPreBootstrap — 0.035 s
success source and transform nodes — 0.895 s
success building schema — 19.220 s
success createPages — 0.353 s
success createPagesStatefully — 0.371 s
success onPreExtractQueries — 0.008 s
success update schema — 20.691 s
success extract queries from components — 0.738 s
success run graphql queries — 1.912 s — 16/16 8.38 queries/second
success write out page data — 0.014 s
success write out redirect data — 0.002 s
success onPostBootstrap — 0.004 s
info bootstrap finished - 69.401 s
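
The size of the remaining regression can be worked out from the logs above. The sketch below averages the "bootstrap finished" totals of the cold runs without mplayer (builds #2 and #4 for v2.2.11, builds #9 and #10 for v2.1.39); the variable names are arbitrary and the figures are copied from those logs.

```javascript
// "bootstrap finished" totals in seconds, no mplayer, cache cleared.
const coldBuilds = {
  "2.2.11": [484.869, 489.475], // builds #2 and #4
  "2.1.39": [326.824, 336.769], // builds #9 and #10
};

const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;

const newer = mean(coldBuilds["2.2.11"]);
const older = mean(coldBuilds["2.1.39"]);
const slowdownPercent = (newer / older - 1) * 100;

console.log(
  `${newer.toFixed(1)} s vs ${older.toFixed(1)} s, ` +
    `${slowdownPercent.toFixed(0)}% slower`
);
// 487.2 s vs 331.8 s, 47% slower
```

Per stage, the logs show schema work ("building schema" plus "update schema") growing from about 40 s to about 268 s, while "run graphql queries" drops from roughly 137 to 149 s down to about 70 s, which is why the overall slowdown is smaller than the schema stage alone suggests.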

DSchau commented 5 years ago

@millette not slow at all! Thank you for doing this--we appreciate it!

millette commented 5 years ago

FYI, I'm not generating 60k pages, but I'm using a 60 MiB JSON file as a source. It's built with https://github.com/millette/gatsby-starter-location-github and the source data is generated with https://github.com/millette/ghraphql. I need to put some time into it soon since it's getting rather slow and bulky: http://dev.rollodeqc.com/en/ but that's all on me.

Thanks to all Gatsby contributors :-)

millette commented 5 years ago

Time to close this issue?

bennetthardwick commented 5 years ago

The changes have fixed my original issue at least. Thanks!