Closed bennetthardwick closed 5 years ago
Thanks for the report and for providing a testing repo!
We are indeed now checking every string if it is a date string -- we did not do this before but only checked on a randomly picked field value which unfortunately made type inference non-deterministic. We are aware though that this does not scale very well.
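To illustrate why sampling a single value made inference non-deterministic, here is a minimal sketch. It is not Gatsby's actual implementation: `looksLikeDate`, `inferFromSample` and `inferFromAll` are hypothetical names, the date check is a simplified stand-in regex, and the rule "a field is a Date only if every value is a date string" is an assumption made for the example.

```javascript
// Simplified stand-in for a date check (assumption: only YYYY-MM-DD is recognised).
const looksLikeDate = value => /^\d{4}-\d{2}-\d{2}$/.test(value)

// Old behaviour (sketch): inspect one picked value, so the result depends on the pick.
const inferFromSample = (values, pick) =>
  looksLikeDate(values[pick(values.length)]) ? "Date" : "String"

// New behaviour (sketch): inspect every value, so the result is stable but costs O(n).
const inferFromAll = values => (values.every(looksLikeDate) ? "Date" : "String")

const values = ["2019-03-20", "not a date", "2019-03-21"]

console.log(inferFromSample(values, () => 0)) // Date
console.log(inferFromSample(values, () => 1)) // String
console.log(inferFromAll(values)) // String
```

The same field flips between Date and String depending on which value is picked, whereas the exhaustive check always gives the same answer at the cost of visiting every string.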
I'll test with the provided repo soon. In the meantime: if it hangs after onPreExtractQueries, that means it hangs in the schema update step. If that's also what you experience in your real project, you can try simply disabling schema updating here, since it's really only there to make the context fields available in the schema, which you probably won't need (see this RFC).
I'd also want to try switching momentjs for something like date-fns@2 (something like const isDate = value => isValid(parseISO(value))) and see if that would make a difference.
Me too! I am using a single 60 MiB JSON file for my data. Was taking about 10-15 minutes to build with gatsby < 2.2.0 but with the most recent, it's taking 2h30.
PR #12700 is a quickfix to make it a little bit faster again. Do you mind trying it out?
Thanks @wardpeet I'm giving it a shot, hopefully with an answer in less than 2 hours :-)
UPDATE: Previous run took (from memory) 8000s to build schemas, with the patch it was only 600s.
Looking good!
What did schema creation take before?
> Me too! I am using a single 60 MiB JSON file for my data. Was taking about 10-15 minutes to build with gatsby < 2.2.0 but with the most recent, it's taking 2h30.
They mentioned in an earlier comment that it was taking 10-15 minutes. I'm going to try this on our codebase as we were taking 15 seconds before, but >30 minutes after.
I also tried swapping momentjs for date-fns@2 and the difference is enormous. The problem with how things are done currently is that we check 55 different ISO format strings with moment.
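One generic way to reduce the cost of checking many formats per string is to reject obvious non-dates with a cheap guard before running the expensive check. The sketch below only illustrates that idea; it is not the code from any of the PRs mentioned in this thread. `candidatePatterns`, `expensiveIsDate` and `couldBeDate` are hypothetical, and the handful of regexes merely stands in for a long list of format strings.

```javascript
// Hypothetical expensive check: tries several candidate patterns in turn,
// standing in for testing a value against a long list of format strings.
let expensiveCalls = 0
const candidatePatterns = [
  /^\d{4}$/,
  /^\d{4}-\d{2}$/,
  /^\d{4}-\d{2}-\d{2}$/,
  /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2})?(\.\d+)?(Z|[+-]\d{2}:\d{2})?$/,
]
const expensiveIsDate = value => {
  expensiveCalls += 1
  return candidatePatterns.some(pattern => pattern.test(value))
}

// Cheap guard (assumption: every supported format starts with a 4-digit year).
const couldBeDate = value =>
  value.length >= 4 && value.length <= 35 && /^\d{4}/.test(value)

// Only strings that pass the guard reach the expensive check.
const isDate = value => couldBeDate(value) && expensiveIsDate(value)

const strings = [
  "hello",
  "2019-03-20",
  "some long title",
  "2019-03-20T10:00:00Z",
  "1234 Main St",
]
const dates = strings.filter(isDate)

console.log(dates) // [ '2019-03-20', '2019-03-20T10:00:00Z' ]
console.log(expensiveCalls) // 3
```

With mostly non-date strings, as in typical content, most values never reach the expensive path.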
We will merge Ward's PR as a temporary optimization. Sadly, the date-fns parser is a bit too loose (for example, it allows dates with arbitrary trailing characters), so switching to it would be a potential breaking change.
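To make the "too loose" concern concrete, here is a small sketch contrasting a check that only anchors the start of the string with one that anchors both ends. It does not model the date-fns parser itself; `looseIsDate` and `strictIsDate` are hypothetical helpers that only recognise YYYY-MM-DD.

```javascript
// Loose: only the start is anchored, so anything may follow the date.
const looseIsDate = value => /^\d{4}-\d{2}-\d{2}/.test(value)

// Strict: both ends are anchored, so trailing characters are rejected.
const strictIsDate = value => /^\d{4}-\d{2}-\d{2}$/.test(value)

const title = "2019-03-20 release notes"

console.log(looseIsDate(title)) // true
console.log(strictIsDate(title)) // false
console.log(strictIsDate("2019-03-20")) // true
```

Under a looser check, a field that was previously inferred as a String could start being inferred as a Date, which is why the swap would be a breaking change for existing sites.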
As a long term solution to this, we will provide an opt-out from inference that will considerably speed up sites with lots of nodes. This will be done by specifying the GraphQL types for the nodes that you don't want to be inferred. We've added the "specifying the type" part already, but for historical reasons inference always happens. We will fix this in the next couple of weeks.
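As a rough sketch of what "specifying the GraphQL types" can look like in gatsby-node.js, the example below uses the createTypes action. The type name `LocationJson` and its fields are hypothetical. The `@dontInfer` directive is the opt-out that exists in later Gatsby releases; it may not be available in the versions discussed in this thread, so treat its use here as an assumption and check the documentation for your version.

```javascript
// gatsby-node.js
// Hypothetical type: replace LocationJson and its fields with your own node type.
const typeDefs = `
  type LocationJson implements Node @dontInfer {
    name: String
    updated: Date
  }
`

// Register the explicit type so Gatsby does not need to infer these fields.
const sourceNodes = ({ actions }) => {
  actions.createTypes(typeDefs)
}

exports.sourceNodes = sourceNodes
```

With the fields declared up front, there is no need to scan every string value of that type to guess whether it is a date.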
Closing (it's merged); further steps will require more specific issues.
Just to throw an idea out there: what if source plugins were able to specify the data type when they create the nodes?
Something like name: gatsby.StringType(something.name), date: gatsby.DateType(something.date, "yyyy/MM/dd" /* format for how to be parsed */), etc.?
(as a proposal for a future improvement)
@danechitoaie that's something we're going to move to.
@millette @bennetthardwick I did some more perf improvements in #12722. Could you give it a spin and report if something is wonky?
@wardpeet I didn't have a chance to try #12722 but on first sight it looks like a lot of code compared to first patch. I'm not used to "multirepos" so I have to patch the dist module manually. Also, my project should be redesigned somewhat since it's already pretty slow and doesn't make the best usage of gatsby, so it shouldn't be too significant to proper gatsby usage.
(reopening the issue since we're not done, it seems)
@wardpeet I finally tried the new patch with gatsby 2.2.8.
gatsby 2.2.8:
success source and transform nodes — 132.075 s
success building schema — 566.949 s
gatsby 2.2.8 with new patch:
success source and transform nodes — 158.689 s
success building schema — 265.077 s
@millette could you try with a gatsby@~2.1.0 version? The initial issue here was with the new Schema Customization API changes, so I'd be curious to see if it's still slower!
Thanks!
@DSchau To be clear, you want me to test #12722 (which was merged 13 hours ago) against gatsby 2.1.0 ?
@millette the central issue here was with a regression with 2.2.0.
Specifically, it seemed like the bottleneck was with Date inference.
That change was released in gatsby@2.2.10 so if we want to compare performance, we would ideally test this change:
@DSchau Sorry in advance for the very long response, but here we go.
I've included results while also running mplayer which took some cpu. If you compare with runs without mplayer, I concluded that the 2.2 branch takes a bit more cpu (vs IO) since it was more impacted by mplayer running.
So there's a regression (540s vs 320s with mplayer; 485s vs 326s without mplayer). When 2.2 came out, the same build took almost 3 hours to complete so I'm not complaining :-)
I didn't patch anything in these tests.
v2.2.11 build#1 (with mplayer, cache cleared)
$ gatsby build
success open and validate gatsby-configs — 0.041 s
success load plugins — 1.025 s
success onPreInit — 1.260 s
success delete html and css files from previous builds — 0.020 s
success initialize cache — 0.025 s
success copy gatsby files — 0.076 s
success onPreBootstrap — 0.050 s
success source and transform nodes — 169.772 s
success building schema — 283.182 s
success createPages — 0.147 s
success createPagesStatefully — 0.223 s
success onPreExtractQueries — 0.009 s
success update schema — 0.180 s
success extract queries from components — 0.725 s
success run graphql queries — 72.963 s — 28/28 0.38 queries/second
success write out page data — 0.030 s
success write out redirect data — 0.022 s
success onPostBootstrap — 0.008 s
info bootstrap finished - 540.315 s
v2.2.11 build#2 (no mplayer, cache cleared)
$ gatsby build
success open and validate gatsby-configs — 0.041 s
success load plugins — 1.008 s
success onPreInit — 1.215 s
success delete html and css files from previous builds — 0.018 s
success initialize cache — 0.026 s
success copy gatsby files — 0.060 s
success onPreBootstrap — 0.034 s
success source and transform nodes — 131.905 s
success building schema — 267.430 s
success createPages — 0.143 s
success createPagesStatefully — 0.215 s
success onPreExtractQueries — 0.007 s
success update schema — 0.171 s
success extract queries from components — 0.709 s
success run graphql queries — 69.625 s — 28/28 0.40 queries/second
success write out page data — 0.020 s
success write out redirect data — 0.004 s
success onPostBootstrap — 0.007 s
info bootstrap finished - 484.869 s
v2.2.11 build#3 (no mplayer, with cache)
$ gatsby build
success open and validate gatsby-configs — 0.041 s
success load plugins — 0.978 s
success onPreInit — 1.269 s
success delete html and css files from previous builds — 0.159 s
success initialize cache — 0.024 s
success copy gatsby files — 0.055 s
success onPreBootstrap — 0.033 s
success source and transform nodes — 0.890 s
success building schema — 270.609 s
success createPages — 0.184 s
success createPagesStatefully — 0.271 s
success onPreExtractQueries — 0.007 s
success update schema — 0.177 s
success extract queries from components — 0.707 s
success run graphql queries — 1.823 s — 16/16 8.79 queries/second
success write out page data — 0.010 s
success write out redirect data — 0.003 s
success onPostBootstrap — 0.003 s
info bootstrap finished - 300.021 s
v2.2.11 build#4 (no mplayer, cache cleared)
$ gatsby build
success open and validate gatsby-configs — 0.040 s
success load plugins — 1.009 s
success onPreInit — 1.195 s
success delete html and css files from previous builds — 0.018 s
success initialize cache — 0.026 s
success copy gatsby files — 0.056 s
success onPreBootstrap — 0.038 s
success source and transform nodes — 134.602 s
success building schema — 267.401 s
success createPages — 0.144 s
success createPagesStatefully — 0.215 s
success onPreExtractQueries — 0.008 s
success update schema — 0.182 s
success extract queries from components — 0.723 s
success run graphql queries — 71.003 s — 28/28 0.39 queries/second
success write out page data — 0.010 s
success write out redirect data — 0.005 s
success onPostBootstrap — 0.007 s
info bootstrap finished - 489.475 s
v2.2.11 build#5 (with mplayer, cache cleared)
$ gatsby build
success open and validate gatsby-configs — 0.042 s
success load plugins — 1.032 s
success onPreInit — 1.234 s
success delete html and css files from previous builds — 0.018 s
success initialize cache — 0.027 s
success copy gatsby files — 0.058 s
success onPreBootstrap — 0.052 s
success source and transform nodes — 159.721 s
success building schema — 274.916 s
success createPages — 0.154 s
success createPagesStatefully — 0.216 s
success onPreExtractQueries — 0.007 s
success update schema — 0.186 s
success extract queries from components — 0.709 s
success run graphql queries — 73.223 s — 28/28 0.38 queries/second
success write out page data — 0.010 s
success write out redirect data — 0.001 s
success onPostBootstrap — 0.027 s
info bootstrap finished - 525.877 s
v2.1.39 build#6 (with mplayer, cache cleared)
$ gatsby build
success open and validate gatsby-configs — 0.041 s
success load plugins — 1.050 s
success onPreInit — 1.242 s
success delete html and css files from previous builds — 0.019 s
success initialize cache — 0.026 s
success copy gatsby files — 0.055 s
success onPreBootstrap — 0.034 s
success source and transform nodes — 160.464 s
success building schema — 20.783 s
success createPages — 0.322 s
success createPagesStatefully — 0.321 s
success onPreExtractQueries — 0.008 s
success update schema — 20.012 s
success extract queries from components — 0.746 s
success run graphql queries — 143.294 s — 28/28 0.20 queries/second
success write out page data — 0.010 s
success write out redirect data — 0.001 s
success onPostBootstrap — 0.006 s
info bootstrap finished - 361.719 s
v2.1.39 build#7 (with mplayer, cache cleared)
$ gatsby build
success open and validate gatsby-configs — 0.041 s
success load plugins — 1.084 s
success onPreInit — 1.204 s
success delete html and css files from previous builds — 0.019 s
success initialize cache — 0.025 s
success copy gatsby files — 0.063 s
success onPreBootstrap — 0.038 s
success source and transform nodes — 133.101 s
success building schema — 19.157 s
success createPages — 0.318 s
success createPagesStatefully — 0.326 s
success onPreExtractQueries — 0.007 s
success update schema — 21.001 s
success extract queries from components — 0.767 s
success run graphql queries — 140.068 s — 28/28 0.20 queries/second
success write out page data — 0.025 s
success write out redirect data — 0.005 s
success onPostBootstrap — 0.009 s
info bootstrap finished - 330.762 s
v2.1.39 build#8 (with mplayer, with cache)
$ gatsby build
success open and validate gatsby-configs — 0.041 s
success load plugins — 1.001 s
success onPreInit — 1.297 s
success delete html and css files from previous builds — 0.165 s
success initialize cache — 0.025 s
success copy gatsby files — 0.068 s
success onPreBootstrap — 0.034 s
success source and transform nodes — 0.900 s
success building schema — 19.452 s
success createPages — 0.346 s
success createPagesStatefully — 0.382 s
success onPreExtractQueries — 0.008 s
success update schema — 20.832 s
success extract queries from components — 0.756 s
success run graphql queries — 1.960 s — 16/16 8.17 queries/second
success write out page data — 0.016 s
success write out redirect data — 0.002 s
success onPostBootstrap — 0.002 s
info bootstrap finished - 70.195 s
v2.1.39 build#9 (no mplayer, cache cleared)
$ gatsby build
success open and validate gatsby-configs — 0.040 s
success load plugins — 1.009 s
success onPreInit — 1.239 s
success delete html and css files from previous builds — 0.018 s
success initialize cache — 0.025 s
success copy gatsby files — 0.051 s
success onPreBootstrap — 0.034 s
success source and transform nodes — 134.509 s
success building schema — 19.830 s
success createPages — 0.322 s
success createPagesStatefully — 0.314 s
success onPreExtractQueries — 0.008 s
success update schema — 19.755 s
success extract queries from components — 0.761 s
success run graphql queries — 136.816 s — 28/28 0.20 queries/second
success write out page data — 0.017 s
success write out redirect data — 1.651 s
success onPostBootstrap — 0.006 s
info bootstrap finished - 326.824 s
v2.1.39 build#10 (no mplayer, cache cleared)
$ gatsby build
success open and validate gatsby-configs — 0.041 s
success load plugins — 1.000 s
success onPreInit — 1.245 s
success delete html and css files from previous builds — 0.018 s
success initialize cache — 0.025 s
success copy gatsby files — 0.058 s
success onPreBootstrap — 0.034 s
success source and transform nodes — 133.040 s
success building schema — 20.355 s
success createPages — 0.313 s
success createPagesStatefully — 0.322 s
success onPreExtractQueries — 0.008 s
success update schema — 20.324 s
success extract queries from components — 0.819 s
success run graphql queries — 148.839 s — 28/28 0.19 queries/second
success write out page data — 0.704 s
success write out redirect data — 0.013 s
success onPostBootstrap — 0.007 s
info bootstrap finished - 336.769 s
v2.1.39 build#11 (no mplayer, with cache)
$ gatsby build
success open and validate gatsby-configs — 0.041 s
success load plugins — 0.997 s
success onPreInit — 1.281 s
success delete html and css files from previous builds — 0.161 s
success initialize cache — 0.024 s
success copy gatsby files — 0.067 s
success onPreBootstrap — 0.035 s
success source and transform nodes — 0.895 s
success building schema — 19.220 s
success createPages — 0.353 s
success createPagesStatefully — 0.371 s
success onPreExtractQueries — 0.008 s
success update schema — 20.691 s
success extract queries from components — 0.738 s
success run graphql queries — 1.912 s — 16/16 8.38 queries/second
success write out page data — 0.014 s
success write out redirect data — 0.002 s
success onPostBootstrap — 0.004 s
info bootstrap finished - 69.401 s
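For comparing the runs above, a small script can pull the timings out of the log lines and sum the schema-related steps. This is a sketch: `parseStep` and `schemaSeconds` are hypothetical helpers, and the sample lines are copied from build#2 (v2.2.11) and build#9 (v2.1.39).

```javascript
// Parse a line such as "success building schema — 267.430 s" into { step, seconds }.
const parseStep = line => {
  const match = /^success (.+?) — (\d+(?:\.\d+)?) s\b/.exec(line)
  return match ? { step: match[1], seconds: Number(match[2]) } : null
}

// Sum the time of every step whose name mentions "schema".
const schemaSeconds = lines =>
  lines
    .map(parseStep)
    .filter(entry => entry !== null && /schema/.test(entry.step))
    .reduce((total, entry) => total + entry.seconds, 0)

const build2 = [
  "success building schema — 267.430 s",
  "success update schema — 0.171 s",
  "success run graphql queries — 69.625 s — 28/28 0.40 queries/second",
]
const build9 = [
  "success building schema — 19.830 s",
  "success update schema — 19.755 s",
  "success run graphql queries — 136.816 s — 28/28 0.20 queries/second",
]

console.log(schemaSeconds(build2).toFixed(3)) // 267.601
console.log(schemaSeconds(build9).toFixed(3)) // 39.585
```

On these two runs, the schema steps account for roughly 268 s on v2.2.11 versus roughly 40 s on v2.1.39, while the query step is faster on v2.2.11.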
@millette not slow at all! Thank you for doing this--we appreciate it!
FYI, I'm not generating 60k pages, but I'm using a 60 MiB JSON file as a source. It's built with https://github.com/millette/gatsby-starter-location-github and the source data is generated with https://github.com/millette/ghraphql. I need to put some time into it soon since it's getting rather slow and bulky: http://dev.rollodeqc.com/en/ but that's all on me.
Thanks to all Gatsby contributors :-)
Time to close this issue?
The changes have fixed my original issue at least. Thanks!
Description
After the onPreExtractQueries step of the build process, Gatsby gets stuck when looking through 60k or so pages. Specifically, the isDate method (which is run on every string to check whether it's a plain string or a date string) is taking up the most time. These methods are all being run from example-value.js.
Steps to reproduce
chrome://inspect and start the debugger.
onPreExtractQueries, wait for a few minutes and then pause the debugger (chances are it will be in the right spot)
The build will be stuck in this state for 30+ minutes (I haven't had a successful build).
Expected result
Build completes in a reasonable time.
Actual result
Build never completes.
Environment
System:
  OS: Linux 4.15 Ubuntu 18.04.2 LTS (Bionic Beaver)
  CPU: (4) x64 Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz
  Shell: 5.4.2 - /usr/bin/zsh
Binaries:
  Node: 10.15.0 - ~/.nvm/versions/node/v10.15.0/bin/node
  Yarn: 1.13.0 - ~/.nvm/versions/node/v10.15.0/bin/yarn
  npm: 6.4.1 - ~/.nvm/versions/node/v10.15.0/bin/npm
Languages:
  Python: 2.7.15 - /usr/bin/python
npmPackages:
  gatsby: ^2.2.1 => 2.2.1
  gatsby-image: ^2.0.34 => 2.0.34
  gatsby-plugin-manifest: ^2.0.24 => 2.0.24
  gatsby-plugin-offline: ^2.0.25 => 2.0.25
  gatsby-plugin-react-helmet: ^3.0.10 => 3.0.10
  gatsby-plugin-sharp: ^2.0.29 => 2.0.29
  gatsby-source-filesystem: ^2.0.27 => 2.0.27
  gatsby-transformer-sharp: ^2.1.17 => 2.1.17
npmGlobalPackages:
  gatsby-dev-cli: 2.4.12
  gatsby: 2.2.2