ChristopherBiscardi / gatsby-mdx

Gatsby+MDX • Transformers, CMS UI Extensions, and Ecosystem Components for ambitious projects
https://gatsby-mdx.netlify.com/

Bug when using MDXRenderer + Large Markdown Files + PrismJS #411

Closed rwieruch closed 5 years ago

rwieruch commented 5 years ago

For the last half year I have been tinkering on a new blog written with Gatsby to replace my Hugo-generated website. Since it is a technical blog, I use PrismJS and MDX. Now I am finally bringing all my content over, but I hit a roadblock when I added my first very long blog post to Gatsby.

TLDR: Large markdown files with MDX (EDIT: and PrismJS) crash Gatsby.


Problem

It all started with the following output on the command line during gatsby develop:

[BABEL] Note: The code generator has deoptimised the styling of undefined as it exceeds the max of 500KB.

It can be seen several times during the process of starting the website.

When I visit the website, I see this output on the screen.

Screenshot 2019-07-02 at 10 27 35

If I open the developer tools console, I see this output multiple times:

Uncaught SyntaxError: Unexpected token export
    at new Function (<anonymous>)
    at mdx-renderer.js:31
    at mountMemo (react-dom.development.js:13460)
    at Object.useMemo (react-dom.development.js:13669)
    at useMemo (react.development.js:1492)
    at Object.wrappedHook [as useMemo] (react-hot-loader.development.js:2493)
    at MDXRenderer (mdx-renderer.js:15)
    at renderWithHooks (react-dom.development.js:12939)
    at mountIndeterminateComponent (react-dom.development.js:15021)
    at beginWork (react-dom.development.js:15626)
    at performUnitOfWork (react-dom.development.js:19313)
    at workLoop (react-dom.development.js:19353)
    at HTMLUnknownElement.callCallback (react-dom.development.js:150)
    at Object.invokeGuardedCallbackDev (react-dom.development.js:200)
    at invokeGuardedCallback (react-dom.development.js:257)
    at replayUnitOfWork (react-dom.development.js:18579)
    at renderRoot (react-dom.development.js:19469)
    at performWorkOnRoot (react-dom.development.js:20343)
    at performWork (react-dom.development.js:20255)
    at performSyncWork (react-dom.development.js:20229)
    at requestWork (react-dom.development.js:20098)
    at scheduleWork (react-dom.development.js:19912)
    at Object.enqueueSetState (react-dom.development.js:11170)
    at JSONStore../node_modules/react/cjs/react.development.js.Component.setState (react.development.js:335)
    at JSONStore._this.handleMittEvent (json-store.js:40)
    at mitt.es.js:58
    at Array.map (<anonymous>)
    at Object.emit (mitt.es.js:58)
    at r.<anonymous> (socketIo.js:56)
    at r.emit (index.js:83)
    at r.onevent (index.js:83)
    at r.onpacket (index.js:83)
    at r.<anonymous> (index.js:83)
    at r.emit (index.js:83)
    at r.ondecoded (index.js:83)
    at a.<anonymous> (index.js:83)
    at a.r.emit (index.js:83)
    at a.add (index.js:83)
    at r.ondata (index.js:83)
    at r.<anonymous> (index.js:83)
    at r.emit (index.js:83)
    at r.onPacket (index.js:83)
    at r.<anonymous> (index.js:83)
    at r.emit (index.js:83)
    at r.onPacket (index.js:83)
    at r.onData (index.js:83)
    at WebSocket.ws.onmessage (index.js:83)

Reproduction

I copied and pasted the blog post's content into different starter packs until I narrowed the problem down to MDX. For instance, it works in gatsby-starter-blog. However, when I use it in my gatsby-MDX-starter-blog, it crashes again, the same way as in my new Gatsby blog.

0) I started a branch for my gatsby-MDX-starter-blog project to have a minimal reproduction of the bug.

1) In order to exclude styled-components as troublemaker (see https://github.com/gatsbyjs/gatsby/issues/15205#issuecomment-507476373), I removed it in this commit on the branch.

2) Then I started to introduce the long blog post (commit), but not everything, to keep it still without the bug. It still works.

3) I introduced the remaining parts of the blog post (commit), which leads to the bug. I'm not sure there is a clear threshold at which it breaks for everyone, but for me it breaks once the markdown exceeds 1590 lines.


How to fix it?

1) I tried to use https://www.gatsbyjs.org/packages/gatsby-plugin-no-sourcemaps/ out of desperation, but it didn't help.

2) I set NODE_OPTIONS=--max_old_space_size=4096 but it didn't help.

3) I removed styled-components (see Reproduction 1), but it didn't help.

4) I removed MDX, which helped, but I want to keep it.

Any help is super much appreciated, because I have the feeling that 6 months of work on my new Gatsby blog went for nothing, since I have been struggling with this problem for the last 24 hours. I really appreciate all the things that are possible with MDX now. Hopefully we can find a fix for it. 👍


My Dependencies

  "dependencies": {
    "@mdx-js/mdx": "^1.0.21",
    "@mdx-js/react": "^1.0.21",
    "core-js": "^2.5.7",
    "gatsby": "^2.12.0",
    "gatsby-image": "^2.2.3",
    "gatsby-link": "^2.2.0",
    "gatsby-mdx": "^0.6.3",
    "gatsby-plugin-catch-links": "^2.1.0",
    "gatsby-plugin-manifest": "^2.2.0",
    "gatsby-plugin-offline": "^2.2.0",
    "gatsby-plugin-react-helmet": "^3.1.0",
    "gatsby-plugin-sharp": "^2.2.1",
    "gatsby-plugin-styled-components": "^3.1.0",
    "gatsby-remark-copy-linked-files": "^2.1.0",
    "gatsby-remark-images": "^3.1.2",
    "gatsby-remark-prismjs": "^3.3.0",
    "gatsby-source-filesystem": "^2.1.1",
    "gatsby-transformer-remark": "^2.5.0",
    "gatsby-transformer-sharp": "^2.2.0",
    "prismjs": "^1.16.0",
    "react": "^16.8.6",
    "react-dom": "^16.8.6",
    "react-helmet": "~5.2.1",
    "react-youtube": "^7.9.0"
  },
rwieruch commented 5 years ago

What I tried next:

Project still runs! So one would assume it's not related to MDX.

Project shows same Babel output as seen above.

So I thought PrismJS would be the problem. But then I tried my long read blog post in https://github.com/gatsbyjs/gatsby-starter-blog and added PrismJs there again. No Babel output. I even made the blog post 4 times longer and it continued to work.

So the problem must be related to PrismJS which is used within MDX. If MDX is not there, PrismJS performs well.

@ChristopherBiscardi would it be possible to find the culprit within gatsby-mdx (see error output above) or is this related to MDX core? Any help would be super much appreciated!

rwieruch commented 5 years ago

Forget the last comment... It works in the MDX starter (except for the Babel 500kb output still showing up).

Somehow it only happens because I am using MDXRenderer in my project (see Reproduction from first comment) and in the MDX starter there are only children passed in the Layout. If I exclude PrismJs in my project, it works as well. So PrismJS is altering the code which gets passed to MDXRenderer somehow so that MDXRenderer doesn't like it.
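
The stack trace in the first comment points at `new Function` inside mdx-renderer.js, which suggests the compiled MDX string is evaluated as a function body. A function body cannot contain ES module syntax, so any `export` left in that string fails at parse time. A minimal sketch of that failure mode (`evaluateBody` is a hypothetical stand-in, not the MDXRenderer source):

```javascript
// Sketch only: shows why a leftover ES module `export` breaks `new Function`.
// `evaluateBody` is a hypothetical stand-in, not the MDXRenderer source.
function evaluateBody(body) {
  try {
    // A function body may contain `return`, but never `export`.
    const fn = new Function(body);
    return { ok: true, value: fn() };
  } catch (err) {
    return { ok: false, errorName: err.name };
  }
}

const good = evaluateBody("return 'rendered';");
const bad = evaluateBody("export default function MDXContent() {}");

console.log(good.ok, good.value);
console.log(bad.ok, bad.errorName);
```

The second call fails before any of the code runs, which matches the "Unexpected token export" message in the console.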

johno commented 5 years ago

Thank you for the detailed bug report and updates! I'm gonna dive into this a bit today and see what I can dig up.

My initial hunch for the Babel warning is that the HTML gatsby-remark-prismjs injects ends up making the transpiled JSX too large for Babel's readable output (which might be inevitable for any very large MDX file).

The crashing is much more concerning to me.

Any help is super much appreciated, because I have the feeling that 6 months of work on my new Gatsby blog went for nothing, since I have been struggling with this problem for the last 24 hours. I really appreciate all the things that are possible with MDX now. Hopefully we can find a fix for it.

We'll find a fix! ❤️

rwieruch commented 5 years ago

My initial hunch for the Babel warning is that the HTML gatsby-remark-prismjs injects ends up making the transpiled JSX too large for Babel's readable output (which might be inevitable for any very large MDX file).

Yes. I think PrismJS blows it up in the end. If I output everything that goes through MDXRenderer, I get large pieces of [0] and it comes out as 802kb string [1].

If I remove several PrismJS line highlights, it's possible to render it again.

We'll find a fix! ❤️

❤️ I am here to help as well if I can do anything! I haven't dived too much into Babel's implementation details yet though 😅 Thank you so much for digging into this. Didn't expect this to be a blocker, but perhaps it's good to have an edge case like this to work on. This will fix any "large markdown file"-issue for future generations 😄


[1]

Screenshot 2019-07-05 at 19 01 58

[0]

    }), ";"), "\n", mdx("span", _extends({
        parentName: "code"
    }, {
        "className": "token keyword"
    }), "import"), " React ", mdx("span", _extends({
        parentName: "code"
    }, {
        "className": "token keyword"
    }), "from"), " ", mdx("span", _extends({
        parentName: "code"
    }, {
        "className": "token string"
    }), "'react'"), mdx("span", _extends({
        parentName: "code"
    }, {
        "className": "token punctuation"
    }), ";"), "\n", mdx("span", _extends({
        parentName: "code"
    }, {
        "className": "token keyword"
    }), "import"), " React ", mdx("span", _extends({
        parentName: "code"
    }, {
        "className": "token keyword"
    }), "from"), " ", mdx("span", _extends({
        parentName: "code"
    }, {
        "className": "token string"
    }), "'react'"), mdx("span", _extends({
        parentName: "code"
    }, {
        "className": "token punctuation"
    }), ";"), "\n", mdx("span", _extends({
        parentName: "code"
    }, {
        "className": "token keyword"
    }), "import"), " React ", mdx("span", _extends({
        parentName: "code"
    }, {
        "className": "token keyword"
    }), "from"), " ", mdx("span", _extends({
        parentName: "code"
    }, {
        "className": "token string"
    }), "'react'"), mdx("span", _extends({
        parentName: "code"
    }, {
        "className": "token punctuation"
    }), ";")))));
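
To get a feel for why the highlighted output grows so quickly, here is a rough sketch that wraps a single token the way the excerpt in [0] does and compares lengths. The wrapper string only mirrors the shape of that excerpt; its exact whitespace is an assumption, and `wrapToken` is a hypothetical helper.

```javascript
// Sketch: estimate how much one highlighted token grows once it is wrapped.
// The wrapper text mirrors the shape of the output in [0]; exact whitespace is assumed.
function wrapToken(tokenType, text) {
  return (
    'mdx("span", _extends({ parentName: "code" }, { "className": "token ' +
    tokenType +
    '" }), ' +
    JSON.stringify(text) +
    ")"
  );
}

const raw = "import";
const wrapped = wrapToken("keyword", raw);
const ratio = wrapped.length / raw.length;

console.log(wrapped);
console.log(raw.length, wrapped.length, ratio.toFixed(1));
```

Every keyword, string, and punctuation mark in a code block pays a similar overhead, so a post with many code blocks can reach several hundred kilobytes.
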
johno commented 5 years ago

As a quick update I've been able to track down where things go wrong. It's indeed a bug in gatsby-plugin-mdx since we make some assumptions about the format of the transpiled JSX. When Babel deopts styling it turns out those assumptions no longer hold true 🤕.

So, I'm going to start work on a Babel plugin to address this issue. I should, hopefully, have something together soon!
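
As a purely hypothetical illustration of the kind of formatting assumption described here (this is not the gatsby-plugin-mdx source), consider a string rewrite that expects the `export` to start on its own line. It works on pretty-printed output and silently does nothing on compact output, which leaves the `export` in place. The names `toFunctionBody` and `canEvaluate` are made up for this sketch.

```javascript
// Hypothetical illustration only: this is NOT the gatsby-plugin-mdx source.
// It shows how a string rewrite tied to pretty-printed output can miss compact output.
function toFunctionBody(code) {
  // Assumes the export sits on its own line, as pretty-printed output would have it.
  return code.replace(/\nexport default /, "\nreturn ");
}

function canEvaluate(body) {
  try {
    new Function(body); // parse only, never called
    return true;
  } catch (err) {
    return false;
  }
}

const pretty = "const a = 1;\nexport default function MDXContent() { return a; }";
const compact = "const a=1;export default function MDXContent(){return a;}";

const prettyOk = canEvaluate(toFunctionBody(pretty)); // rewrite applies
const compactOk = canEvaluate(toFunctionBody(compact)); // rewrite is skipped, `export` remains

console.log(prettyOk, compactOk);
```

A transform that works on the syntax tree instead of the printed string does not depend on whitespace, which is presumably why a Babel plugin is the more robust route.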

Didn't expect this to be a blocker, but perhaps it's good to have an edge case like this to work on. This will fix any "large markdown file"-issue for future generations 😄

Yep! There are a few things all coming together to cause this edge case to happen, but now we can fix it for good. Thanks for your patience and understanding.

johno commented 5 years ago

I've got a PR open in Gatsby to fix the error. Though I'm noticing that the actual code blocks are also missing proper whitespace formatting:

image

I'm wondering if this also has to do with babel deopting?


Something else to consider, @rwieruch, is using prism-react-renderer directly to avoid some of these issues. Since quite a few of these posts are so long and code-block heavy, you'd likely see a smaller bundle size, since the MDXProvider composition can be code split to save on large Prism output in the document.

rwieruch commented 5 years ago

Again, wow! If there is anything I can do for you @johno just let me know. Your effort on this made my day and surely my next week, because I can migrate all the content to my Gatsby blog now 🎉

Thanks to @ChristopherBiscardi as well for this neat Gatsby to MDX bridge ❤️


Regarding your hint: I will give this example a shot in my code. Haven't seen this approach before! Super valuable. Do I understand correctly that I keep the gatsby-remark-prismjs + MDXRenderer component, but simply define my custom Highlight components for code?

johno commented 5 years ago

Again, wow! If there is anything I can do for you @johno just let me know.

Will do <3

Your effort on this made my day and surely my next week, because I can migrate all the content to my Gatsby blog now 🎉

Radical! If you ever get a chance I'd love to read a post on the good and bad of your migration (when you complete it) so we can improve upon it. 😸

Do I understand correctly that I keep the gatsby-remark-prismjs + MDXRenderer component, but simply define my custom Highlight components for code?

Using this approach you can remove the gatsby-remark-prismjs plugin entirely. Instead, the new syntax highlighting component (using prism-react-renderer) will take over the rendering of all code blocks using React Context via MDXProvider and MDX's custom pragma.

It's a bit of a bizarre departure from traditional Markdown-style plugins but is more idiomatic for React and composition as a whole.
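
A rough sketch of the glue this approach needs: in MDX v1 a fenced block's language reaches the `code` component as a className such as `language-jsx`, so the custom component first has to recover the language before handing the source to a React-based highlighter. The helper name `languageFromClassName` and the `CodeBlock` component mentioned in the comment are hypothetical.

```javascript
// Sketch: derive the highlighter language from the className MDX puts on <code>.
// `languageFromClassName` is a hypothetical helper name.
function languageFromClassName(className, fallback = "text") {
  const match = /language-([\w-]+)/.exec(className || "");
  return match ? match[1] : fallback;
}

// Hypothetical wiring, left as a comment because it needs React and a highlighter component:
//   const components = { code: CodeBlock };
//   <MDXProvider components={components}>{children}</MDXProvider>
// where CodeBlock would call languageFromClassName(props.className).

const fenced = languageFromClassName("language-jsx");
const plain = languageFromClassName(undefined);

console.log(fenced, plain);
```

Because highlighting then happens in a component at render time, the per-token markup no longer has to be baked into the compiled MDX string.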


Best of luck, and please do reach out if you encounter any other questions/issues.

johno commented 5 years ago

After a patch in mdx-js/mdx#622 it looks like everything in your edge case is addressed @rwieruch! Thanks for your patience and the thorough report with reproduction. 🎉


Git diff of your reproduction repo


❯ gd
diff --git a/gatsby-config.js b/gatsby-config.js
index d17d788..bc662f0 100644
--- a/gatsby-config.js
+++ b/gatsby-config.js
@@ -21,7 +21,7 @@ module.exports = {
       },
     },
     {
-      resolve: `gatsby-mdx`,
+      resolve: `gatsby-plugin-mdx`,
       options: {
         extensions: ['.mdx', '.md'],
         gatsbyRemarkPlugins: [
diff --git a/gatsby-node.js b/gatsby-node.js
index d9908db..7582d8e 100644
--- a/gatsby-node.js
+++ b/gatsby-node.js
@@ -118,9 +118,6 @@ exports.createPages = ({ actions, graphql }) =>
               slug
               categories
             }
-            code {
-              scope
-            }
           }
         }
       }
diff --git a/package.json b/package.json
index 83a46f6..54f25d4 100644
--- a/package.json
+++ b/package.json
@@ -5,25 +5,25 @@
   "author": "Robin Wieruch <hello@rwieruch.com> (https://www.robinwieruch.de/)",
   "repository": "https://github.com/rwieruch/gatsby-mdx-starter-project",
   "dependencies": {
-    "@mdx-js/mdx": "^1.0.21",
-    "@mdx-js/react": "^1.0.21",
-    "core-js": "^2.5.7",
-    "gatsby": "^2.12.0",
-    "gatsby-image": "^2.2.3",
+    "@mdx-js/mdx": "^1.0.23",
+    "@mdx-js/react": "^1.0.23",
+    "core-js": "^3.1.4",
+    "gatsby": "^2.13.10",
+    "gatsby-image": "^2.2.4",
     "gatsby-link": "^2.2.0",
-    "gatsby-mdx": "^0.6.3",
     "gatsby-plugin-catch-links": "^2.1.0",
-    "gatsby-plugin-manifest": "^2.2.0",
-    "gatsby-plugin-offline": "^2.2.0",
+    "gatsby-plugin-manifest": "^2.2.1",
+    "gatsby-plugin-mdx": "^1.0.8",
+    "gatsby-plugin-offline": "^2.2.1",
     "gatsby-plugin-react-helmet": "^3.1.0",
-    "gatsby-plugin-sharp": "^2.2.1",
+    "gatsby-plugin-sharp": "^2.2.3",
     "gatsby-plugin-styled-components": "^3.1.0",
     "gatsby-remark-copy-linked-files": "^2.1.0",
-    "gatsby-remark-images": "^3.1.2",
-    "gatsby-remark-prismjs": "^3.3.0",
-    "gatsby-source-filesystem": "^2.1.1",
-    "gatsby-transformer-remark": "^2.5.0",
-    "gatsby-transformer-sharp": "^2.2.0",
+    "gatsby-remark-images": "^3.1.3",
+    "gatsby-remark-prismjs": "^3.3.1",
+    "gatsby-source-filesystem": "^2.1.2",
+    "gatsby-transformer-remark": "^2.6.1",
+    "gatsby-transformer-sharp": "^2.2.1",
     "prismjs": "^1.16.0",
     "react": "^16.8.6",
     "react-dom": "^16.8.6",
diff --git a/src/templates/post.js b/src/templates/post.js
index 435118d..8d664dc 100644
--- a/src/templates/post.js
+++ b/src/templates/post.js
@@ -1,7 +1,7 @@
 import React, { Fragment } from 'react';
 import { graphql } from 'gatsby';
 import Img from 'gatsby-image';
-import MDXRenderer from 'gatsby-mdx/mdx-renderer';
+import MDXRenderer from 'gatsby-plugin-mdx/mdx-renderer';

 import Layout from '../components/Layout';
 import Link from '../components/Link';
@@ -35,7 +35,7 @@ export default function Post({
         />
       )}

-      <MDXRenderer>{mdx.code.body}</MDXRenderer>
+      <MDXRenderer>{mdx.body}</MDXRenderer>

       <div>
         <CategoryList list={mdx.frontmatter.categories} />
@@ -79,9 +79,7 @@ export const pageQuery = graphql`
         categories
         keywords
       }
-      code {
-        body
-      }
+      body
     }
   }
 `;
rwieruch commented 5 years ago

Perfect! Thank you so much @johno for investing your time here. I settled on your recommended solution now :)

Mrazator commented 5 years ago

Hey guys,

Is this problem really resolved? It seems that I have exactly the same one while using the latest possible packages. The difference might be that I am working with really long MDX files (thousands of lines), that there are around 2K of these files (still growing), and I cannot get rid of this:

ERROR 

[BABEL] Note: The code generator has deoptimized the styling of undefined as it exceeds the max of 500KB.

It is displayed many, many times during the build. Is it possible that the merged solution does not account for so many (and such big) files?

ChristopherBiscardi commented 5 years ago

@Mrazator the warning you're seeing doesn't substantially impact the operation of mdx or gatsby-mdx, it's just telling you that newlines and whitespace are being omitted. The original issue here was due to the crashing behavior which has since been fixed. Also, this repo is no longer the right place to file issues. Please file gatsby-plugin-mdx issues on the gatsby repo: https://github.com/gatsbyjs/gatsby

It's possible that mdx should always set compact to true, since users never need to know about the Babel processing, and I think this is the only way to get rid of that warning.
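
For context, Babel's `compact` option defaults to "auto", which switches to compact output once the input is longer than 500,000 characters; the note is only printed in that automatic case. A simplified sketch of that decision (not Babel's source; `resolveCompact` and `MAX_PRETTY_LENGTH` are hypothetical names):

```javascript
// Sketch of how a `compact: "auto"` setting can be resolved; simplified, not Babel's source.
const MAX_PRETTY_LENGTH = 500000; // the "500KB" from the warning, taken here as string length

function resolveCompact(compactOption, code) {
  if (compactOption === "auto") {
    // Only this branch corresponds to the case where the note is printed.
    return code.length > MAX_PRETTY_LENGTH;
  }
  return Boolean(compactOption);
}

const small = "x".repeat(1000);
const large = "x".repeat(802000); // roughly the 802kb body reported earlier in the thread

const autoSmall = resolveCompact("auto", small);
const autoLarge = resolveCompact("auto", large);
const forced = resolveCompact(true, small);

console.log(autoSmall, autoLarge, forced);
```

Setting the option explicitly skips the automatic branch, which is why a fixed `compact: true` would remove the note.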

janosh commented 5 years ago

@ChristopherBiscardi +1 for always setting compact to true if it removes that warning.