
`@docusaurus/mdx-loader` frontmatter removal prevents rehype vfile from having correct line numbers #3935

Closed by vjpr 11 months ago

vjpr commented 3 years ago

https://github.com/facebook/docusaurus/blob/ae45b11bbb32513a8cb44da163f096c00868a21d/packages/docusaurus-mdx-loader/src/index.js#L24-L58

I am writing a plugin that adds the source file + line number to every heading element.

Below you can see that the frontmatter is being parsed and then only the remaining markdown content is passed onwards.

```js
const {data, content} = matter(fileString);
// ...
result = await mdx(content, options);
```

This means that in a rehype plugin, `vfile.history[].position.start.line` will not be correct, because it does not take the frontmatter lines into account.
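
For reference, the plugin itself is roughly this (a sketch; the `data-source` attribute name is just illustrative):

```js
import {visit} from 'unist-util-visit';

// Sketch of a rehype plugin that tags every heading with its source location.
export default function headingSourcePlugin() {
  return (tree, file) => {
    visit(tree, 'element', (node) => {
      if (!/^h[1-6]$/.test(node.tagName) || !node.position) return;
      // node.position.start.line is relative to the content *after* the
      // frontmatter was stripped, so it is off by the frontmatter's line count.
      node.properties = node.properties || {};
      node.properties['data-source'] = `${file.path}:${node.position.start.line}`;
    });
  };
}
```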

I think a better approach would be to parse the frontmatter using a remark plugin, so the original line numbers are retained. For example: https://www.npmjs.com/package/remark-extract-frontmatter
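
Something like this, so the whole file (frontmatter included) goes through the parser and every node keeps its original line number (a rough sketch of the idea, not the actual loader code):

```js
import {remark} from 'remark';
import remarkFrontmatter from 'remark-frontmatter';
import remarkExtractFrontmatter from 'remark-extract-frontmatter';
import yaml from 'yaml';

// Parse the full file string instead of stripping the frontmatter first.
// remark-frontmatter keeps the frontmatter as a `yaml` node (positions intact),
// and remark-extract-frontmatter parses its value onto file.data.
const file = await remark()
  .use(remarkFrontmatter)
  .use(remarkExtractFrontmatter, {yaml: yaml.parse})
  .process(fileString); // fileString = the raw file content, frontmatter and all

console.log(file.data); // parsed frontmatter, and no stripped lines
```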

This should also be corrected on the https://mdxjs.com/guides/custom-loader#custom-loader page, I think.

Workaround

I could use gray-matter to parse the front matter again in my plugin, but that means a lot of duplicate file read operations.
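
i.e. something like this inside the plugin (a sketch; it assumes the loader sets `file.path`, and the line-count diff is only there to show what gets lost):

```js
import fs from 'fs';
import matter from 'gray-matter';

// Workaround: re-read and re-parse the original file inside the plugin.
const plugin = () => (tree, file) => {
  const raw = fs.readFileSync(file.path, 'utf8');
  const {data: frontMatter, content} = matter(raw);
  // How many lines were stripped before the content reached this plugin:
  const lineOffset = raw.split('\n').length - content.split('\n').length;
  // ...use frontMatter / lineOffset when reporting heading positions...
};
```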

Remark

In my remark plugins, I am just using the first comment in the document to apply config:

```md
<!-- {"tocMaxDepth": 1, "tocMaxDepth": 4} -->

# foo
```
```js
const plugin = (options = {}) => {
  const name = options.name || 'toc';

  const transformer = (tree) => {
    // Read plugin config from the first comment node in the document.
    const firstComment = tree.children.find((child) => child.type === 'comment');

    let config = {};
    if (firstComment) {
      config = JSON.parse(firstComment.value);
    }

    // stuff...
  };

  return transformer;
};
```
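
It gets registered through the docs plugin's remarkPlugins option, roughly like this (sketch; the require path is just illustrative):

```js
// docusaurus.config.js (sketch)
const plugin = require('./src/remark/toc-comment-config'); // illustrative path

module.exports = {
  presets: [
    [
      '@docusaurus/preset-classic',
      {
        docs: {
          remarkPlugins: [plugin],
        },
      },
    ],
  ],
};
```
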
JPeer264 commented 2 years ago

I agree that it is cumbersome not to have the frontmatter inside rehype/remark plugins.

We have a different use case: we only want to check certain frontmatter values inside a remark plugin, but there is no way to do that without reading the file again.

It would make sense to pass fileString instead of content into the processing here. But I think it is also cumbersome if the logic for removing the frontmatter and the contentTitle then has to be duplicated yet again, as a plugin acting on the end result.

Btw, these are the updated source locations: https://github.com/facebook/docusaurus/blob/7ab2bd32342496f3f7373bc67d03a0da0eeffa40/packages/docusaurus-mdx-loader/src/loader.ts#L161-L165 and https://github.com/facebook/docusaurus/blob/7ab2bd32342496f3f7373bc67d03a0da0eeffa40/packages/docusaurus-mdx-loader/src/loader.ts#L209-L214

slorber commented 11 months ago

@vjpr as users are going to upgrade to Docusaurus v3 and MDX v2 soon, they will likely encounter compilation errors, and it's important to show correct line numbers if possible.

In https://github.com/facebook/docusaurus/pull/9386 I'm now trying to avoid altering the original content before passing it to MDX, using remark-frontmatter and another new internal plugin to handle the contentTitle export. It seems to fix that line number problem nicely.
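
Conceptually the loader now does something like this (heavily simplified sketch; `extractContentTitle` is a stand-in name for the new internal plugin):

```js
import {compile} from '@mdx-js/mdx';
import remarkFrontmatter from 'remark-frontmatter';

// Hand MDX the untouched file content. remark-frontmatter turns the front matter
// into a `yaml` node instead of it being stripped beforehand, so every node
// keeps its original line number.
const result = await compile(fileString, {
  remarkPlugins: [
    remarkFrontmatter,
    extractContentTitle, // stand-in for the internal plugin handling the contentTitle export
  ],
});
```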


@JPeer264 the front matter becomes available in the AST as a raw string, in nodes of type yaml/toml.

remark-frontmatter does not parse that string, and we don't need it parsed in Docusaurus core, so if you want to do something with those new AST nodes you are free to handle them yourself, or use a community plugin.
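
For example, a plugin that needs the values can parse the node itself in a few lines (a sketch using js-yaml; this is not something Docusaurus core does for you):

```js
import {visit} from 'unist-util-visit';
import yaml from 'js-yaml';

// Sketch of a remark plugin reading the raw `yaml` node left by remark-frontmatter.
export default function checkFrontMatter() {
  return (tree) => {
    visit(tree, 'yaml', (node) => {
      const frontMatter = yaml.load(node.value);
      // ...check whatever front matter values you need here...
      console.log(frontMatter);
    });
  };
}
```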

I understand it's not super convenient to have to parse the same front matter multiple times, but it should unlock your use case more efficiently now.

In the future we'll try to optimize all this, because currently we already parse files at multiple layers (mdx loader and plugin code) and it's not super efficient. I'm not sure yet how to do that properly though.

slorber commented 11 months ago

Never mind, I found a way to expose the parsed frontMatter data we already have, without reparsing it:

```js
import {VFile} from 'vfile';

// The loader puts the already-parsed front matter on the vfile it hands to MDX.
const vfile = new VFile({
  value: content,
  path: filePath,
  data: {
    frontMatter,
  },
});
```

Any remark plugin can then read it from `vfile.data`:

```js
function remarkPlugin() {
  return async (root, vfile) => {
    console.log(vfile.data.frontMatter);
  };
}
```