facebook / docusaurus

Easy to maintain open source documentation websites.
https://docusaurus.io
MIT License

RFC: plugin Markdown lifecycles – configureMarkdownLoader #6370

Open Josh-Cena opened 2 years ago

Josh-Cena commented 2 years ago


Motivation

A more solid proposal derived from #4625. I propose two Markdown-related lifecycles for plugins: configureMarkdownLoader and getMarkdownComponents.

configureMarkdownLoader

Status quo

Currently, every content plugin uses @docusaurus/mdx-loader and passes it through configureWebpack. The docs and blog plugins also host their own Markdown loaders, in order to access the file path -> URL mapping and convert MD links to URL links.

However, this means each plugin's data is completely sandboxed, all the way from content loading to route creation. A plugin can't access routes created by another plugin at any stage in its lifespan. It also means the same logic is duplicated in every plugin. After #5999, the remark plugins will be moved to a global config option, which every plugin will then have to hook into separately.

This also means that linkification happens before any Remark plugin gets loaded, so people can't remap Markdown paths, as requested in #4039: by the time the beforeDefaultRemarkPlugins are loaded, the Markdown links no longer exist.

Resolution

I propose that we move the MDX loader to the core, just like the CSS loader or JS loader. This effectively means we treat MD as a first-class citizen in our architecture. And then, each plugin will pass in loader options that are aggregated together. This solves several problems:

For example:

function pluginContentDocs() {
  return {
    configureMarkdownLoader(content) {
      return {
        isIncluded: (filePath) => {
          return content.allDocSources.includes(filePath);
        },
        metadataPath: (filePath) => {
          const aliasedPath = aliasedSitePath(filePath, siteDir);
          return path.join(dataDir, `${docuHash(aliasedPath)}.json`);
        },
        // Allows global linkification
        getPermalink: (filePath) => {
          return allDocs.find(doc => doc.source === filePath).permalink;
        },
        isMDXPartial: (filePath) => {
          return createAbsoluteFilePathMatcher(options.exclude, contentDirs)(filePath);
        },
        // Stuff like remarkPlugins, staticDirs, etc. are no longer needed in plugin config
      };
    },
  };
}

The only problem is that some logic is ultimately plugin-specific. For example, how do we handle file-specific logic if the file being queried is not handled by this plugin? My current solution is an isIncluded function. All configs are aggregated in a list, and for each file, each config's isIncluded callback is called first. If it returns true, the rest of that config's callbacks are executed. All callbacks are called with filePath, but maybe we could also pass in fileContent if there's a use for it?
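The aggregation described above could be sketched roughly as follows. This is a minimal illustration, not an existing Docusaurus API: createMarkdownResolver, the two stand-in configs, and the /site/... paths are all hypothetical, and letting the first config whose isIncluded returns true own the file is an assumption the proposal does not spell out.

```javascript
// Hypothetical core-side helper (not part of Docusaurus): aggregates the
// objects returned by each plugin's configureMarkdownLoader lifecycle.
function createMarkdownResolver(pluginConfigs) {
  return function resolve(filePath) {
    // Assumption: the first config whose isIncluded() returns true owns the file.
    const owner = pluginConfigs.find((config) => config.isIncluded(filePath));
    if (!owner) {
      return null; // No plugin claims this file.
    }
    // Only now are the remaining callbacks of that config executed.
    return {
      permalink: owner.getPermalink(filePath),
      isMDXPartial: owner.isMDXPartial(filePath),
    };
  };
}

// Stand-in configs with made-up paths, shaped like the example above.
const stripSiteDir = (filePath) =>
  filePath.replace('/site', '').replace(/\.mdx?$/, '');
const isUnderscored = (filePath) => filePath.split('/').pop().startsWith('_');

const docsConfig = {
  isIncluded: (filePath) => filePath.startsWith('/site/docs/'),
  getPermalink: stripSiteDir,
  isMDXPartial: isUnderscored,
};
const blogConfig = {
  isIncluded: (filePath) => filePath.startsWith('/site/blog/'),
  getPermalink: stripSiteDir,
  isMDXPartial: isUnderscored,
};

const resolve = createMarkdownResolver([docsConfig, blogConfig]);
console.log(resolve('/site/docs/intro.md'));
console.log(resolve('/site/README.md'));
```

Because every callback receives only filePath, passing fileContent later would just mean adding a second argument in resolve.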

getMarkdownComponents is now unnecessary

getMarkdownComponents

Status quo

Currently, the classic theme registers global Markdown components through the MDXComponents. This is problematic, because other themes can't register components on top of this unless we solve #4530. Moreover, the API surface is very implicit to the user. This is ultimately just a register, not a component, and can't be wrapped like other components. (Although you can still use object spreading.)

Resolution

I propose a new getMarkdownComponents that's like getThemePath. It returns a path to a component map, in the same shape as the current MDXComponents. The core would merge all component registers and generate it as a temp file. Anything that uses the MDXProvider just needs to import from the generated folder.

After some more thought, I don't think we need the getMarkdownComponents API. Instead, we will solve #4530 and allow each theme to wrap the previous one's MDXComponents (which I'm also considering refactoring into a useMDXGlobals hook).
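As a rough illustration of themes wrapping the previous one's component map, the sketch below composes maps with object spreading. Everything here is hypothetical: composeMDXComponents, the two theme functions, and the string placeholders that stand in for React components are not Docusaurus APIs, and the actual mechanism would depend on how #4530 is solved.

```javascript
// Each hypothetical theme receives the previous theme's component map and
// returns its own map. Strings stand in for real React components.
const classicTheme = () => ({
  code: 'ClassicCodeBlock',
  a: 'ClassicLink',
});

const liveCodeTheme = (previous) => ({
  ...previous, // keep everything the earlier theme registered
  code: `LiveCode(${previous.code})`, // wrap, rather than replace, the code block
});

// Fold the themes in order so that later themes wrap earlier ones.
function composeMDXComponents(themes) {
  return themes.reduce((components, theme) => theme(components), {});
}

const components = composeMDXComponents([classicTheme, liveCodeTheme]);
console.log(components);
```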


Josh-Cena commented 2 years ago

The shiki-twoslash preset uses a hack that mutates siteConfig to inject the twoslash remark plugin into the content plugins. This lifecycle would let it do the same thing without a hack.

https://github.com/shikijs/twoslash/blob/4c6416670553b61074cd33be0f3bb0fed64d0ada/packages/docusaurus-preset-shiki-twoslash/index.js#L44-L52

Josh-Cena commented 2 years ago

I've closed #6261 because parsing titles and descriptions seems to cost too much performance. Interlinking pages with Markdown paths is still a problem, and we have several tests that currently produce incorrect output. With this proposal, we will be able to move this logic into the MDX loader and thus work on an already parsed MDAST instead of raw strings.
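To make the difference between working on an MDAST and on raw strings concrete, here is a minimal sketch. The visit and linkify helpers and the permalinks table are hypothetical, and for brevity the link URLs are assumed to be already resolved to the keys that the aggregated getPermalink lookup understands.

```javascript
// Recursively visit every node of the given type in an MDAST-like tree.
function visit(node, type, visitor) {
  if (node.type === type) {
    visitor(node);
  }
  (node.children || []).forEach((child) => visit(child, type, visitor));
}

// Rewrite link nodes whose URL is a known Markdown file path.
// getPermalink is a hypothetical lookup: file path -> URL, or undefined.
function linkify(tree, getPermalink) {
  visit(tree, 'link', (node) => {
    const permalink = getPermalink(node.url);
    if (permalink !== undefined) {
      node.url = permalink;
    }
  });
  return tree;
}

// Made-up lookup table standing in for the aggregated plugin data.
const permalinks = { './intro.md': '/docs/intro' };

const tree = {
  type: 'root',
  children: [
    {
      type: 'paragraph',
      children: [
        { type: 'link', url: './intro.md', children: [{ type: 'text', value: 'Intro' }] },
        { type: 'link', url: 'https://docusaurus.io', children: [] },
      ],
    },
    // Text that merely looks like a link is not a link node, so it is left alone.
    { type: 'code', value: '[Intro](./intro.md)' },
  ],
};

linkify(tree, (url) => permalinks[url]);
console.log(JSON.stringify(tree, null, 2));
```

Only real link nodes are touched in this sketch, so link-like text inside a code node stays as written, which a plain string replacement cannot guarantee.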