MDX loader: use resource query to partially import a file

Josh-Cena commented 2 years ago

Have you read the Contributing Guidelines on issues?

[X] I have read the Contributing Guidelines on issues.

Description

Currently, our blog plugin can truncate a post based on the ?truncated=true resource query, e.g. "modulePath": "blog/2018-12-14-Happy-First-Birthday-Slash.md?truncated=true". Maybe we could implement something similar in the MDX loader itself.
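On the Webpack side, a loader can read the query of the current request through `this.resourceQuery` (for example "?truncated=true"). Here is a minimal sketch of a loader that truncates its input when that query is present. It is not Docusaurus's actual mdx-loader code: the `truncatingLoader` name, the `LoaderContextLike` type, and the `TRUNCATE_MARKER` value are assumptions for illustration.

```typescript
// Minimal stand-in for the part of Webpack's loader context used here.
// Webpack sets `this.resourceQuery` to the query of the request,
// e.g. "?truncated=true" for "post.md?truncated=true".
interface LoaderContextLike {
  resourceQuery: string;
}

// Hypothetical marker; stands in for whatever truncation marker the blog plugin uses.
const TRUNCATE_MARKER = '<!-- truncate -->';

function truncatingLoader(this: LoaderContextLike, source: string): string {
  // URLSearchParams ignores a leading "?", so the raw resourceQuery can be passed as-is
  const query = new URLSearchParams(this.resourceQuery);
  if (query.get('truncated') !== 'true') {
    return source; // no query: compile the whole file
  }
  const index = source.indexOf(TRUNCATE_MARKER);
  // Without a marker there is nothing to cut, so keep the full content
  return index === -1 ? source : source.slice(0, index);
}

// Usage: simulate importing the same file with and without the query
const post = 'Intro paragraph.\n\n<!-- truncate -->\n\nRest of the post.';
const preview = truncatingLoader.call({resourceQuery: '?truncated=true'}, post);
const full = truncatingLoader.call({resourceQuery: ''}, post);
```

Because the query is part of the module request, Webpack treats "post.md" and "post.md?truncated=true" as two distinct modules, so both versions can coexist in one build.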

Has this been requested on Canny?

no

Motivation

Allow partial imports, instead of having to cut a piece of Markdown out into a separate partial file.

API design

We could allow several types of queries:

By file line number: ?lines=3-9. Prone to breaking when the file is edited, but easy to implement and easy to reason about.
By heading ID: ?section=caveats. Less flexible, but still very easy to reason about.
By an arbitrary marker: ?from=from&to=to, with corresponding from and to markers in the imported Markdown file. This may lead to broken syntax in some cases, but should still be fun to try.
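As a sketch of how the three proposed query types could be distinguished, here is a parser that turns a resource query into a tagged selection. None of this is an existing Docusaurus API; the `PartialSelection` type and `parsePartialQuery` function are hypothetical, and line numbers are assumed to be 1-based and inclusive.

```typescript
// Hypothetical shapes for the three proposed query types, plus "no query".
type PartialSelection =
  | {kind: 'lines'; start: number; end: number}
  | {kind: 'section'; headingId: string}
  | {kind: 'markers'; from: string; to: string}
  | {kind: 'whole'};

function parsePartialQuery(resourceQuery: string): PartialSelection {
  const query = new URLSearchParams(resourceQuery);

  const lines = query.get('lines');
  if (lines !== null) {
    // Expect "start-end", e.g. "3-9"
    const match = /^(\d+)-(\d+)$/.exec(lines);
    if (!match) {
      throw new Error(`Invalid ?lines value: "${lines}" (expected e.g. 3-9)`);
    }
    const start = Number(match[1]);
    const end = Number(match[2]);
    if (start < 1 || end < start) {
      throw new Error(`Invalid ?lines range: ${start}-${end}`);
    }
    return {kind: 'lines', start, end};
  }

  const section = query.get('section');
  if (section !== null) {
    return {kind: 'section', headingId: section};
  }

  const from = query.get('from');
  const to = query.get('to');
  if (from !== null || to !== null) {
    // A lone ?from or ?to is almost certainly a typo, so reject it loudly
    if (from === null || to === null) {
      throw new Error('?from and ?to must be used together');
    }
    return {kind: 'markers', from, to};
  }

  return {kind: 'whole'};
}
```

Checking `lines`, then `section`, then `from`/`to` is an arbitrary precedence; rejecting combined queries outright would be an equally reasonable choice.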

Have you tried building it?

No

Self-service

[X] I'd be willing to contribute this feature to Docusaurus myself.

slorber commented 2 years ago

Allow partial imports instead of cutting a piece of Markdown out as partial

Who wants that, us or users?

I ask because mdx-loader is not documented as a public API surface at the moment.

If we decide to migrate away from webpack/mdx-loader, will adding this feature make the migration harder?

Josh-Cena commented 2 years ago

Users. I was asked whether you can import Markdown but only certain parts, and I thought that's actually a fine idea.

It shouldn't make migration harder, as long as we still retain some kind of self-hosted Markdown loader. Resource queries are commonly adopted across bundlers / frameworks. We also have ?truncated=true as a precedent.

Users aren't actually touching MDX loader internals—all they know is that using a resource query allows them to import parts of a Markdown file.

slorber commented 2 years ago

We also have ?truncated=true as a precedent.

As far as I know, this is not documented and is only used internally, in a single place:

          addRoute({
            path: permalink,
            component: blogTagsPostsComponent,
            exact: true,
            modules: {
              sidebar: aliasedSource(sidebarProp),
              items: items.map((postID) => {
                const metadata = blogItemsToMetadata[postID];
                return {
                  content: {
                    __import: true,
                    path: metadata.source,
                    query: {
                      truncated: true,
                    },
                  },
                };
              }),
              metadata: aliasedSource(tagsMetadataPath),
            },
          });

So you mean you want to expose this publicly to users and add other capabilities, i.e. importing MDX files with queries?

Should we provide a pre-defined set of resource queries, or should we enable users to bring their own resource query with a more flexible system?

Josh-Cena commented 2 years ago

Should we provide a pre-defined set of resource queries, or should we enable users to bring their own resource query with a more flexible system?

Let's support a few predefined ones first. A flexible system is quite hard to design.

As far as I know, this is not documented and only used internally in a single place

Yeah, that was just to illustrate that we use this trick ourselves, so if we migrate to something incompatible with it, we'll have to take the pain ourselves anyway.

slorber commented 2 years ago

ok that seems reasonable

Daniel15 commented 5 months ago

Was this ever implemented? I'm trying to convince someone to switch from mdbook to Docusaurus, and they're using this feature in mdbook: https://rust-lang.github.io/mdBook/format/mdbook.html#including-portions-of-a-file

slorber commented 5 months ago

No, we haven't implemented it.

I think it wouldn't be great to handle a feature such as ?lines=3-9. It's likely to break as content gets refactored. It would also require treating the Markdown as a multiline string and truncating it before passing it to the MDX compiler, leading to other troubles down the line (notably, in case of compilation errors, line numbers wouldn't match the original content). I'd rather not support this.
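To make the line-number problem concrete, here is a minimal sketch of what ?lines=3-9 support would involve. The `sliceLines` and `toOriginalLine` helpers are hypothetical, not anything Docusaurus does: slicing is trivial, but every line number the compiler reports against the slice has to be shifted back by the number of dropped lines.

```typescript
// Hypothetical result of slicing a Markdown source by line range.
interface LineSlice {
  content: string;
  // Number of original lines dropped before the slice; add this to any
  // line number the compiler reports to point back at the original file.
  lineOffset: number;
}

function sliceLines(source: string, start: number, end: number): LineSlice {
  const lines = source.split('\n');
  // start/end are 1-based and inclusive, so convert to 0-based slice bounds
  return {
    content: lines.slice(start - 1, end).join('\n'),
    lineOffset: start - 1,
  };
}

// A compiler error on line N of the slice is on line N + lineOffset of the file
function toOriginalLine(slice: LineSlice, reportedLine: number): number {
  return reportedLine + slice.lineOffset;
}

// Usage: select lines 5-7 of a seven-line file
const source = ['# Title', '', 'Intro', '', '## Caveats', '', 'Be careful.'].join('\n');
const slice = sliceLines(source, 5, 7);
// slice.content is "## Caveats\n\nBe careful."
// an error reported on line 3 of the slice is line 7 of the file
```

Even with the offset, adding one line above line 5 silently changes which lines ?lines=5-7 selects, which is the fragility described above.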

I studied the problem a bit, and here's my suggestion: instead of making it part of Docusaurus core, maybe we could enable you to do this on your side?

We could expose the Webpack resource query as a Unified vfile.data.query attribute, letting you decide on your own what the query syntax should be, and how it affects the compilation process.

import MyPartial from "./_myPartial.mdx?start=myStart&end=myEnd"

Excluded

{/* myStart */}

Included 

{/* myEnd */}

import type {Plugin, Processor, Transformer} from 'unified';

// Assumes Docusaurus exposes the Webpack resource query as file.data.query
const plugin: Plugin = function plugin(this: Processor): Transformer {
  return async (tree, file) => {
    const query = new URLSearchParams(file.data.query as string | undefined);
    const startMarker = query.get('start');
    const endMarker = query.get('end');
    if (!startMarker && !endMarker) {
      return;
    }

    // unist-util-visit is ESM-only, hence the dynamic import
    const {visit} = await import('unist-util-visit');

    visit(tree, 'mdxFlowExpression', (mdxFlowExpression) => {
      // The node value is the code between the braces, e.g. "/* myStart */"
      const isStartMarker = mdxFlowExpression.value === `/* ${startMarker} */`;
      const isEndMarker = mdxFlowExpression.value === `/* ${endMarker} */`;
      // Do something to filter the AST nodes you want to exclude here
    });
  };
};
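One possible way to fill in the "Do something" step, assuming the markers are top-level siblings in the tree (a marker nested inside a list or JSX element would not be found). This self-contained sketch uses minimal hand-written node types instead of the real mdast types, and `filterBetweenMarkers` is a hypothetical helper; a plugin like the one above could call it with `tree`, `startMarker`, and `endMarker` instead of using `visit`.

```typescript
// Minimal stand-in for mdast/MDX nodes (the real types come from the mdast ecosystem)
interface MdxNode {
  type: string;
  value?: string;
  children?: MdxNode[];
}

// True if `node` is a flow expression such as {/* name */};
// its value is the text between the braces.
function isMarker(node: MdxNode, name: string): boolean {
  return node.type === 'mdxFlowExpression' && node.value?.trim() === `/* ${name} */`;
}

// Keeps only the top-level children between the start and end markers (markers excluded).
// A null start means "from the beginning"; a null end means "to the end".
function filterBetweenMarkers(tree: MdxNode, start: string | null, end: string | null): void {
  const children = tree.children ?? [];
  const findMarker = (name: string): number => {
    const index = children.findIndex((node) => isMarker(node, name));
    if (index === -1) {
      // Fail loudly so a typo in the query does not silently import everything
      throw new Error(`Marker {/* ${name} */} not found`);
    }
    return index;
  };
  const from = start === null ? 0 : findMarker(start) + 1;
  const to = end === null ? children.length : findMarker(end);
  if (to < from) {
    throw new Error('End marker appears before start marker');
  }
  tree.children = children.slice(from, to);
}

// Usage: a tree shaped like the _myPartial.mdx example
const tree: MdxNode = {
  type: 'root',
  children: [
    {type: 'paragraph', children: [{type: 'text', value: 'Excluded'}]},
    {type: 'mdxFlowExpression', value: '/* myStart */'},
    {type: 'paragraph', children: [{type: 'text', value: 'Included'}]},
    {type: 'mdxFlowExpression', value: '/* myEnd */'},
  ],
};
filterBetweenMarkers(tree, 'myStart', 'myEnd');
// tree.children now holds only the "Included" paragraph
```

Working on top-level children keeps the result syntactically valid, since whole block nodes are kept or dropped; markers inside nested blocks would need a recursive version.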

The idea is that instead of building everything in Docusaurus core, we should enable the community to build such extensions. If one extension becomes highly popular, we could consider making it part of Docusaurus core. But I'd rather see userland extensions first 😄

Does it make sense?

facebook / docusaurus