facebook / docusaurus

Easy to maintain open source documentation websites.
https://docusaurus.io
MIT License

Ergonomic way to enhance ToC within Markdown: insert TOC slice, exclude headings #6201

Open ISSOtm opened 2 years ago

ISSOtm commented 2 years ago

Have you read the Contributing Guidelines on issues?

Description

Docusaurus currently allows manually altering (and even outright replacing) the ToC in Docs, as per https://github.com/facebook/docusaurus/issues/3915#issuecomment-896193142, but this is not documented.

This issue is as much a question on whether this is something we can expect to rely on, as a request to document it if the answer is "yes".

Has this been requested on Canny?

No response

Motivation

This is useful for documentation that is generated from other sources (in this case, a man page): while the HTML can be injected, the ToC does not follow suit (and I wouldn't expect it to. Or is that preferable?).

API design

Here is a doc which largely consists of externally-generated HTML, for which we additionally generate the ToC via a script.

# rgbds(5) — object file format documentation

import generated from '!!raw-loader!./rgbds.5.html';

<div className="manual-text" dangerouslySetInnerHTML={{ __html: generated }} />

export const toc = [
{
    "value": "DESCRIPTION",
    "id": "DESCRIPTION",
    "level": 2,
},
{
    "value": "FILE STRUCTURE",
    "id": "FILE_STRUCTURE",
    "level": 2,
},
{
    "value": "Header",
    "id": "Header",
    "level": 3,
},
{
    "value": "Source file info",
    "id": "Source_file_info",
    "level": 3,
},
{
    "value": "Symbols",
    "id": "Symbols",
    "level": 3,
},
{
    "value": "Sections",
    "id": "Sections",
    "level": 3,
},
{
    "value": "Assertions",
    "id": "Assertions",
    "level": 3,
},
{
    "value": "RPN EXPRESSIONS",
    "id": "RPN_EXPRESSIONS",
    "level": 3,
},
{
    "value": "SEE ALSO",
    "id": "SEE_ALSO",
    "level": 2,
},
{
    "value": "HISTORY",
    "id": "HISTORY",
    "level": 2,
},
];

Have you tried building it?

If the existing behavior is to be accepted as official, then nothing needs to be built; otherwise, what replacement API is deemed better will need to be discussed first.


Josh-Cena commented 2 years ago

Hi, since this feature is already in place, is it essentially a documentation request?

The TOC shape is most likely stable, and it's kind of documented here: https://docusaurus.io/docs/next/markdown-features/inline-toc#custom-table-of-contents However, up till now we haven't figured out an ergonomic way to let you use it. For example, #3915 is very likely to be solved by letting you code a TOC yourself.

ISSOtm commented 2 years ago

This is first a question about whether modifying the ToC like in the example is intended to be part of the API, or is just exposed internals.

If the answer is yes, then this is a documentation request. If it's no, this is a request for an alternative.

This is not intended to supersede #3915, as is explained in the comment that I originally linked to, though it can be used as a stopgap in the meantime.

slorber commented 2 years ago

This is first a question about whether modifying the ToC like in the example is intended to be part of the API, or is just exposed internals.

This is an internal implementation detail that is IMHO stable enough to use externally.

We could document it and make it officially a public API, but it probably requires some upfront thinking to be sure that the current API is the best way to solve your use-case

ISSOtm commented 2 years ago

It fits exactly my use case, since this document has the entirety of its content auto-generated.

However, if only a section were to be as such, then I'd want to be able to splice the ToC accordingly, more similarly to #3915. I think this could be arranged by having a file—autogenned.md—use this mechanism to manually specify the entirety of its own ToC, and then including autogenned.md performing the splice like requested in that PR.

Hopefully I didn't sound too confused?

slorber commented 2 years ago

If we add this to our doc, we should probably dogfood this on our own doc, explaining the constraints (like elements requiring a unique id for linking to work). We could showcase manual construction of a toc for generated HTML (like your case).

We should probably add this to this page (to be renamed as just "TOC"?): https://docusaurus.io/docs/markdown-features/inline-toc

Do you want to submit a PR?

I'm not sure what you mean by autogenned.md but I'd rather avoid showcasing the "import mdx partial" case of https://github.com/facebook/docusaurus/issues/3915, as it's not really a feature but an actual bug/limitation that we really want to solve.

Josh-Cena commented 2 years ago

@slorber What about we invent a "TOC enhancement" syntax ourselves and parse that in the TOC generating remark plugin?

I don't have a clear plan yet, but it would be something like:

import Content, {partialToc} from './_partial.md';

# A Markdown page

## Actual Heading 1

[[insert toc]]
- Inserted Heading 1
- Inserted Heading 2
  - Subheading 1

## Actual Heading 2

<Content />

[[import toc: partialToc]]

Which will generate a TOC:

export const toc = [
  {title: 'Actual Heading 1'},
  {title: 'Inserted Heading 1'},
  {title: 'Inserted Heading 2', children: [{title: 'Subheading 1'}]},
  {title: 'Actual Heading 2'},
  ...partialToc,
];

However, importing partials is very difficult. We either have to actually read the imported file and parse that as well, or we just let the user import it and we spread it into the final TOC. In the former case, we need to keep track of the imported component => MD file path mapping and read & parse more files, which sacrifices performance; in the latter case, we can't be sure how the partial TOC should be spread. For instance, in the example above, if ./_partial.md actually looks like:

### Imported Subheading 1

### Imported Subheading 2

Then the final TOC should have the two "Imported Subheading"'s as children of "Actual Heading 2" instead of spreading it to the root of TOC.

In hindsight it may be better if we have started off designing our server-side TOC structure as a flat list instead of a recursive tree. It's going to be rendered on client-side as a list anyways and we hardly take advantage of the tree structure. For now we recursively render each TOC level, but we could well use tocItems.map(i => <TOCItem item={i} />)...

slorber commented 2 years ago

@Josh-Cena I'm not sure it's really related to this issue, as here the content is in HTML and not a mdx partial import.


@slorber What about we invent a "TOC enhancement" syntax ourselves and parse that in the TOC generating remark plugin?

That's something I thought about but do we really want to invent a syntax that will only serve temporarily? I'm already not a fan of inventing a non-std md syntax 😅

The end goal is that the TOC works for imported files automatically, without asking the user to use any new fancy syntax.


However, importing partials is very difficult. We either have to actually read the imported file and parse that as well

I don't think we should do that, it duplicates work to each doc importing a partial and also adds more weight to each page, as the shared toc ends up being duplicated in each doc

We'd rather favor composition and have a remark plugin that composes the toc from the current doc and partials without inlining the partial tocs into the document, like your spread example, but handled automatically by the remark plugin, not hand-written by the user

In hindsight it may be better if we have started off designing our server-side TOC structure as a flat list instead of a recursive tree

Good point, it will probably be needed to flatten that structure

So we are maybe not ready to make this "manual toc" feature an official public API 😅

Josh-Cena commented 2 years ago

I'm proposing this custom TOC syntax in place of handwriting the entire TOC structure which contains a lot of boilerplate. If we have that, we don't need to document the export toc = ... syntax at all. Also, currently it's all or nothing: you write everything or you let Remark do it all, there's no progressive enhancement. Such an API is not ready to be documented, and we can't extend this API to allow enhancement either, we have to do it from scratch.

The end goal is that the TOC works for imported files automatically, without asking the user to use any new fancy syntax.

As I said, it means we have to "understand" that <Content /> is an MDX partial, not a JSX component, which isn't easy. I've tried a while back to implement a Remark plugin and call Babel to parse the JSX for me, but overall it's a very painful thing to let Remark understand JSX. It also means we have to insert an extra {toc} named import back into the import Content from './_partial.mdx' statement, and that can be intractable.

Then there's the question of (a) user wanting to hide some headings from the TOC and (b) user wanting to insert extra headings that would otherwise not be visible to Remark. We would need a way to let users handwrite & insert part of the TOC, hence the proposal for the [[Inserted TOC]] and [[Imported TOC]] syntax. Even if [[Imported TOC]] can be heavylifted by us, [[Inserted TOC]] is there to stay.

Glad I made the flattened list point through :P Going to see what we can do about it

Josh-Cena commented 2 years ago

This will be triaged as a feature request and we will figure out an ergonomic way to tweak the TOC structure.

Just for reference, VuePress has this [[toc]] syntax: https://v2.vuepress.vuejs.org/guide/markdown.html#table-of-contents It's not anywhere close to the functionality we are discussing here, but the syntax is similar

ISSOtm commented 2 years ago

Fine by me, then. I'll keep relying on toc until then.

slorber commented 2 years ago

As I said, it means we have to "understand" that <Content /> is an MDX partial, not a JSX component

We should look for the import statement extension, for sure it's a bit more complex but it should be achievable.

(a) user wanting to hide some headings from the TOC

It looks more appropriate to me to have a syntax on the heading itself, similar to anchor links? I'm not sure how your proposal solves this use-case?

(b) user wanting to insert extra headings that would otherwise not be visible to Remark.

For example, user using React components with some headings inside md?

That's a quite specific use-case, but still seems like a reasonable thing to solve without requiring users to write the full toc manually

In practice, it's the use-case @ISSOtm exposed, but @ISSOtm may be satisfied by just having documentation explaining what we recommend, so it may not be so useful to implement something immediately, and doc might be good enough until it becomes more painful for a few users?

This gives us time to think more deeply about this problem. I'd also be happy to have a way to enhance/customize the automatically generated toc object, and not sure adding proprietary markdown syntax tags is the most flexible option. I'd rather use a real function:

# Title

blabla

export function toc(originalToc) {
    return [...originalToc, myExtraDocEntry];
}

Does it make sense?


Fine by me, then. I'll keep relying on toc until then.

Yes @ISSOtm, as you see this is subject to potential breaking changes as we may flatten the toc structure 🤪 so it's not yet a good time to document but it's a good enough workaround for now.

ISSOtm commented 2 years ago

A bit of a tricky UX problem is splicing some headings in the middle of the ToC. Figuring out the correct index is less trivial than how often it'd be desirable, imo.
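
One way to sidestep the index arithmetic, as a rough sketch only: look up the insertion point by heading id instead of by position. `insertAfterId` is a hypothetical helper, not a Docusaurus API, and it assumes the flat `{value, id, level}` item shape used in the issue description.

```javascript
// Hypothetical helper (not part of Docusaurus): returns a new TOC array with
// `extraItems` inserted right after the entry whose id is `anchorId`.
// Assumes a flat list of {value, id, level} items.
function insertAfterId(toc, anchorId, extraItems) {
  const index = toc.findIndex((item) => item.id === anchorId);
  if (index === -1) {
    // Unknown anchor: append at the end rather than silently dropping entries.
    return [...toc, ...extraItems];
  }
  return [...toc.slice(0, index + 1), ...extraItems, ...toc.slice(index + 1)];
}

// Illustrative data only; the ids are made up for this example.
const originalToc = [
  {value: 'Actual Heading 1', id: 'actual-heading-1', level: 2},
  {value: 'Actual Heading 2', id: 'actual-heading-2', level: 2},
];

const enhancedToc = insertAfterId(originalToc, 'actual-heading-1', [
  {value: 'Inserted Heading 1', id: 'inserted-heading-1', level: 3},
]);

console.log(enhancedToc.map((item) => item.value));
```

The helper returns a new array and leaves the input untouched, so it could be combined with whatever shape a future `toc` customization hook ends up taking.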

Josh-Cena commented 2 years ago

Yes, the problem with export function toc(originalToc) is that it's too centralized and the user can't easily customize parts of it without going to great lengths to destructure a whole TOC level. Interspersing these "artificial TOCs" sounds much more ergonomic to me.

Josh-Cena commented 2 years ago

Looked a bit into this. Several random thoughts:

On the point of MDX heading transclusion. The solution would be like this: export toc = [...collectedTOC[0], ...importedTOC0, ...collectedTOC[1]] where collectedTOC are all the collected headings, split according to the locations of MDX partials. importedTOC0 is an automatically created symbol: import Partial from './_partial.mdx' => import Partial, {toc as importedTOC0} from './_partial.mdx'.

This solution is because we don't know what's actually in _partial.mdx unless we traverse that file as well, which is not worthwhile.

On the point of inserting extra anchors. This has three use-cases:

The actual syntax is open to discussion, but it would still basically be some artificial headings that will be recognized and removed by our remark plugin.

<!-- this admonition-like syntax encapsulates some artificial headings
that will be present in the TOC but removed from the content -->
:::toc

## Explanation {#explanation}

:::

Something like that...

This part doesn't require any refactors, because the remark plugin will see artificial headings the same as normal ones.

On the point of hiding headings away from the TOC. This is the tricky part. Note that {#anchor} is actually a widely-adopted Markdown feature, so up to this point Docusaurus hasn't done anything unique yet. Maybe we can have another syntax that's very similar to this anchor syntax?

## Hidden heading {!}

## Hidden heading 2 {!#heading}

This will still allow us to set the anchor ID, but the ! tells Docusaurus that this heading shouldn't be indexed.
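
To make the proposal concrete, here is a minimal sketch of how such a suffix could be parsed. The `{!}` / `{!#id}` syntax is only the proposal above and is not implemented anywhere; `parseHeadingMarker` is a hypothetical name.

```javascript
// Hypothetical parser for the proposed `{!}` / `{!#id}` heading suffix.
// It also accepts the existing `{#id}` form. Nothing here is Docusaurus code.
function parseHeadingMarker(rawText) {
  // Matches a trailing "{#id}", "{!}" or "{!#id}" block.
  const match = /\s*\{(!)?(?:#([^\s}]+))?\}\s*$/.exec(rawText);
  if (!match || (!match[1] && !match[2])) {
    // No marker (or an empty "{}"): keep the heading text as-is.
    return {text: rawText.trim(), id: undefined, hidden: false};
  }
  return {
    text: rawText.slice(0, match.index).trim(),
    id: match[2], // undefined when no "#id" part was given
    hidden: match[1] === '!',
  };
}

console.log(parseHeadingMarker('Hidden heading {!}'));
console.log(parseHeadingMarker('Hidden heading 2 {!#heading}'));
console.log(parseHeadingMarker('Explanation {#explanation}'));
```

A TOC plugin could then skip every heading for which `hidden` is true while still emitting the anchor id on the rendered element.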

In conclusion, we will:

Josh-Cena commented 2 years ago

The artificially inserted TOC will likely be useful for our API doc: https://docusaurus.io/docs/next/api/plugins/@docusaurus/plugin-content-docs

I envision something similar to Yarn's doc: https://yarnpkg.com/cli/workspaces/foreach#options where every line in the table can have its toc link

slorber commented 2 years ago

Yarn table is using h3

Couldn't we also use this?

Josh-Cena commented 2 years ago

I'm not sure if it's good practice, and MDX refuses to render headings within table rows (for good reason):

[screenshot]

If we use <h3> then we lose the toc function anyways.

jasikpark commented 1 year ago

would like to use a component for my headings, while including them in the TOC.

current approach is to change <ChangelogHeading version="0.1.8" date="2022-12-20"/> to

<ChangelogHeading date="2022-12-20">

## 0.1.8

</ChangelogHeading>

so that Docusaurus picks up the MDX heading to put in the TOC while still rendering as

<hgroup style={{ display: 'flex', flexWrap: 'wrap', alignItems: 'baseline', gap: '2em' }}>
      <h2>0.1.8</h2>
      <time dateTime="2022-12-20">2022-12-20</time>
</hgroup>

slorber commented 1 year ago

@jasikpark if you only want to customize the rendering of h2 headings of a specific doc, it looks better to me to simply use MDX components to provide custom rendering logic.

We allow you to configure such docs globally here: https://docusaurus.io/docs/markdown-features/react#mdx-component-scope

You could create your own h2 component that renders the way you want, and is able to parse the h2 string on changelog docs.

IE you could just write this

## 0.1.8 - 2022-12-20

We don't (yet) allow assigning components on a per-doc basis. Until then you can add if/else in a global h2 component to detect when it makes sense to apply such custom rendering logic. Or you can try using the MDX Provider in your doc or theme directly?

import {MDXProvider} from '@mdx-js/react';
import H2Custom from '@site/src/components/H2Custom';

<MDXProvider components={{h2: H2Custom}}>

## 0.1.8 - 2022-12-20

text

## 0.1.9 - 2022-12-21

text

</MDXProvider>

(if you don't want the date to appear in the TOC you can just put the version and create a mapping from version to date outside of the markdown)
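
As a sketch of the "parse the h2 string" idea: a custom h2 component could split the heading text before rendering it. `parseChangelogHeading` is a hypothetical helper, and the `version - YYYY-MM-DD` format is assumed from the example above.

```javascript
// Hypothetical helper for a custom h2 component: splits a heading such as
// "0.1.8 - 2022-12-20" into its version and date parts.
function parseChangelogHeading(text) {
  const match = /^(\S+)\s+-\s+(\d{4}-\d{2}-\d{2})$/.exec(text.trim());
  if (!match) {
    // Not a changelog-style heading: let the caller render it unchanged.
    return null;
  }
  return {version: match[1], date: match[2]};
}

console.log(parseChangelogHeading('0.1.8 - 2022-12-20'));
console.log(parseChangelogHeading('Introduction'));
```

Returning `null` for non-matching headings gives the component an easy if/else to fall back to the default rendering.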

jasikpark commented 1 year ago

interesting! thanks for the suggestions, i think i'll just stick with my current solution in that case 👍

shmuelisrl commented 1 year ago

I was trying to add headings that show up in the TOC, but I've encountered an issue. when you add a Heading in a bullet point it doesn't show up in the toc and i want the option to have it show up. I'm actually just trying to indent certain headings, and I'm doing this

<div class="nobullet">

# Inputs
* ## A
  * details
* ## B
  * details
* ## C
  * details
* ## D
  * details

</div>

I'd like to have the ability to set any word to be or not be in the TOC no matter where it is. like this:

word{is-in-TOC}(custom id)

* word{is-in-TOC}(custom id 2)

or
## word{is-not-in-TOC}

foot commented 1 year ago

Trying to automate the export const toc trick mentioned in this issue description, I very naively hacked together

// Use cheerio as docusaurus uses it
import * as cheerio from "cheerio";

// Read in raw html string and generate docusaurus toc format
export const genToc = function (apiRaw) {
  const $ = cheerio.load(apiRaw);
  return $("h2, h3, h4, h5, h6")
    .toArray()
    .map((header) => {
      const $header = $(header);

      const level = parseInt($header[0].name[1], 10);
      const value = $header.text();
      const id = $header.attr("id");
      return {
        level,
        value,
        id,
      };
    });
};

So one can then

import GeneratedAPI from './_api.mdx';

This is the generated api

<GeneratedAPI />

import { genToc } from '../_components/api-toc.js';
import apiRaw from "!!raw-loader!./_api.mdx";
export const toc = genToc(apiRaw);

However this approach seems to add 300kb to the page, which I would guess is mostly cheerio coming down the line. (_api.mdx is only 20kb)

I'm unfamiliar with SSR but could this be forced to be done at "SSG" time during a yarn build rather than in the browser? Would changing the above to use browser APIs instead of cheerio still build okay?

slorber commented 1 year ago

@foot yes the TOC has to be computed server-side otherwise it would be first invisible and then "pop" once React hydrates. Here you include cheerio but also the markdown file as a source string (so each doc is now both a React component + a string).

This really should be done at build time, ideally as a remark plugin.
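
For illustration, the extraction step itself can be written without any dependency, so it could run in a pre-build Node script (or be adapted into a remark plugin) instead of shipping to the browser. This is a sketch under assumptions: `extractTocFromHtml` is a hypothetical name, the regex only handles simple generated HTML with double-quoted `id` attributes, and HTML entities are not decoded.

```javascript
// Hypothetical build-time helper: extracts {value, id, level} entries from
// h2-h6 elements in an HTML string. Regex-based, so only suitable for
// predictable, machine-generated markup.
function extractTocFromHtml(html) {
  const headingPattern = /<h([2-6])\b([^>]*)>([\s\S]*?)<\/h\1>/gi;
  const toc = [];
  for (const match of html.matchAll(headingPattern)) {
    const [, level, attributes, innerHtml] = match;
    const idMatch = /(?:^|\s)id\s*=\s*"([^"]*)"/i.exec(attributes);
    if (!idMatch) {
      // Headings without an id cannot be linked to, so skip them.
      continue;
    }
    toc.push({
      value: innerHtml.replace(/<[^>]+>/g, '').trim(), // drop nested tags
      id: idMatch[1],
      level: Number(level),
    });
  }
  return toc;
}

const sampleHtml =
  '<h2 id="DESCRIPTION">DESCRIPTION</h2><p>text</p>' +
  '<h3 class="Ss" id="Header"><a href="#Header">Header</a></h3>';

console.log(JSON.stringify(extractTocFromHtml(sampleHtml), null, 2));
```

The resulting array could be written next to the doc and pasted or imported as `export const toc = ...`, which keeps both the parser and the raw source string out of the client bundle.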

nullromo commented 1 year ago

Sorry if this is repeated information; it's a little hard to tell the progress here.

I have a use-case where I need my page to use <h3> elements instead of ### foo headers because the text in the header is generated programmatically at build time.

I am using Docusaurus to generate multiple documentation bundles that are product-specific. 90-95% of the documentation for these products is the same, so I use a single Docusaurus project to keep things as centralized and maintainable as possible. I set an environment variable depending on which version of the docs I want to build, and I access the variable through the useDocusaurusContext hook. This allows me to create content sections that contain different information based on the product. For example, a warning admonition might be present for one product and not the other.

There are some cases where I want to change the text of a heading based on the environment variable. For example "Getting Started with \<Product A>" vs. "Getting Started with \<Product B>". For this, I use a JSX component that grabs the environment variable and renders the correct text or content according to the variable. The only way to render a heading like this (as far as I know) is to put it inside of an <h3> element.

My problem is that the TOC never picks up any <h3> elements at all. This page of the Docusaurus docs and many other comments here/elsewhere online talk about wanting to exclude items from the TOC.

Non-Markdown headings will not show up in the TOC. This can be used to your advantage...

Unfortunately for me, the consequence of this "fix" is exactly what I don't want.

Is there any solution to this problem or workaround I can use? If not, do we know the status of any fixes for this problem? The only option I have now is to copy-paste all my markdown files and do away with the environment variable, which is a major maintainability headache.

nullromo commented 1 year ago

Adding some more detail to my previous comment now that I learned a little more about Remark/Rehype plugins.

By dumping out the mainNode object from the rehype-toc plugin, I found that the structure of the document looks weird to me. For normal headings, I get nodes that look like

{
    "type": "element",
    "tagName": "h2",
    "properties": { "id": "asdf" },
    "children": [
        {
            "type": "text",
            "value": "ASDF",
            "position": {
                "start": { "line": 111, "column": 4, "offset": 2968 },
                "end": { "line": 111, "column": 12, "offset": 2976 }
            }
        }
    ],
    "position": {
        "start": { "line": 111, "column": 1, "offset": 2965 },
        "end": { "line": 111, "column": 12, "offset": 2976 }
    }
},

but the component ones look like some kind of embedded JSX instead, so the ToC plugin is unable to detect the header elements within them

{
    "type": "jsx",
    "value": "<Paragraph\n    aContent={\n        <>\n            <hr />\n            <h2>ASDF 2</h2>\n            <EndpointTemplate\n                description= ........", // truncated for brevity
    "position": {
        "start": { "line": 124, "column": 1, "offset": 3280 },
        "end": { "line": 197, "column": 3, "offset": 5500 },
        "indent": [
            1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
            1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
            1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
            1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
        ]
    }
},

As you can see, once we get to a JSX component, it stops being a tree and just flattens everything into 1 node.

Somehow somewhere this JSX is expanded into an HTML tree, but I don't know how or where. Is there any way to run some kind of plugin or transformation after the JSX is expanded?

slorber commented 1 year ago

@nullromo we are not using remark-toc: we have our own custom Docusaurus toc remark plugin.

If the remark-toc package does not behave the way you want we can't do anything for you, and you should report it to that plugin author directly.


In any case, you can provide an explicit toc structure yourself on any doc that overrides the one we compute:

export const toc = [
  {
    value: "Label1",
    id: "anchor1",
    level: 2,
    children: [],
  },
  {
    value: "Label2",
    id: "anchor2",
    level: 2,
    children: [
      {
        value: "Label3",
        id: "anchor3",
        level: 3,
        children: [],
      },
    ],
  },
];

You can also take over how this structure is rendered with swizzle.

I agree that's probably not ideal and requires maintenance, but that works


If your goal is to only have placeholders in headings (and not dynamic headings that are present/absent conditionally), that can probably be implemented with a remark plugin:

## Getting Started with %%PRODUCT_NAME%%

We don't provide such plugins but you'll find useful information here and code examples here: https://github.com/facebook/docusaurus/issues/395

This code example is probably the best to get started: https://github.com/facebook/docusaurus/issues/395#issuecomment-1496701245 You can probably tweak it a bit so that it also substitutes variables in markdown headings
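
A minimal sketch of what such a heading-substitution transformer might look like. Everything here is an assumption for illustration: the function name, the `%%NAME%%` pattern taken from the example above, and the hand-rolled tree walker (a real plugin would more likely use unist-util-visit).

```javascript
// Hypothetical remark-style plugin: called with a map of variables, returns a
// transformer that replaces %%NAME%% placeholders inside heading text nodes.
function substituteHeadingPlaceholders(variables) {
  const replaceIn = (text) =>
    text.replace(/%%([A-Z0-9_]+)%%/g, (whole, name) =>
      Object.prototype.hasOwnProperty.call(variables, name)
        ? variables[name]
        : whole, // unknown placeholders are left untouched
    );

  const walk = (node, insideHeading) => {
    const inHeading = insideHeading || node.type === 'heading';
    if (inHeading && node.type === 'text') {
      node.value = replaceIn(node.value);
    }
    (node.children || []).forEach((child) => walk(child, inHeading));
  };

  // A unified transformer receives the AST and may mutate it in place.
  return (ast) => {
    walk(ast, false);
  };
}

// Tiny hand-written mdast-like tree, for demonstration only.
const exampleAst = {
  type: 'root',
  children: [
    {
      type: 'heading',
      depth: 2,
      children: [{type: 'text', value: 'Getting Started with %%PRODUCT_NAME%%'}],
    },
    {
      type: 'paragraph',
      children: [{type: 'text', value: '%%PRODUCT_NAME%% is left alone here'}],
    },
  ],
};

substituteHeadingPlaceholders({PRODUCT_NAME: 'Product A'})(exampleAst);
console.log(exampleAst.children[0].children[0].value);
```

This sketch deliberately touches headings only; the linked example substitutes variables more broadly, and the two could presumably be merged.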

You could even imagine a plugin that adds/removes sections conditionally:

<ProductAOnly>

## Getting Started with Product A

blabla

</ProductAOnly>

If your remark plugin runs before our custom toc plugin (tip: use beforeDefaultRemarkPlugins, see https://docusaurus.io/docs/api/plugins/@docusaurus/plugin-content-docs#beforeDefaultRemarkPlugins), you have the ability to add/remove content to the MDX doc dynamically. We'll compute the toc based on what remains in the AST after those plugins have run.

So if your "before" remark plugin removes everything that's inside a JSX <ProductAOnly> block, then our TOC plugin won't see the headings that have been removed earlier.

Our TOC plugin is pretty simple: it mostly collects all headings found in the MD AST and adds them to the export const toc. In pseudo-code:

function tocPlugin(): Transformer {
  return async (ast) => {
    // Collect every Markdown heading into the exported `toc`
    visit(ast, 'heading', (child: Heading) => {
      addHeadingToTocExport(child);
    });
  };
}

nullromo commented 1 year ago

@slorber Thanks for the advice. I was able to come up with a plugin just like how you described. Unfortunately, I still didn't find a way to actually parse through the JSX nodes within the AST, so I just have to use a regex to determine which nodes I want to remove. I'll post the code here (with names changed) for reference.

Plugin

const visit = require('unist-util-visit');

// plugin that removes certain components
const myPlugin = () => {
    // determine which nodes to filter out based on the product type
    const nodeRegex = (() => {
        switch (process.env.PRODUCT_TYPE) {
            case 'PRODUCT_A':
                return /(ProductBParagraph|ProductCParagraph)/;
            case 'PRODUCT_B':
                return /(ProductAParagraph|ProductCParagraph)/;
            case 'PRODUCT_C':
                return /(ProductAParagraph|ProductBParagraph)/;
            default:
                return /^$/;
        }
    })();
    return async (
        /** @type {import("unist").Node<import("unist").Data>} */ ast,
    ) => {
        // this variable will become true when we hit the opening tag for the
        // node to be removed, and it will become false when we hit the next
        // tag. This means that it will be true for all the nodes between the
        // opening and closing tags
        let removing = false;

        // traverse the tree
        visit(ast, {}, (child, index, parent) => {
            // remember if we removed the current node or not
            let removed = false;

            // removes the current node if we are in removing mode
            const removeNodeIfNeeded = () => {
                // check if in removing mode
                if (removing) {
                    // remove the node
                    //console.log('removing', child);
                    parent.children.splice(index, 1);
                    // remember that we removed the node
                    removed = true;
                }
            };

            // remove the current node if necessary
            removeNodeIfNeeded();

            // if the node is a JSX node, see if we hit the opening/closing tag
            if (child.type === 'jsx') {
                // if the value matches, then it's the right tag
                if (
                    // @ts-ignore
                    nodeRegex.test(child.value)
                ) {
                    // toggle removing mode
                    removing = !removing;
                    // if we just toggled on, remove this node
                    removeNodeIfNeeded();
                }
            }

            // if we removed the node, return SKIP, otherwise just return
            if (removed) {
                return [visit.SKIP, index];
            }
            return;
        });
    };
};

Components

import React from 'react';
import useDocusaurusContext from '@docusaurus/useDocusaurusContext';

export enum ProductType {
    PRODUCT_A = 'PRODUCT_A',
    PRODUCT_B = 'PRODUCT_B',
    PRODUCT_C = 'PRODUCT_C',
}

const useProductType = () => {
    return useDocusaurusContext().siteConfig.customFields
        .productType as ProductType;
};

const makeProductTypeParagraph = (
    productFilter: ProductType,
) => {
    return (props: React.PropsWithChildren) => {
        const productType = useProductType();
        if (productType === productFilter) {
            return <>{props.children}</>;
        }
        return null;
    };
};

export const ProductAParagraph = makeProductTypeParagraph(ProductType.PRODUCT_A);
export const ProductBParagraph = makeProductTypeParagraph(ProductType.PRODUCT_B);
export const ProductCParagraph = makeProductTypeParagraph(ProductType.PRODUCT_C);

Markdown

<ProductAParagraph>

---

## Product A Heading 1

<Thing
    cool='yeah'
/>

---

## Product A Heading 2

<>
    <MyComponent
        nice='wow'
    />
</>

</ProductAParagraph>

<ProductBParagraph>

---

## Product B Heading 1

awesome

</ProductBParagraph>

I don't necessarily like the regex matching technique here because it kind of blindly removes stuff without really knowing what's going on. For example, if my approach is to remove everything between the opening and closing tags (<ProductAParagraph> and </ProductAParagraph>), then what happens when I try to use a self-closing tag (<ProductAParagraph />)? Things are going to get messed up.

So if there's any way that you know of to actually process JSX from inside a remark plugin, I'd love to hear about it.

In any case, thanks a lot for the help! Glad I now have a suitable workaround 🎉

slorber commented 1 year ago

So if there's any way that you know of to actually process JSX from inside a remark plugin, I'd love to hear about it.

First: I'd recommend implementing this in Docusaurus v3 (MDX is now at v2), currently in alpha.

That was just to give you a direction, I have not implemented this myself. If you want to build this properly you have to learn MDX / Unified and how all things work together: you can't skip reading the doc and investing some time.

You should inspect the produced AST tree and see which nodes you want to remove. Most likely the nodes will look like this:

{
      "type": "mdxJsxFlowElement",
      "name": "div",
      "attributes": [],
      "children": [/* content */]
}

What I would do is use a visitor to visit all mdxJsxFlowElement, and remove some of them conditionally. You don't need a regexp for that and self-closing tags are handled: there will be no children in such nodes.
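
To illustrate that direction, here is a dependency-free sketch that prunes `mdxJsxFlowElement` nodes by name. It assumes the node shape shown above; `removeJsxElements` and the component names are hypothetical, and a real plugin would probably use unist-util-visit rather than this hand-rolled traversal.

```javascript
// Hypothetical transformer factory: drops every mdxJsxFlowElement whose name
// is listed in `namesToRemove`, together with all of its children.
function removeJsxElements(namesToRemove) {
  const shouldRemove = (node) =>
    node.type === 'mdxJsxFlowElement' && namesToRemove.includes(node.name);

  const prune = (node) => {
    if (!Array.isArray(node.children)) {
      return;
    }
    node.children = node.children.filter((child) => !shouldRemove(child));
    node.children.forEach(prune);
  };

  return (ast) => {
    prune(ast);
  };
}

// Hand-written AST for demonstration; a self-closing tag simply has no children.
const doc = {
  type: 'root',
  children: [
    {
      type: 'mdxJsxFlowElement',
      name: 'ProductAOnly',
      attributes: [],
      children: [
        {
          type: 'heading',
          depth: 2,
          children: [{type: 'text', value: 'Getting Started with Product A'}],
        },
      ],
    },
    {type: 'mdxJsxFlowElement', name: 'ProductBOnly', attributes: [], children: []},
    {type: 'heading', depth: 2, children: [{type: 'text', value: 'Shared heading'}]},
  ],
};

removeJsxElements(['ProductAOnly'])(doc);
console.log(doc.children.map((child) => child.name || child.type));
```

Because whole elements are removed as tree nodes, no open/close tag tracking or regex matching is needed.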

Again this is just an idea and direction: you'll have to figure out the details yourself and learn how these things work.

jeluard commented 5 months ago

Exporting the global toc apparently doesn't work anymore with docusaurus v3. Is there an alternative to programmatically set the toc?

slorber commented 5 months ago

@jeluard while improving the TOC to support imported docs, I noticed a strange behavior.

Related discussion: https://github.com/facebook/docusaurus/pull/7530#discussion_r1458087876

Apparently, exporting toc would only work on docs that don't contain any heading.


But as soon as you have headings (>= level 2), they get used in priority over your exported TOC.



This didn't look good to me, so I fixed this behavior for Docusaurus v3.2/canary to always give you the ability to override the generated toc:

https://stackblitz.com/edit/github-zjz2fr?file=docs%2Fintro.mdx,package.json


jeluard commented 5 months ago

@slorber I can confirm that it works as expected with canary docusaurus. Thanks!

rtrbt commented 5 months ago

Just in case someone else reads this: Some of the custom ToCs in this issue set their children like this:

export const toc = [{
        value: "Label1",
        id: "anchor1",
        level: 2,
        children: [],
    },{
        value: "Label2",
        id: "anchor2",
        level: 2,
        children: [{
            value: "Label3",
            id: "anchor3",
            level: 3,
            children: [],
        }],
    }
]

to create the same ToC as

## Label1 {#anchor1}

## Label2 {#anchor2}

### Label3 {#anchor3}

However (probably since https://github.com/facebook/docusaurus/pull/6729 ) setting children is not necessary anymore and the child entries don't show up in the ToC. This works:

export const toc = [{
        value: "Label1",
        id: "anchor1",
        level: 2,
    },{
        value: "Label2",
        id: "anchor2",
        level: 2,
    },{
        value: "Label3",
        id: "anchor3",
        level: 3,
    }
]

slorber commented 5 months ago

Yes I confirm in 2021 we had a nested structure, and now there's no children anymore, the TOC structure is flat.
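
For anyone still carrying a hand-written nested TOC, a small conversion sketch (`flattenToc` is a hypothetical helper, not something Docusaurus ships):

```javascript
// Hypothetical migration helper: converts the older nested shape
// ({value, id, level, children: [...]}) into a flat list in document order.
function flattenToc(nestedToc) {
  return nestedToc.flatMap(({children = [], ...item}) => [
    item,
    ...flattenToc(children),
  ]);
}

const nested = [
  {value: 'Label1', id: 'anchor1', level: 2, children: []},
  {
    value: 'Label2',
    id: 'anchor2',
    level: 2,
    children: [{value: 'Label3', id: 'anchor3', level: 3, children: []}],
  },
];

console.log(flattenToc(nested));
```

Each child keeps its own `level`, so the flat list carries the same hierarchy information without the `children` arrays.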

ISSOtm commented 4 months ago

Thank you for the notice! I updated upstream, and edited the OP accordingly. This works much better :)

axmmisaka commented 4 months ago

I am wondering if docusaurus has supported customisation of ToC generation, as it appears on the docs website that it did not, but I have discovered a potential use case for us here: https://github.com/lf-lang/lf-lang.github.io/issues/238 We offer docs in 5 different target languages, each might have its unique ToC, and we achieve the language switching via docusaurus tabs. I am thinking of swizzling the tab switch and customising the ToC that is displayed, if possible.

slorber commented 4 months ago

@axmmisaka if you want to put headings inside tabs and expect the TOC to update according to the selected tab, there's another issue for that: https://github.com/facebook/docusaurus/issues/5343

axmmisaka commented 4 months ago

@axmmisaka if you want to put headings inside tabs and expect the TOC to update according to the selected tab, there's another issue for that: #5343

Thanks for the reply. I would assume that this issue has been designated as a wontfix and contributing to upstream would be the way to have it. My understanding is that part of the difficulty arises from the fact that we will need to tell remark to parse in a way which records whether or not the header is in a Tab. A workaround I could think of is to allow the author to supply a header and switch between them on-the-go. This would look ugly but would bypass the remark issue. I could contribute, for sure, once I am certain that I will not be pip-ed by my employer in the next cycle...... 🤣 but I am skeptical if such a change could be merged upstream as we have a very special use-case.