executablebooks / mystmd

Command line tools for working with MyST Markdown.
https://mystmd.org/guide
MIT License

Table of content, patterns and nested directories #1323

Open nthiery opened 2 weeks ago

nthiery commented 2 weeks ago

Provide a way to control the hierarchy in the table of contents when using patterns, in particular to enable compatibility with JupyterBook.

The problem

Consider a deep hierarchy of documents:

- A1/
    - A11/
        - A111.md
        - A112.md
- A2/
    - A21/
        - ...

Use case 1: you want to automatically build the whole table of contents, with the directory hierarchy reflected in the hierarchy of the toc. In that case, you expect:

toc:
   - pattern: "*/*/*.md"

to result in a nested table of contents such as:

> A1
    > A11
          A111.md
          A112.md
> A2
        ...

That's what MyST does.

Use case 2: you want to control the nesting of your table of contents, e.g. because you want to choose your own titles for the directories, or because you want a nesting that differs from the directory hierarchy. So you expect the following configuration:

toc:
  - title: The introduction
    children:
      - title: bar
        children:
          - pattern: A1/A11/*.md
  - title: User guide
    ...

to produce:

> The introduction
  > bar
        A111.md
        A112.md
> User guide
        ...

However, with the current MyST and unlike JupyterBook, the produced TOC is doubly nested:

> The introduction
  > bar
     > A1
        > A11
              A111.md
              A112.md
> User guide
        ...

Proposal

I am not sure whether this qualifies as a bug: automatically building the hierarchy in the toc can be useful, as in use case 1. However, we would need to support use case 2 as well, in particular for migrating from the current JupyterBook to MyST. Ideally, the hierarchy would be built relative to the current toc section, but I feel that matching a subdirectory to a toc section is not necessarily one-to-one, especially when sections are defined by just a title rather than a file. Maybe we could have something like:

...
   - pattern: "*/*/*.md"

would produce a flat table of contents (as in JB), while:

...
   - pattern:
        - glob: foo/*/*.md
        - root: foo/ 

would produce a table of contents nested relative to the given root?
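
To make the idea of nesting relative to a root concrete, here is a rough Python sketch. It is not mystmd code, and the helper name `nest_relative_to_root` and the file names are made up for illustration. It only shows that stripping the root prefix before grouping by folder avoids the extra levels of nesting:

```python
from pathlib import PurePosixPath


def nest_relative_to_root(paths, root):
    """Group matched paths into nested dicts, ignoring the `root` prefix.

    Hypothetical helper: folders map to dicts, files map to None.
    """
    tree = {}
    for path in sorted(paths):
        # Drop the root so only the folders below it create nesting.
        parts = PurePosixPath(path).relative_to(root).parts
        node = tree
        for folder in parts[:-1]:
            node = node.setdefault(folder, {})
        node[parts[-1]] = None
    return tree


# Made-up matches for a glob such as foo/*/*.md
matched = ["foo/bar/a.md", "foo/bar/b.md", "foo/baz/c.md"]
print(nest_relative_to_root(matched, "foo/"))
```

With the root set to the directory that directly contains the files, the result has no folder level at all, which corresponds to the flat behaviour.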

Additional notes

Thank you so much for the new toc syntax in MyST. It's one more step in the migration, and it is so much more systematic than the toc syntax in the current JB.

@choldgraf: you may want to mention this issue in the tracking issue for the JB->MyST migration.

agoose77 commented 2 weeks ago

@rowanc1 @fwkoch my feeling here is that patterns should always be expanded flat, i.e. this is a bug report. I think there's value in having the TOC be an explicit source of truth for structure, i.e.

TOC:

```yaml
toc:
  - file: foo.md
  - title: Children
    pattern: bar/baz/*.md
```

Expanded TOC:

```yaml
toc:
  - file: foo.md
  - title: Children
    children:
      - file: bar/baz/a.md
      - file: bar/baz/b.md
```

Any thoughts?

fwkoch commented 1 week ago

Hmm, yeah - I did change the implementation from "always flat" to "follow MyST's implicit toc structure" for patterns.

My primary motivation was around the non-deterministic ordering of the flat files. They were just using the underlying system ordering (e.g. ordered by date modified or something). This felt a little bad.

My secondary motivation was, well... lack of familiarity with JB and intimate familiarity with MyST. So, to me, following the other implicit toc resolution made sense.

I totally understand the reasoning around flat pattern expansion, especially after reading through @nthiery's examples - though I do agree, there is maybe still a use case for the implicit structure I implemented.


Maybe the next steps here are:

  1. Re-implement flat expansion, but add deterministic ordering, e.g. index first, then alphabetic, etc. Maybe this even uses the existing implicit toc structure without the folder nesting?
  2. Possibly add an option to keep the folder structure, something like this:
    ...
    - pattern: "*/*/*.md"
      folders: true

    and maybe even have the root option, like @nthiery mentioned...?
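
As a rough illustration of step 1, flat expansion with a deterministic order could look something like the Python sketch below. This is not the mystmd implementation. The names `order_key` and `expand_flat` are hypothetical, and treating only files named `index` as "index" files is an assumption made for the example:

```python
from pathlib import PurePosixPath


def order_key(path):
    """Sort key: folder first, then index files, then case-insensitive name."""
    p = PurePosixPath(path)
    # Assumption: only a file whose stem is "index" counts as an index file.
    is_index = p.stem.lower() == "index"
    # False sorts before True, so index files lead within their folder.
    return (p.parent.parts, not is_index, p.name.lower(), p.name)


def expand_flat(paths):
    """Return matched paths as a flat list, independent of input order."""
    return sorted(set(paths), key=order_key)


# Same files in whatever order the file system happened to return them.
print(expand_flat(["A1/A11/b.md", "A1/A11/index.md", "A1/A11/a.md"]))
```

Because the key depends only on the path itself, the order no longer depends on how the underlying system lists the files: here `index.md` comes first, followed by `a.md` and `b.md`.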

agoose77 commented 1 week ago

@fwkoch I like your suggestions. It's my feeling that there's value in the TOC being as deterministic as possible, i.e. to capture as much of the detail about the project as possible. Moreover, declaring the full structure gives us a lot more control over things like folder titles, etc.

So, my suggestion is that we use the implicit TOC resolution logic, but flatten it to a list of leaf nodes.
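
A minimal Python sketch of that flattening step, assuming the implicit resolution yields a nested list of entries with `file` and `children` keys like the TOC examples above. The function name `flatten_leaves` is hypothetical and this is not mystmd code:

```python
def flatten_leaves(entries):
    """Depth-first list of the `file` entries in a nested TOC-like structure."""
    leaves = []
    for entry in entries:
        # Keep the page itself; title-only folder entries are dropped.
        if "file" in entry:
            leaves.append({"file": entry["file"]})
        # Then descend, so the implicit ordering of the tree is preserved.
        leaves.extend(flatten_leaves(entry.get("children", [])))
    return leaves


# Implicit structure for the A1/A11 example from the issue description.
implicit = [
    {"title": "A1", "children": [
        {"title": "A11", "children": [
            {"file": "A1/A11/A111.md"},
            {"file": "A1/A11/A112.md"},
        ]},
    ]},
]
print(flatten_leaves(implicit))
```

The folder levels `A1` and `A11` disappear, while the files keep the order the implicit resolution gave them.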