executablebooks / sphinx-external-toc

A sphinx extension that allows the site-map to be defined in a single YAML file
https://sphinx-external-toc.readthedocs.io
MIT License
33 stars 18 forks source link

Initial Feedback #1

Closed chrisjsewell closed 3 years ago

chrisjsewell commented 3 years ago

This implements https://github.com/executablebooks/meta/issues/292

Everything should be documented in the README (and what is left TODO)

The _toc.yml differs slightly from the one in https://jupyterbook.org/customize/toc.html, to streamline it a bit (a conversion function can be written for migration):

cc @choldgraf, @AakashGfude, @mmcky, @ericholscher, @pradyunsg, @hukkinj1 for comment

chrisjsewell commented 3 years ago

@ericholscher you might be able to help me with this: the RTD docs are building (https://readthedocs.org/projects/sphinx-external-toc/builds/) but the site says they do not exist: https://sphinx-external-toc.readthedocs.io 🤔

chrisjsewell commented 3 years ago

also cc @mgeier, since its an alternative to https://nbsphinx.readthedocs.io/en/0.8.2/subdir/toctree.html

mgeier commented 3 years ago

Thanks for cc-ing!

I like having the toctree links in the notebooks themselves, because then they can also be clicked in JupyterLab.

But I guess if somebody wants to use it, the external TOC extension would work just as well with Jupyter notebooks and nbsphinx as it would work with "normal" Sphinx source files, right?

chrisjsewell commented 3 years ago

But I guess if somebody wants to use it, the external TOC extension would work just as well with Jupyter notebooks and nbsphinx as it would work with "normal" Sphinx source files, right?

Yes indeed, the core extension is input format agnostic (modifying the parsed AST) 👍

AakashGfude commented 3 years ago

Thank you @chrisjsewell

  root:
    file: intro
    sections:
    - file: doc1
    - file: doc2

Is there a case when main won't be used in a toc? If it is used every time along with the master doc, then might be redundant?

chrisjsewell commented 3 years ago

yeh you could potentially have:

root: intro
sections:
- file: doc1
- file: doc2

which is also equivalent to:

root: intro
parts:
- sections:
  - file: doc1
  - file: doc2

but would just want to think if this "clutters" the top-level namespace or restricts addition of later keys

chrisjsewell commented 3 years ago

yeh you could potentially have ...

now changed and the initial comment edited

choldgraf commented 3 years ago

A couple quick thoughts:

chrisjsewell commented 3 years ago

Thanks for the comments @choldgraf

Is there a way that we can specify global defaults for the toc without it being in _toc.yml? E.g., if we were to use this in jupyter book, we'd probably want titlesonly / hidden to be on by default

Note hidden is already True by default, then for titlesonly; firstly can you clarify what the rationale is to have this True? The defaults key is already a global default, so it would be a bit odd to have multiple levels of global defaults. If we do decide for jupyter-book to keep this, then I would likely look to have a "jupyter-book special" code that changed this default to True (e.g. by checking an environmental variable)

On removing the chapter key - I don't have strong opinions about this one. Agreed that it's redundant w/ sections, though I think we originally kept it in order to give authors a mental model of their book's structure (even though chapters is basically just an alias for sections as you point out).

Exactly we should choose one, otherwise its misleading in that it implies a difference between the two, and is more for users to understand/remember.

I see the word main: used here, but the word root: used in the README, which is up-to-date? (I think that root is more technically correct, while main is more human-readable)

No its root, IMO main is too generic whereas root perfectly describes that it is the document at the root of the sitemap

chrisjsewell commented 3 years ago

the RTD docs are building (readthedocs.org/projects/sphinx-external-toc/builds) but the site says they do not exist: sphinx-external-toc.readthedocs.io 🤔

Ah I figured out this issue; it is because the root/master_doc is set as intro and so there is no index.html that RTD can find.

Should/has this been opened as a bug on sphinx, that if master_doc is not set to index then it should probably handle it?

Anyhow, I guess this relates to my comment here: https://github.com/executablebooks/jupyter-book/pull/1293/files#r610150939, such that this logic should be moved here, and if root is not index then an index.html should be created (given a sphinx html build) if it does not already exist (i.e. due to a single file build) and also a warning should be generated if a non-root document is called index (and also search and genindex from #16). The index page could either be a redirect to the real root, or simply a copy of the actual root file I guess.

chrisjsewell commented 3 years ago

Should/has this been opened as a bug on sphinx

Found an RTD issue for it: https://github.com/readthedocs/readthedocs.org/issues/1800

mmcky commented 3 years ago

thanks @chrisjsewell on this work. This is a great idea. I just had a few comments

removing the chapter key - I don't have strong opinions about this one. Agreed that it's redundant w/ sections, though I think we originally kept it in order to give authors a mental model of their book's structure (even though chapters is basically just an alias for sections as you point out). That said, I also see a benefit in keeping the syntax people need to remember relatively small.

Are chapter and section absolute equivalents here? If they are I see no downside to allowing either key as a lot of authors would prefer to write the toc in terms of chapters. A section is should be a sub component of a chapter.

Alternatively @chrisjsewell would this extension allow:

  1. a general syntax for setting up the toc, and
  2. a book style syntax that uses chapters, parts and sections.

We could then develop a mapping from the general to the book style of toc.

root

Just one comment re root. I completely agree it technically describes the purpose perfectly. However a lot of non-technical users won't know what root mean. If we want to be author friendly perhaps we allow synonyms for this key to include main?

hukkin commented 3 years ago

Regarding chapters/sections/main/root, I personally think there are downsides to aliases

chrisjsewell commented 3 years ago

Yes I'm completely against aliases as well, it just makes things more confusing. When I tried to read the documentation on the toc on jupyter-book, it was honestly super-confusing and not clear at all when to use sections or chapter/chapters.

A section is a sub component of a chapter.

See @mmcky even you don't understand the current structure lol, because this is definitely incorrect/outdated: there is no section or chapter keys; chapter was already deprecated to part: https://github.com/executablebooks/jupyter-book/blob/8be06d8b8920d4967737d7ffec91af691d7100b1/jupyter_book/toc.py#L67-L68, then sections or chapters (which are currently just aliases) are sub-components of part

However a lot of non-technical users won't know what root means.

If I am over-ruled by consensus fair enough, but I think it is very easy/clear to explain that root is the "root of a tree". In fact, I would put something like this in the documentation:

image

which I think would make it much clearer, and actually give users this mental model that you guys keep alluding to but never showing 😜

chrisjsewell commented 3 years ago

In fact, I would put something like this in the documentation

I could even add this toc.yml -> tree generation to the CLI (using https://graphviz.readthedocs.io as an optional dependency) if people think it would be helpful.

hukkin commented 3 years ago

Looking at @chrisjsewell excellent visualization, what comes to my mind is that if there's a naming change that would help me get the right mental model, it would be renaming parts as tocs by the way. Isn't that what parts means to Sphinx? A list of one or more table of contents.

EDIT: On second thought, I don't like this idea. Don't listen to me :smile:

chrisjsewell commented 3 years ago

On second thought, I don't like this idea

haha yeh I was just typing:

it would be renaming parts as tocs

I would be hesitant, because (a) it takes it further away from the current jupyter-book toc.yml and (b) maybe it fells a bit weird to have a _toc.yml that then has separate toc keys in it

mmcky commented 3 years ago

hey @chrisjsewell this _toc.yml uses the chapters key under part and is a good way to record a collection of chapters that form a part. It maps to how you think of a book.

I am all for a general structure for organising the toc, and I think this syntax is great for me to use. But I do think we need to spend a bit of time looking at it from the non technical end user perspective. I suspect your response will be if it is in the documentation then they should just follow up -- that is fine (and in part that is valid) -- but the general syntax takes a cohort of users out of their writing model which makes it harder to write a book.

FWIW - I am also against aliases unless there is a super clear 1:1 mapping and that isn't the case in these toc schema's.

I think the way forward would be to enable a domain based approach. We have the base schema (as you're proposing) and then allow different structures (that are probably more restrictive subsets -- such as for writing books, articles). That way we can take the book schema and ensure support in LaTeX for example. It is harder to support general structures that are more suitable for html then a more restrictive model that is more tightly tied to writing books. For example, on the LaTeX side it is harder to figure out if a nested section belongs to a chapter etc. (whereas it is a page in the html version). A domain based approach would also then enable preface:, chapter:, appendix:. etc. and this would then (under the hood) map to the more general structure using root etc.

chrisjsewell commented 3 years ago

one quick note

uses the chapter key under part

It doesn't use the chapter key it uses the chapters key, this is an important distinction, since there has previously been a chapter key which was different

chrisjsewell commented 3 years ago

Allow different structures (that are probably more restrictive subsets -- such as for writing books, articles).

Then you/we need to come up with some code/logic to deal with this.

Thinking about it now, as you allude to, the current toc.yml structure has essentially been written with the assumption that it will map to a LaTeX theme that converts:

index.rst

Index Title
===========

.. toctree::
   :caption: Caption

   doc1

+ doc1.rst

Doc 1 Title
===========

.. toctree::

   doc2

+ doc2.rst

Doc 2 Title
===========

to

\part{Caption}
\chapter{Doc 1 Title}
\section{Doc 2 Title}

(see https://github.com/executablebooks/jupyterbook-latex/blob/master/tests/roots/test-partsToc -> https://github.com/executablebooks/jupyterbook-latex/blob/master/tests/test_toc/test_toc.tex)

This is most definitely not always the case:

  1. Different LaTeX themes can map this differently, and in fact the user can change it with: https://www.sphinx-doc.org/en/master/usage/configuration.html#confval-latex_toplevel_sectioning

If I use the default sphinx LaTeX theme, then it creates:

\addto\captionsenglish{\renewcommand{\contentsname}{Caption}}
\chapter{Doc 1 Title}
\section{Doc 2 Title}

or if I set latex_toplevel_sectioning = 'part' you get:

\addto\captionsenglish{\renewcommand{\contentsname}{Caption}}
\part{Doc 1 Title}
\chapter{Doc 2 Title}

(I assume @mmcky/@AakashGfude have added something in jupyterbook-latex to turn the toctree captions into parts?)

  1. It can change if you have multiple header levels e.g. in the index/root file, i.e. if index.rst was
Index Title
===========

Index Sub-Header
----------------

.. toctree::
   :caption: Caption

   doc1

and latex_toplevel_sectioning = 'part' you end up with:

\addto\captionsenglish{\renewcommand{\contentsname}{Caption}}
\part{Index Sub-Header}
\chapter{Doc 1 Title}
\section{Doc 2 Title}

So actually I would be in favour of moving away from using these key names for the general structure, which can be misleading, to something like:

Then OK you can think of having some kind of "theme specific mapping" (maybe specified under a theme/flavor key), e.g. for "jupyterbook"

Ideally here, you should also warn of things that would make this mapping incorrect e.g. if the root file had more than one heading level

A domain based approach would also then enable preface:, appendix:

appendix (and prefix) are already planned (see https://github.com/executablebooks/sphinx-external-toc#development-notes) these would be separate top-level keys in the toc.yml and nothing in the current structure precludes them

mmcky commented 3 years ago

I assume @mmcky/@AakashGfude have added something in jupyterbook-latex to turn the toctree captions into parts?

jupyterbook-latex uses transforms to modify the toc to shape it into a suitable format to get an output we would like from the the generic sphinx latex writer. The part option is turned on if part is detected in the toc and is used to generate the latex output. If a part is in the _toc.yml then the text associated with it will be used in the latex to name the part.

Also the part option through the latex config is documented in the jupyterbook docs but it didn't provide a completely correct mapping for the two structures offered by jupyterbook so we needed to alter the sphinx.ast through a transform:

  1. Files
  2. Parts/Chapters

Effectively any nested documents is a subtree to sphinx right?

So

  1. part -> chapters is a tree relationship (as you have mentioned), but also
  2. file -> sections is also a subtree relationship.

In jupyterbook-latex any children added as section are folded into the parent file for the pdf.


Then OK you can think of having some kind of "theme specific mapping"

@chrisjsewell I like the idea of having a core schema for the toc that directly relates to how sphinx works (in particular fordevelopers). I think that will really help with the cognitive issues when translating between how sphinx works (i.e. tree structures and flexible) and jupyter book table of contents layouts (easy to use).

But I think the user friendly layer that is more restrictive (and sits on top of the core schema) is essential to put some structure around the assumptions that are needed to be made to map to books etc. in the translation process.


When working on LaTeX we have been using a mixture of docutils / sphinx docs and test configurations such as this ebp-test-projectstructure repo to look at various configurations and how they related to the current structures:

  1. Files
  2. Parts/Chapters

Thanks @chrisjsewell -- I see a spec starting to form around this work.

chrisjsewell commented 3 years ago

thanks @mmcky

Effectively any nested documents is a subtree to sphinx right?

Well no, not if it does not contain a toctree. @mmcky, @AakashGfude note you should no longer be loading _toc.yml anywhere in your code (or even assuming it exists). You should always be using app.env.external_site_map, which will be an instance of: https://github.com/executablebooks/sphinx-external-toc/blob/07493ae516ec1e0962ad7b4c456aadcfa5392907/sphinx_external_toc/api.py#L78

This construct is also completely independent of the key names in the actual YAML, e.g. to check for "parts" (i.e. multiple toctrees in the root document) you could just do something like:

if len(env.external_site_map.root.subtrees) > 1:

and I would emphasize again that your interpretation of a part and a chapter in jupyterbook-latex are predicated on the assumption that there are no sub-headings in the root/master document (otherwise everything gets "pushed down"), so I would certainly check for and warn about additional section nodes in that document.

mmcky commented 3 years ago

hey @chrisjsewell I have come to realise this is primarily a sphinx extension so your aim here is to make whatever sphinx can represent as an external toc. I have been commenting primarily from the jupyterbook viewpoint which is a subset of what sphinx toctree objects can represent.

So we will need to either:

  1. Add an ability for domain support here that jupyter book can plug into, OR
  2. Add the ability for jupyterbook to convert to general yaml configurations (that are constructed here) from current _toc.yml files.

Is that right?

chrisjsewell commented 3 years ago

Is that right?

Kind of, there's really no difference to what jupyter-book already does, it's just the decision about key-names, which is essentially entirely superficial as it does not affect the underlying API/AST

Add the ability for jupyterbook to convert to general yaml configurations (that are constructed here) from current _toc.yml files.

again I would emphasise that jupyter-book, or its components (like jupyterbook-latex) should not even assume the existence of a _toc.yml any more. The parsing is all dealt with here and stored as env.external_site_map: SiteMap

chrisjsewell commented 3 years ago

Ok this is good to go in my eyes:

See the README (and commit history) for all details. For the "default" file format: parts -> subtrees and sections -> items, so that these keys do not have misleading connotation for LaTeX builds.

Then for @mmcky's request for "domains" there is the format key: https://github.com/executablebooks/sphinx-external-toc#using-different-key-mappings

You can see the jupyter-book migrations in the test files: from https://github.com/executablebooks/sphinx-external-toc/tree/main/tests/_jb_migrate_toc_files to https://github.com/executablebooks/sphinx-external-toc/tree/main/tests/test_tools

and also you can see the diffs in https://github.com/executablebooks/jupyter-book/pull/1293/files

and so basically for jupyter-book _toc.yml, they will look like either:

format: jb-article
root: index
sections:
- file: subfolder/index
  sections:
  - file: subfolder/asubpage1
- file: content1

or

format: jb-book
root: index
chapters:
- file: subfolder/index
  sections:
  - file: subfolder/asubpage1
- file: content1

or

format: jb-book
root: index
parts:
- caption: A part
  chapters:
  - file: subfolder/index
    sections:
    - file: subfolder/asubpage1
- caption: Another part
  chapters:
  - file: content1
mmcky commented 3 years ago

This is looking really neat @chrisjsewell - thanks for your work on support for domains.

I just had a few follow up comments.

  1. I have noticed you have used root: index but isn't the index (from sphinx perspective) defined by the _toc.yml for jupyter-book? The root file in the jupyterbook context has the same role as the current first file right? It is essentially preamble / frontmatter (kind off in the PDF context) or the front page in the HTML context.

  2. In the future do you think we could specify other domains such as such as a much simpler article representation that map into the more general syntax you have put together (with stricter assumptions), something like:

format: article
title: <text>
introduction: <file>
section: <file>
section: <file>
section: <file>
conclusion: <file>

and for books

format: book
title: <text>
frontmatter: <file>
toc: true
chapter: <file>
chapter: <file>
chapter: <file>
appendix: <file>

I think these domains will be super helpful in controlling what is (and isn't supported) across the various document output types. (i.e. pdf support is provided for book, article)

chrisjsewell commented 3 years ago

have noticed you have used root: index but isn't the index (from sphinx perspective) defined by the _toc.yml for jupyter-book?

I'm not quite sure I quite understand this sentence lol. Put simply root == root_doc (master_doc is actually deprecated); in sphinx you cannot build any documentation without one, so it has to be present in some form. This "entry point document" maps to HTML, where it is generally expected for an index.html to be present, and LaTeX, where you can specify% !TeX root = main.tex.

In the future do you think we could specify other domains such as ...

Well first you would need to define how such formats would be implemented by sphinx? Presumably you would have to auto-generate the root document in some way

choldgraf commented 3 years ago

Hey all - I gave the docs another read-through, and had a few ideas / comments. Will append to this list as I think of more. In general this is super close though, it looks really nice:

chrisjsewell commented 3 years ago

we should note that master_doc is becoming root_doc in a future Sphinx

yeh it was introduced as an alias in v4: https://github.com/sphinx-doc/sphinx/issues/9116

choldgraf commented 3 years ago

ok those issues are the main ones that I can think of - I opened them up as separate issues because this mega-thread was becoming difficult to follow. I welcome thoughts and feedback from folks 👍

mmcky commented 3 years ago

I'm not quite sure I quite understand this sentence lol. Put simply root == root_doc (master_doc is actually deprecated); in sphinx you cannot build any documentation without one, so it has to be present in some form.

Right. I just mean in the sphinx context this document typically contains the main toctree but that work is now taken care of by the _toc.yml or yml representation of the toc. So in effect the root_doc is now really the _toc.yml (in a way) as it defines the relationships between files. So the main function of the root_doc (for jupyter-book) is now to host the title and the frontmatter (in the pdf context) and the landing page content (in the html context)?

Well first you would need to define how such formats would be implemented by sphinx? Presumably you would have to auto-generate the root document in some way

Right -- that's what I am thinking (or replacing the root_doc requirement with the presence of the _toc.yml and generating a synthetic root_doc for sphinx internals)

chrisjsewell commented 3 years ago

So the main function of the root_doc (for jupyter-book) is now to host the title and the frontmatter (in the pdf context) and the landing page content (in the html context)? replacing the root_doc requirement with the presence of the _toc.yml and generating a synthetic root_doc for sphinx internals

exactly, so this is obviously functionality that does not currently exist in sphinx or jupyter-book, but could certainly be investigated and likely the code would reside in this package and be accomodated by specifying a new format that may not require the root key.

But yeh definitely a "version 2" consideration 😉

chrisjsewell commented 3 years ago

closing this, since the package is now out of development