Open chrisjsewell opened 3 years ago
Note, this is not to say we need to immediately implement the full functionality of Sphinx, which would be no mean feat. But, where possible, we should put thought into the initial steps, so that we do not have to completely redesign everything once it (I feel inevitably) gets more complex.
Note 2, we should also be cognisant of the use cases:
Here we can "get away" with not having to fully render every role/directive etc., or deal with any multi-page issues (e.g. cross-page referencing). An important thing, though, is that the parse is sufficiently fast for real-time re-rendering.
Another use case I would like to work towards is a language server (LSP). Here we might want to parse all documents in the background, and maintain a "database" of references/targets and their positions in the documents (e.g. for "jump to definition" and reference auto-complete features).
Actually rendering a full book
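To make the LSP use case a little more concrete, here is a minimal sketch of the kind of reference/target "database" it would need. All names here (`Position`, `ReferenceIndex`, and their methods) are hypothetical and only illustrate the idea; they are not part of any existing MyST or LSP library.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Position:
    """Hypothetical location of a target or reference in a source document."""

    doc: str     # document path
    line: int    # zero-based line number
    column: int  # zero-based column number


@dataclass
class ReferenceIndex:
    """In-memory "database" of targets and the references pointing at them."""

    targets: Dict[str, Position] = field(default_factory=dict)
    references: Dict[str, List[Position]] = field(default_factory=dict)

    def add_target(self, name: str, pos: Position) -> None:
        self.targets[name] = pos

    def add_reference(self, name: str, pos: Position) -> None:
        self.references.setdefault(name, []).append(pos)

    def definition(self, name: str) -> Optional[Position]:
        """Support "jump to definition": where is this target declared?"""
        return self.targets.get(name)

    def complete(self, prefix: str) -> List[str]:
        """Support auto-complete: all known target names with this prefix."""
        return sorted(n for n in self.targets if n.startswith(prefix))

    def remove_document(self, doc: str) -> None:
        """Drop all entries from one document, e.g. before re-parsing it."""
        self.targets = {n: p for n, p in self.targets.items() if p.doc != doc}
        self.references = {
            n: kept
            for n, ps in self.references.items()
            if (kept := [p for p in ps if p.doc != doc])
        }
```

Keeping entries keyed by document is what lets a background parser re-index a single edited file (`remove_document`, then re-add) without re-parsing the whole project.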
The text below is copied from https://github.com/executablebooks/markdown-it-myst/pull/31
TL;DR: docutils/Sphinx, I feel, can be a little overly complex and has some shortcomings, BUT many aspects are there for a reason, and we should learn from them and ensure the design can accommodate (and be extensible to) the necessary complexity from the outset.
I would like to eventually create some UML/SysML diagrams of the design, for ourselves and others to understand.
In this document we outline the general design decisions for a generic MyST parser, and then how these apply to the JavaScript parser we have built here. Note, this may eventually be moved to a "top-level" documentation of MyST.
Currently, the primary implementation of a MyST parser is written as a Sphinx extension (in Python). It uses markdown-it to initially parse the source text to a "token stream" (a list of syntax tokens, encapsulating the whole document and its content), then converts this token stream to a docutils AST (in the `myst-parser` extension), which Sphinx then uses to produce the desired output format (e.g. HTML or LaTeX). Naturally this design is tightly coupled to Sphinx, but (a) in JavaScript we do not have an implementation of Sphinx, and (b) we would like to move away from being reliant on any one "technology" for parsing, and instead outline a more generic "standard" for MyST parsing, which anyone could in principle implement. What we don't want to do, though, is end up unknowingly reimplementing a worse version of Sphinx. In the next section, then, we discuss the Sphinx design, the reasons behind it, and some of its technical limitations.

Analysis of the Sphinx design
The Sphinx design is outlined in more detail at https://www.sphinx-doc.org/en/master/extdev/appapi.html#sphinx-core-events, but the basic stages can be described as:
BuildEnvironment
), for later fast lookup.
Builder
Another core concept is that of the logger, which logs specific information/warnings to the console, but can also be configured to fail the build (i.e. produce a non-zero exit code) if any warnings are encountered. In this way the build is robust to errors (we don't want the whole build failing because of one syntax error), but still allows us to programmatically tell if there are any issues with our documentation (e.g. when we run CI tests).
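As a rough illustration of this logging behaviour, the sketch below uses Python's standard `logging` module. The `WarningCounter` class, the `build` function, and the "unbalanced backticks" check are hypothetical stand-ins, not Sphinx's actual implementation: the build always runs to completion, and warnings only affect the exit code when asked to.

```python
import logging


class WarningCounter(logging.Handler):
    """Counts records at WARNING level or above without interrupting the build."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.count = 0

    def emit(self, record):
        self.count += 1


def build(sources, fail_on_warning=False):
    """Process every source, log any problems, and return a process exit code."""
    logger = logging.getLogger("build_sketch")
    logger.setLevel(logging.INFO)
    console = logging.StreamHandler()  # writes to stderr by default
    counter = WarningCounter()
    logger.addHandler(console)
    logger.addHandler(counter)
    try:
        for name, text in sources.items():
            # Stand-in for real parsing: an odd number of backticks is
            # treated as a syntax problem, but the loop keeps going.
            if text.count("`") % 2:
                logger.warning("%s: unbalanced backticks", name)
    finally:
        logger.removeHandler(console)
        logger.removeHandler(counter)
    # Non-zero exit code only if warnings occurred AND the caller opted in.
    return 1 if (fail_on_warning and counter.count) else 0
```

A CI job would then call something like `sys.exit(build(sources, fail_on_warning=True))`, while a local preview would leave `fail_on_warning` off and still see the warnings on the console.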
As an addendum to the above design, we can also consider the steps to re-build the outputs, given an initial build has already been performed.
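Any rebuild first has to decide which documents are outdated (Sphinx has an `env-get-outdated` event around this step). The sketch below is only an assumption about how such a check could look, comparing content hashes stored from the previous build; it is not Sphinx's own logic, and the function names are hypothetical.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's source text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_outdated(sources, cached_hashes):
    """Compare current sources against hashes saved by the previous build.

    sources:       mapping of document name -> current source text
    cached_hashes: mapping of document name -> hash from the previous build
    Returns three sorted lists: (added, changed, removed) document names.
    """
    added = sorted(n for n in sources if n not in cached_hashes)
    changed = sorted(
        n
        for n, text in sources.items()
        if n in cached_hashes and content_hash(text) != cached_hashes[n]
    )
    removed = sorted(n for n in cached_hashes if n not in sources)
    return added, changed, removed
```

Only the `added` and `changed` documents would need re-parsing; `removed` documents would need their cached data (e.g. reference targets) purged. A real implementation would also have to track dependencies between documents, which this sketch ignores.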
Lastly we should consider Sphinx's plugin system, in the form of extensions which can:
config-inited
events) e.g. to apply additional validation
env-get-outdated
events)

Although a lot of this system is well designed, and we will certainly need to include most if not all of these steps, there are a number of design issues that could be improved:
`pformat` method which converts it into a "pseudo-XML" string, although this does not actually contain all the information about the AST.