Capturing plugin metadata for reproducibility

bollwyvl commented 4 years ago

While I love the idea of extensible markup in the Jupyter ecosystem, it's gonna get gross, and non-reproducible, real fast if content just silently looks bad when plugins are missing.

As this doesn't seem forthcoming in the CommonMark spec, this repo should probably demonstrate an approach to:

- instrumenting and capturing which plugins were actually used per render
- storing this metadata
  - in the notebook, this seems like a top-level metadata field

metadata:
  jupyter-markup:
    plugins:
      - footnote
      - deflist
cells: []

  - in plaintext, presumably a comment syntax (gah!) could be added

<!-- jupyter-markup: footnote deflist -->

- providing some feedback (status bar?) when authoring/reading if plugins are missing
  - and how to get the missing plugins... this seems really hard to manage, especially considering...
- demonstrating/testing headless rendering in nbconvert/jupyter-book vs in-browser content
  - this would probably be a dependency of #11, rather than bundled
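
To make the "capture, store, give feedback" idea concrete, here is a minimal sketch of the reading side. It assumes the `jupyter-markup` metadata key and the HTML comment format shown in the examples above; neither is an agreed format, and the function names are hypothetical.

```python
import re

# Assumption: the key and the comment syntax follow the examples in this
# comment; neither is an established convention.
METADATA_KEY = "jupyter-markup"
COMMENT_RE = re.compile(r"<!--\s*jupyter-markup:\s*(.*?)\s*-->")


def declared_plugins_from_notebook(notebook: dict) -> list[str]:
    """Return the plugin names declared in top-level notebook metadata."""
    section = notebook.get("metadata", {}).get(METADATA_KEY, {})
    return list(section.get("plugins", []))


def declared_plugins_from_text(text: str) -> list[str]:
    """Return the plugin names declared in the first matching HTML comment."""
    match = COMMENT_RE.search(text)
    return match.group(1).split() if match else []


def missing_plugins(declared: list[str], available: set[str]) -> list[str]:
    """Return the declared plugins that the current renderer does not provide."""
    return [name for name in declared if name not in available]


notebook = {
    "metadata": {"jupyter-markup": {"plugins": ["footnote", "deflist"]}},
    "cells": [],
}
text = "<!-- jupyter-markup: footnote deflist -->\n\n# Title\n"

# A renderer that only provides "footnote" would report "deflist" as missing.
print(missing_plugins(declared_plugins_from_notebook(notebook), {"footnote"}))
print(missing_plugins(declared_plugins_from_text(text), {"footnote", "deflist"}))
```

The result of `missing_plugins` is what a status bar item (or a headless renderer's warning) could surface, instead of the content silently looking bad.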

agoose77 commented 4 years ago

I agree that it would be very hard to error when plugins are missing, as incorrect vs. new syntax are the same problem from alternative perspectives. I think it might be something to embed in the notebook, because it would become very tedious to have to declare it for every Markdown cell. The tricky thing is that the same logic could apply to having notebooks declare which lab extensions they expect. I'm going to give this more thought myself, because there will be some reproducibility from the fact that we can define the Python packages that add the plugins as dependencies. Still, silent errors...

agoose77 commented 4 years ago

EDIT: moved from the parent PR

I've been having some more thoughts about this @bollwyvl

Are you thinking that we store the IDs of the MarkdownIt plugins themselves, or of the JupyterLab Markdown Plugin Extensions? I was initially thinking of some kind of system where the MarkdownIt extension checks whether it has the providers for requested MarkdownIt plugins, but this would tie the metadata source (e.g. notebook JSON or markdown header) to the implementation details (that we use MarkdownIt).

I was thinking that it would be better for the metadata source to request JLab extensions. If this were the case, at what point is it not better to generalise this approach and have notebooks be able to suggest which frontend extensions they expect? This wouldn't be a hard requirement, because people use all kinds of notebook frontends, and there are often different extensions that implement the same functionality. But, at least in JupyterLab, this might be quite useful, e.g. a notebook could state:

I need the following extensions:

jupyterlab-diagrams
ipympl

But, then we arrive at the point for JLab 3 where these extensions are already managed by the Python dependency management, and the whole thing becomes a lot simpler. I know that you can still load extensions with npm etc., but from the "good notebook workflow" perspective, the standard approach is to use a requirements.txt or conda environment.yml to capture Python dependencies; there is a precedent for reproducibility.

TL;DR, is it sufficient to implicitly capture md-it plugin requirements using the Python dependency manager?
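
For illustration, a minimal sketch of what "capturing requirements via the Python dependency manager" could look like from the checking side. It uses the standard library's `importlib.metadata`; the list of required distributions is hypothetical (taken from the example above), and where that list would come from is left open.

```python
from importlib import metadata
from typing import Callable


def missing_distributions(
    required: list[str],
    get_version: Callable[[str], str] = metadata.version,
) -> list[str]:
    """Return the required distributions that are not installed."""
    missing = []
    for name in required:
        try:
            get_version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Hypothetical: distributions that a notebook's environment is expected to provide.
required = ["jupyterlab-diagrams", "ipympl"]
print(missing_distributions(required))
```

Note that a check like this only says a distribution is installed in the current environment. It says nothing about whether the corresponding extension is enabled, or which syntax plugins it actually contributes.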

agoose77 commented 3 years ago

I've thought more about this - with LSP integration, we can't rely on out-of-document information like pyproject.toml, and indeed, if those extensions are disabled, it would not be reflected in the LSP support. So, I think a per-doc metadata entry is needed.

See https://github.com/agoose77/jupyterlab-markup/issues/40#issuecomment-910795307

bollwyvl commented 3 years ago

Welp, I'm imagining we'll end up needing a dedicated, bespoke language server. It certainly could look at a pyproject, or jupyter_config. But that's semi-irrelevant, as a notebook or plain text document needs to stand alone.

agoose77 commented 3 years ago

Yes, I agree. The LS would need to understand the idea of configurable syntax options. The first step is to get this metadata into our documents, and from there we can look at the LSP side of things. I don't have time right now to work on this, but I'll pop back with more thoughts!

agoose77 commented 3 years ago

Alright, I opened a Discourse discussion on this topic.

The TL;DR is that I wonder whether the notebook should store the MIME-type of the cell for non-code cells, so that information like the plugins we're using (but also, the markdown renderer for "normal" notebooks) can be read by clients of the notebook.

Although this benefits us as extension authors, it would also fix a hole in the notebook spec that has been apparent for some time.
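
As a rough sketch of how a client might consume a per-cell MIME type, assuming a hypothetical `mimetype` key in cell metadata (this is not part of the notebook format today, and the non-default MIME type below is made up for illustration):

```python
from typing import Optional

# Assumption: "mimetype" is a hypothetical cell-metadata key used only for
# this sketch; the notebook format does not define it.
MIMETYPE_KEY = "mimetype"
DEFAULT_MARKDOWN_MIMETYPE = "text/markdown"


def cell_mimetype(cell: dict) -> Optional[str]:
    """Return the MIME type a client should use to render a markdown cell."""
    if cell.get("cell_type") != "markdown":
        return None  # this sketch only covers markdown cells
    return cell.get("metadata", {}).get(MIMETYPE_KEY, DEFAULT_MARKDOWN_MIMETYPE)


cells = [
    {"cell_type": "markdown", "metadata": {}, "source": "# Plain"},
    {
        "cell_type": "markdown",
        # Made-up MIME type standing in for "markdown plus extra plugins".
        "metadata": {"mimetype": "text/x-example-markdown"},
        "source": "Term\n: Definition",
    },
    {"cell_type": "code", "metadata": {}, "source": "1 + 1"},
]

print([cell_mimetype(cell) for cell in cells])
```

Falling back to `text/markdown` keeps existing notebooks readable by such a client, while a client that does not recognise a declared type at least knows that it is rendering something it does not fully support.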

agoose77 / jupyterlab-markup

Capturing plugin metadata for reproducibility #13