Should we parse directive options sooner?

fwkoch commented 2 years ago

Describe the problem/need and solution

Maybe this is too subtle to really matter, but it feels a little strange to me that directive options are not parsed upon initial creation of the directive token.

e.g. for this:

```{abc} foo bar
:a: one
:b: two

ABC directive

we get something like:

{ type: 'directive' info: 'abc' meta: { arg: 'foo bar' } content: ':a: one\n:b: two\n\nABC directive\n' ... }

And it's not until it is resolved to a known directive that the options are pulled out. Instead, I would expect the directive token to be something like:

{ type: 'directive' info: 'abc' meta: { arg: 'foo bar' options: { a: 'one' b: 'two' } } content: 'ABC directive\n' ... }


This separates the parsing (what exactly is the syntax of the directive) from the validation (are these options correct, etc?). This separation could _potentially_ make it easier to modify some of the existing behavior (e.g. if I want to raise a warning instead of an error when I encounter an unknown option - now I'm just filtering the `directive.meta.options` rather than rewriting the parsing code.).

### Guide for implementation

_No response_

### Tasks and updates

_No response_

welcome[bot] commented 2 years ago

Thanks for opening your first issue here! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out EBP's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
If your issue is a feature request, others may react to it, to raise its prominence (see Feature Voting).
Welcome to the EBP community! :tada:

chrisjsewell commented 2 years ago

Heya, This limitation comes from docutils: https://docutils.sourceforge.io/docs/ref/rst/directives.html, in that, it is impossible to resolve the directive "structure", until one knows what the directive is and hence its specification; number of required/optional arguments etc.

For example:

```{admonition} first line
:class: abc

body line


resolves to:

```json
{
  "type": "directive",
  "info": "admonition",
  "meta": {
    "arg": "first line",
    "options": {
      "class": "abc"
    }
  },
  "content": "body line\n"
}

whereas,

```{note} first line
:class: abc

body line


resolves (differently) to:

```json
{
  "type": "directive",
  "info": "note",
  "meta": {
    "arg": null,
    "options": {
      "class": "abc"
    }
  },
  "content": "first line\nbody line\n"
}

Since admontion has a required argument, but note does not.

The reason I believe they do this, is so that in RST one can write in a "terser" syntax, e.g.

.. note:: The note content
    :class: abc

rather than:

.. note::
    :class: abc

    The note content

But, for sure, it is certainly not ideal from a parsing perspective.

Then MyST "adopted" this, so that it was easier to get user to move from RST to MyST (https://myst-parser.readthedocs.io/en/latest/syntax/syntax.html#directives-a-block-level-extension-point)

chrisjsewell commented 2 years ago

This is not to say I am against changing this "non-declarative" syntax in the long run, but these are the historical reasons, and it will mean we would have to think of a migration strategy, for users/text that use the existing syntax

fwkoch commented 2 years ago

Hmm, yeah, I understand the historical context and I'm definitely not inclined to change that...

I think where I'm coming from is a separation between the "generic, parsed directive structure" and the final "specific, handled directive structure" (and this comes from thinking a bit about how we can define new directives independently, which would require a clean directive interface coming out of docutils).

So it would be something like:

.. note:: Some note content
    :class: abc

    More content!

-> Directive parsing pass

{
  "type": "directive",
  "info": "note",
  "meta": {
    "arg": "Some note content",
    "options": {
      "class": "abc"
    }
  },
  "content": "More content!\n"
}

-> Then that is passed to the "note" handler which says - "hey, arg should go into content" (and this is now specifically type note)

{
  "type": "note",
  "meta": {
    "arg": null,
    "options": {
      "class": "abc"
    }
  },
  "content": "Some note content\nMore content!\n"
}

Anyway, I'm not super fussed about this yet (and I'm also not too deep into the code, so I feel like I'm doing lots of hand waving...). I suspect it might come up again when we are starting to define directive plugins externally and we want to expose a nice interface so the plugin does not need to worry about parsing at all. Though maybe just defining the external directive correctly then calling directiveToData does all this in a perfectly clean way... 🤔

executablebooks / markdown-it-docutils

Should we parse directive options sooner? #30

Describe the problem/need and solution