executablebooks / MyST-Parser

An extended commonmark compliant parser, with bridges to docutils/sphinx
https://myst-parser.readthedocs.io
MIT License

Have a special syntax for page titles? #4

Open choldgraf opened 4 years ago

choldgraf commented 4 years ago

I would find it helpful to have a specific syntax for page titles so that you don't end up wasting your first markdown hash on it and have to start using ## from then on.

A few ways we could do it:

What do folks think about this idea?

chrisjsewell commented 4 years ago

Agree, currently the docutils transform docutils.transforms.frontmatter.DocTitle is applied, which converts an initial section title to a document title:

<document>
    <section names="top-level title">
        <title>
            Top-Level Title
        <paragraph>
            A paragraph.

To:

<document names="top-level title">
    <title>
        Top-Level Title
    <paragraph>
        A paragraph.
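
A minimal sketch of that transform in action, using plain reStructuredText and docutils' publish_doctree rather than MyST (an assumption made to keep the example self-contained). The doctitle_xform setting switches DocTitle's promotion on or off, so comparing the two runs isolates its effect:

```python
from docutils import nodes
from docutils.core import publish_doctree

RST_SOURCE = """\
Top-Level Title
===============

A paragraph.
"""

# Default settings: the standalone reader applies
# docutils.transforms.frontmatter.DocTitle, which promotes the lone
# top-level section title to the document title.
promoted = publish_doctree(RST_SOURCE)

# With doctitle_xform disabled, DocTitle leaves the section in place.
unpromoted = publish_doctree(
    RST_SOURCE, settings_overrides={"doctitle_xform": False}
)

print(type(promoted[0]).__name__)    # title
print(type(unpromoted[0]).__name__)  # section
```
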

choldgraf commented 4 years ago

Just a note: I just realized that setext = underlines map to H1 and setext - underlines map to H2. So this might be complicated if we want to both treat = underlines specially for titles and adhere to the CommonMark spec...

pauleveritt commented 4 years ago

(Sorry for the drive-by comment.) In a Sphinx project I did in 2018, it was a pain to get the title from the doctree. For one, the title becomes available later in the Sphinx build phases than I wanted: I had already parsed the YAML front matter I implemented, but couldn't use it until I had the title.

If I had the option to get it from myst frontmatter and it winds up in the doctree, I'd be pleased.

FWIW, as someone who has done a lot of low-level Sphinx stuff, what you're doing is wonderful.

chrisjsewell commented 4 years ago

Thanks @pauleveritt 😄 @choldgraf, where are we at with this issue? I recall there has been some further thought on it downstream in myst-nb/jupyter-book.

choldgraf commented 4 years ago

I don't believe there has been specific new development on this topic, but here's a thought:

The challenge with having a special title syntax is that Markdown treats ATX and setext headings the same way (roughly). For example:

# Here's a header

Here's another header
=====================

Both will be parsed as H1 headers.
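
To see this concretely, here is a sketch using markdown-it-py (the CommonMark parser MyST-Parser builds on) in its plain "commonmark" preset, with no MyST plugins loaded. Each heading token's markup records which syntax was used, while its tag records the level:

```python
from markdown_it import MarkdownIt

SOURCE = """\
# Here's a header

Here's another header
=====================

And a level-two header
----------------------
"""

md = MarkdownIt("commonmark")

# Collect (HTML tag, marker character) for every heading the parser finds.
headings = [
    (token.tag, token.markup)
    for token in md.parse(SOURCE)
    if token.type == "heading_open"
]

print(headings)  # [('h1', '#'), ('h1', '='), ('h2', '-')]
```

The markup differs, but the ATX and = setext headings both come out as h1, which is why reserving = underlines for titles would depart from CommonMark.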

So, two thoughts:

  1. We could choose a character for a title that isn't supported in setext headers. I think the # symbol might be natural, since it's used for other headers, e.g.:

    Here's my title
    ###############
    
    # And here's a header
  2. We could use sandwiched headers as a special title syntax. e.g.:

    ===============
    Here's my title
    ===============
    
    # And here's a header
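
One way to weigh these two options (and how gracefully they degrade) is to check what a plain CommonMark parser makes of them. A sketch with markdown-it-py's "commonmark" preset, assuming no MyST extension handles either syntax: the first option becomes an ordinary paragraph, because a run of more than six # characters is not an ATX heading; in the second, the bottom = line turns everything above it into one H1 whose text still contains the top row of = signs.

```python
from __future__ import annotations

from markdown_it import MarkdownIt

# Option 1: a run of '#' characters used like a setext underline.
OPTION_1 = """\
Here's my title
###############

# And here's a header
"""

# Option 2: a title sandwiched between two '=' lines.
OPTION_2 = """\
===============
Here's my title
===============

# And here's a header
"""

md = MarkdownIt("commonmark")


def blocks(source: str) -> list[tuple[str, str]]:
    """Return (tag, inline text) pairs for each paragraph or heading."""
    tokens = md.parse(source)
    result = []
    # Every paragraph_open/heading_open token is followed by its inline token.
    for opener, inline in zip(tokens, tokens[1:]):
        if opener.type in ("paragraph_open", "heading_open"):
            result.append((opener.tag, inline.content))
    return result


print(blocks(OPTION_1))
print(blocks(OPTION_2))
```

Neither option degrades to a clean heading in other CommonMark renderers.
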

pauleveritt commented 4 years ago

A question about goals/non-goals. Do you want the same markdown document to be usable outside MyST/Sphinx? Meaning, someone could write a .md without any of the MyST/Sphinx special features, and later switch to something else if they are unhappy?

If so, you might want an approach which gracefully degrades to a heading. Which alas, I don't think exists. 😄

choldgraf commented 4 years ago

Haha yeah, I think that is a goal, though one we recognize is a balance between new functionality and the ability to degrade gracefully. So it all depends on the pros / cons I think.

In this case I agree with you, though as you say, I'm not sure anything exists that would degrade nicely (unless we overrode the behavior of some CommonMark syntax, which doesn't seem ideal either).

chrisjsewell commented 4 years ago

A front matter solution would be my recommendation

choldgraf commented 4 years ago

I actually quite like supporting a front-matter title - it seems to nicely go along with the principle of "explicit is better than implicit" (and right now we implicitly choose the first section header as the document title).

The only challenge I foresee is that editing notebook-level metadata in ipynb files is a PITA but I think it's something we could work around (and not strictly myst-parser's problem to solve)

choldgraf commented 4 years ago

I just looked at how nteract handles this, and they do add an author and a title notebook-level metadata to the notebook (they have UI to add this explicitly).

I think that we should just piggy-back off of that and use the same fields (title first, maybe authors in the future) and provide docs for how users can edit these on a per-notebook basis.
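
A minimal sketch of reading such a field, assuming (per the comment above) that nteract stores title, authors, and description directly in the notebook's top-level metadata object. notebook_title is a hypothetical helper, not a MyST-Parser or nbformat API; a build tool could use its result as the page title and fall back to the first heading when it returns None.

```python
import json
from typing import Optional


def notebook_title(notebook_json: str) -> Optional[str]:
    """Return the notebook-level "title" metadata, or None if absent/empty.

    Hypothetical helper. Assumes "title" sits directly inside the
    notebook's top-level "metadata" object, as nteract writes it.
    """
    notebook = json.loads(notebook_json)
    title = notebook.get("metadata", {}).get("title")
    if isinstance(title, str) and title.strip():
        return title.strip()
    return None


# A trimmed notebook carrying the metadata fields nteract adds.
example = json.dumps(
    {
        "cells": [],
        "metadata": {
            "authors": [{"name": "Chris Holdgraf"}],
            "description": "My notebook",
            "title": "My title",
        },
        "nbformat": 4,
        "nbformat_minor": 5,
    }
)

print(notebook_title(example))  # My title
```
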

This UI in nteract is available when you click view -> notebook header:

[Screenshot of the nteract notebook header UI]

and filling out the fields adds this metadata:

...
"authors": [
  {
    "name": "Chris Holdgraf"
  }
 ],
"description": "My notebook",
"title": "My title",

Also note that in rST there appears to be a title:: directive that sets the title for a page (https://docutils.sourceforge.io/docs/ref/rst/directives.html#metadata-document-title), though I'm not sure how it behaves in Sphinx.
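
For reference, a sketch of what that directive does in plain docutils (Sphinx behavior not checked here). It only sets the document's title attribute, which the docutils docs describe as metadata (for example, the browser window title in HTML output), so the first section heading is still promoted to the visible title as usual:

```python
from docutils import nodes
from docutils.core import publish_doctree

RST_SOURCE = """\
.. title:: Metadata Title

Visible Heading
===============

A paragraph.
"""

doctree = publish_doctree(RST_SOURCE)

# The directive sets the document's "title" attribute; it adds no node.
print(doctree["title"])     # Metadata Title

# DocTitle still promotes the section heading to the visible title node.
print(doctree[0].astext())  # Visible Heading
```
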

pauleveritt commented 4 years ago

I'm in favor of title in frontmatter, but my needs are a good bit different.

choldgraf commented 4 years ago

@pauleveritt you mentioned that you had looked into this in Sphinx and that the title was determined later in the process than you'd like. Do you remember whether Sphinx skips promoting the first section header to <title> if a <title> element is already in the doctree?

Then we could try to inject a <title> earlier in the process (e.g. from front-matter), and perhaps we could skip this step later on?

chrisjsewell commented 4 years ago

As I mentioned previously, you just need to add a transform that runs before docutils.transforms.frontmatter.DocTitle.

choldgraf commented 4 years ago

Ah, it wasn't clear to me from your earlier comment whether that transform is always applied (i.e., whether it overwrites an existing doctitle).

So then, I think what needs to happen here is to register a transform that precedes the sphinx title transform and adds a title node if there's the right metadata, yeah?

I think this is where sphinx adds the title https://github.com/sphinx-doc/sphinx/blob/af62fa61e6cbd88d0798963211e73e5ba0d55e6d/sphinx/environment/collectors/title.py#L34

pauleveritt commented 4 years ago

My case was unique. I had a system that embedded YAML into a frontmatter-style directive. The YAML created a pydantic instance for a schema. I wanted the title in the schema and the instance. But I couldn't get to the doctitle until after the rST document was parsed, which meant a later phase (doctree-read). I had to stash away the partial resource and finish constructing it in that later phase.

You have your parser, so you don't have this problem, so it's just a distraction. 😄

It does make me want to ask: any chance the frontmatter -> YAML instance could be pluggable? I'd love to have pydantic validate it, allowing me to bail out of a build very early.

chrisjsewell commented 4 years ago

> So then, I think what needs to happen here is to register a transform that precedes the sphinx title transform and adds a title node if there's the right metadata, yeah?

Yes, plus encapsulate all content in a top-level section.
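
A minimal sketch of that plan in plain docutils, under stated assumptions: a leading reStructuredText field list stands in for MyST front matter, and FieldTitleToSection and TitleFieldReader are hypothetical names, not part of MyST-Parser or Sphinx. The transform's priority (310) places it just before DocTitle (320); it removes the title field, wraps all remaining content in one top-level section, and leaves the promotion to DocTitle.

```python
from docutils import nodes
from docutils.core import publish_doctree
from docutils.readers import standalone
from docutils.transforms import Transform

RST_SOURCE = """\
:title: My Page Title
:author: Jane Doe

A paragraph, with no heading in the source.
"""


class FieldTitleToSection(Transform):
    """Hypothetical: turn a leading :title: field into a top-level section."""

    default_priority = 310  # runs before frontmatter.DocTitle (320)

    def apply(self):
        document = self.document
        if not len(document) or not isinstance(document[0], nodes.field_list):
            return
        field_list = document[0]
        for field in field_list.children:
            # Each field holds a field_name node and a field_body node.
            if field[0].astext().strip().lower() == "title":
                title_text = field[1].astext().strip()
                field_list.remove(field)
                break
        else:
            return  # no title field: leave the document untouched
        if not len(field_list):
            document.remove(field_list)

        # Encapsulate all remaining content in a single top-level section.
        section = nodes.section()
        section += nodes.title(title_text, title_text)
        children = list(document.children)
        del document[:]
        section.extend(children)
        document += section


class TitleFieldReader(standalone.Reader):
    """Standalone reader with the extra transform registered."""

    def get_transforms(self):
        return super().get_transforms() + [FieldTitleToSection]


doctree = publish_doctree(RST_SOURCE, reader=TitleFieldReader())
print(type(doctree[0]).__name__, "->", doctree[0].astext())
```

Reusing DocTitle instead of inserting a title node directly means the document's title attribute and the later docinfo handling of the remaining :author: field match what docutils does for an ordinary leading heading.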

> any chance the frontmatter -> YAML instance could be pluggable? I'd love to have pydantic validate it, allowing me to bail out of a build very early.

Including configurable validation for the front matter wouldn't be that difficult. I think proper validation against a JSON Schema would be better than using pydantic, though. Feel free to open a separate issue.

cpitclaudel commented 1 year ago

I realize this is a pretty old issue, but I haven't really found a way to get myst to work for this. Concrete repro:

This compiles well using docutils --parser=myst --writer=xelatex:

# This is the title

But with front matter added, the heading no longer compiles as a title:

---
author: Name
---

# This is the title

What's the "right" way to set the document's title and the author's name in a standalone myst document?
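
A possible explanation, sketched in plain reStructuredText rather than MyST: DocTitle only promotes a section that comes first in the document apart from "pre-bibliographic" nodes such as comments, and a field list is not one of those. Assuming MyST renders the front matter as a field list ahead of the heading (not verified here), the same rule would block the promotion. The sketch compares both orderings; has_document_title is a hypothetical helper.

```python
from docutils import nodes
from docutils.core import publish_doctree

FIELDS_FIRST = """\
:author: Name

This is the title
=================

A paragraph.
"""

TITLE_FIRST = """\
This is the title
=================

:author: Name

A paragraph.
"""


def has_document_title(source: str) -> bool:
    """True if DocTitle promoted the heading to the document's title node."""
    doctree = publish_doctree(source)
    return isinstance(doctree[0], nodes.title)


print(has_document_title(FIELDS_FIRST))  # False: the heading stays a section
print(has_document_title(TITLE_FIRST))   # True: heading promoted, author -> docinfo
```
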