executablebooks / MyST-Parser

An extended commonmark compliant parser, with bridges to docutils/sphinx
https://myst-parser.readthedocs.io
MIT License
759 stars 196 forks source link

Fix smartquote logic #479

Open chrisjsewell opened 2 years ago

chrisjsewell commented 2 years ago

I'm surprised no one has noticed this before, but I've just realised there are some "conflicts" in how smartquotes are handled; between myst (via markdown-it), docutils and sphinx.


Within myst-parser, enabling the "smartquotes" extension (e.g. via myst_enable_extensions = ["smartquotes"], enables the markdown-it smartquotes rule: https://github.com/executablebooks/MyST-Parser/blob/2b3a93129623dc6d2ae1254edd6d35b0b1917726/myst_parser/main.py#L252-L254 You can see its tested behaviour here: https://raw.githubusercontent.com/executablebooks/markdown-it-py/master/tests/test_port/fixtures/smartquotes.md, i.e. this is applied before converting to docutils AST


With docutils, the MyST parser is inheriting the docutils RST parser: https://github.com/executablebooks/MyST-Parser/blob/2b3a93129623dc6d2ae1254edd6d35b0b1917726/myst_parser/docutils_.py#L166 Because of this, it also currently inherits its transforms, for which RST adds the SmartQuote transform: https://github.com/chrisjsewell/docutils/blob/8adab0660b2097b4f3c32cef7e5ff4cb3c72b084/docutils/docutils/parsers/rst/__init__.py#L177-L179

This does approximately the same thing, and is turned off by default, so for example:

$ echo "\"a\"" | rst2pseudoxml.py
<document source="<stdin>">
    <paragraph>
        "a"
$ echo "\"a\"" | rst2pseudoxml.py --smart-quotes=yes
<document source="<stdin>">
    <paragraph>
        “a”
$ echo "\"a\"" | myst-docutils-pseudoxml 
<document source="<stdin>">
    <paragraph>
        "a"
$ echo "\"a\"" | myst-docutils-pseudoxml --smart-quotes=yes 
<document source="<stdin>">
    <paragraph>
        “a”
$ echo "\"a\"" | myst-docutils-pseudoxml --myst-enable-extensions=smartquotes
<document source="<stdin>">
    <paragraph>
        “a”

we could perhaps remove this transform, i.e.:

class Parser(RstParser):

    def get_transforms(self):
        transforms = super().get_transforms()
        transforms.remove(SmartQuotes)
        return transforms

For sphinx though, removing the SmartQuote transform would not matter, because it already does this: https://github.com/sphinx-doc/sphinx/blob/eed0730b4ba3bd2fbd34f2d6ab555ba876c77717/sphinx/parsers.py#L72-L80, in order to replace it with SphinxSmartQuotes: https://github.com/sphinx-doc/sphinx/blob/eed0730b4ba3bd2fbd34f2d6ab555ba876c77717/sphinx/transforms/__init__.py#L323

By contrast with docutils, sphinx smartquotes are on by default 😬 : https://www.sphinx-doc.org/en/master/usage/configuration.html#confval-smartquotes, so even if you don't enable the myst-parser smartquotes extension, quotes will still be converted, unless you set smartquotes = False

There are ways of disabling, for instance by setting document.settings.smart_quotes = False in the parser.


Not decided on what the best behaviour here should be yet, I imagine though there should be only one "source of truth" for enabling/disabling smart quotes:

  1. Disable the docutils/sphinx smartquote transforms, and rely only on the markdown-it rule
  2. Remove the use of the markdown-it rule, and have myst_enable_extensions = ["smartquotes"] dynamically change the docutils/sphinx config in the parser
  3. Remove the myst-parser smartquotes extension entirely and instruct users to use the docutils/sphinx config options

Perhaps the SmartQuote transform is more "optimised" for docutils/sphinx use, but the pro of the markdown-it rule, is that it is not tightly coupled to docutils, so you can use it in other "myst implementations", e.g.: https://github.com/executablebooks/myst-vs-code/blob/c8c96542171ebe6ccdd896cf8337a5ffffeb73d4/src/extension.ts#L1

also related #424

cc @choldgraf, @mmcky, etc and anyone else's thoughts welcome

mmcky commented 2 years ago

thanks @chrisjsewell -- If I understand correctly smartquotes provides typographically correct quotations marks right? Given markdown-it is a tight dependency and does the bulk of the parsing from myst:md to sphinx.ast I don't see a big downside to using it for smartquotes. Perhaps the key issues is if a user provides sphinx configuration in _config.py then this may cause confusing results as it is rendered ineffective?

I see your issue re: source of truth here.

72757373656c6c commented 6 months ago

According to docutils, smartquotes also contains:

> Straight quotes (" and ') into "curly" quote characters
> dashes (-- and ---) into en- and em-dash entities
> three consecutive dots (... or . . .) into an ellipsis entity.

This is also referenced in the Sphinx documentation for smartquotes.

The myst-parser documentation includes -- and --- as replacements.

I am using Sphinx+MyST and did not want replacements or smartquotes. I created a simple project to find a solution. The conf.py contains:

extensions = ["myst_parser"]
exclude_patterns = [ ".venv", ]
source_suffix = { ".md": "markdown", }
myst_enable_extensions = [ "colon_fence", ]
myst_disable_syntax = [ "smartquotes", ]
#myst_disable_syntax = [ "replacements", "smartquotes", ]

myst-parser FAQ - Disable Markdown syntax for the parser

But ', ", --, --- and ... were still converted. Adding replacements to myst_disable_syntax changed nothing. Also notice myst_enable_extensions does not contain smartquotes. It is enabled by default.

My work around is to double escape -- like \\-\\- according to stackoverflow - Display two dashes in rst file.

It would be helpful if myst_disable_syntax = [ "smartquotes", ] disabled the conversion of ', ", --, --- and ..., or smartquotes is disabled by default and are only enabled if smartquotes is added to myst_enable_extensions.

sphinx 7.3.7 \ myst-parser 3.0.1 \ docutils 0.21.2 markdown-it-py 3.0.0