executablebooks / MyST-Parser

An extended commonmark compliant parser, with bridges to docutils/sphinx
https://myst-parser.readthedocs.io
MIT License

Can we, please, have Pandoc-style non-breaking spaces? #805

Open stroobandt opened 10 months ago

stroobandt commented 10 months ago

Describe the feature you'd like to request

For MyST to enter into my academic writing workflow, I really would like to see a MyST Markdown extension for quick-to-type non-breaking spaces.

Rationale: Scientific and technical documents are brimming with in-text values of quantities and their respective units. According to The International System of Units §5.4.3:

> The numerical value always precedes the unit and a space is always used to separate the unit from the number. Thus the value of the quantity is the product of the number and the unit. The space between the number and the unit is regarded as a multiplication sign (just as a space between units implies multiplication).

Hence, line-breaking/word-wrapping between the value of a quantity and its unit is not considered best practice in typography.

A similar argument holds for the thin space used for digit grouping, e.g. between groups of three digits in numbers with more than four digits. See for example Astronomy & Astrophysics Typography: general typing rules.

Describe the solution you'd like

I would love to see a Markdown extension similar to the one available in Pandoc:

> A backslash-escaped space is parsed as a nonbreaking space. In TeX output, it will appear as ~. In HTML and XML output, it will appear as a literal unicode nonbreaking space character (note that it will thus actually look “invisible” in the generated HTML source […]

Hence, typing `\ ` (a backslash followed by a space) in MyST Markdown would be interpreted as a non-breaking space: U+00A0 in Unicode, `&nbsp;` in HTML and `~` in LaTeX. Similarly, typing `\,` in MyST Markdown should be interpreted as a narrow non-breaking space: U+202F in Unicode, `&#x202F;` in HTML (HTML defines no named entity for this character) and `\,` in LaTeX.
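The mapping proposed above can be sketched as a small text pre-processing step. This is only an illustration in plain Python, not how MyST-Parser or Pandoc implement anything: the function names `expand_space_escapes` and `to_latex` are hypothetical, and the sketch makes no attempt to skip code spans, fenced blocks or math, where `\,` already has a meaning.

```python
import re

NBSP = "\u00a0"   # NO-BREAK SPACE
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE

# An escaped backslash is matched as well, so that a literal backslash
# followed by a space (written "\\ " in the source) is left untouched.
_ESCAPE = re.compile(r"\\(\\| |,)")


def expand_space_escapes(text: str) -> str:
    """Replace backslash-space with U+00A0 and backslash-comma with U+202F."""
    def repl(match: re.Match) -> str:
        ch = match.group(1)
        if ch == " ":
            return NBSP
        if ch == ",":
            return NNBSP
        return match.group(0)  # escaped backslash: keep as written
    return _ESCAPE.sub(repl, text)


def to_latex(text: str) -> str:
    """Hypothetical LaTeX rendering of the two characters: '~' and '\\,'."""
    return text.translate({0x00A0: "~", 0x202F: r"\,"})


source = r"a mass of 5\ kg and a distance of 12\,345\ m"
expanded = expand_space_escapes(source)
print(repr(expanded))
print(to_latex(expanded))
```

A real extension would more likely hook into the parser's inline escape handling than rewrite the raw source, precisely so that code and math are left alone.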

Describe alternatives you've considered

I am currently helping myself in MyST Markdown with &nbsp;, but typing six characters for one simple non-breaking space is extremely cumbersome. Moreover, it renders the MyST Markdown source less readable, whereas Markdown was intended for readability.
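For reference, the entity workaround resolves to the same character the proposed escape would produce. A minimal check using Python's standard `html.unescape`; the sample strings are purely illustrative:

```python
import html

# "&nbsp;" is the named entity for U+00A0 (NO-BREAK SPACE).
with_entity = html.unescape("5&nbsp;kg")

# U+202F (NARROW NO-BREAK SPACE) can be written as a numeric character reference.
with_numeric = html.unescape("12&#x202F;345")

print(repr(with_entity))
print(repr(with_numeric))
```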

welcome[bot] commented 10 months ago

Thanks for opening your first issue here! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out EBP's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
If your issue is a feature request, others may react to it, to raise its prominence (see Feature Voting).
Welcome to the EBP community! :tada:

agoose77 commented 10 months ago

Hi @stroobandt!

Over at https://github.com/executablebooks/mystmd there's work ongoing to support the siunitx syntax via a {si} unit role, e.g. {si}`5 <kg.m/s^2>`. Would this meet your needs?

If so, we ought to consider how to bring this to MyST Parser somehow.

stroobandt commented 10 months ago

@agoose77 That is certainly a very welcome development. However, there are more instances besides units where quick-to-type non-breaking spaces are desired.

For example, inside a narrow table cell, a non-breaking space can be useful to prevent an unfortunate word wrap. Another example might be the space between proper names, and I am undoubtedly still missing many more use cases where non-breaking spaces are useful.

In conclusion, I still would like to push for a `\ ` (backslash-space) MyST Markdown extension. Moreover, this would also ease the migration route for existing Pandoc users like myself.

dbitouze commented 10 months ago

> I am currently helping myself in MyST Markdown with &nbsp;, but typing six characters for one simple non-breaking space is extremely cumbersome. Moreover, it renders the MyST Markdown source less readable, whereas Markdown was intended for readability.

AFAICS, at least for HTML output, it is enough to type the U+202F space directly in the MyST Markdown source (on my French AZERTY keyboard on GNU/Linux: AltGr+Shift+SPC).

stroobandt commented 9 months ago

> AFAICS, at least for an HTML output, it is enough to directly type in the MyST Markdown source the U+202F space

That is not entirely correct. U+202F corresponds to a narrow non-breaking space for use with units, whereas U+00A0 yields the standard-width non-breaking space. (Incidentally, I happen to have a web page dedicated to this very topic.)

However, this little mistake perfectly illustrates the issue. The whole idea behind Markdown is for it to be human readable, allowing one to avoid such mistakes when editing or proof-reading a Markdown source document.

> PHILOSOPHY: Markdown is intended to be as easy-to-read and easy-to-write as is feasible. Source: https://daringfireball.net/projects/markdown/syntax#philosophy

When I type Ctrl+Shift+U followed by A0 or 202F, my text editor (Vim) renders in both cases an ordinary-looking space, indistinguishable from a normal breaking space and hence not positively identifiable as any particular non-breaking space. This is not the case with the Pandoc-style `\ ` and `\,`, respectively. Moreover, both of these codings are much shorter to type and there is no need to memorise any awkward UTF-8 codes.

> (on my French AZERTY keyboard on GNU/Linux: AltGr+Shift+SPC).

Again, the same issue. I type on the US International ANSI keyboard layout on GNU/Linux, which means that I also have a right-side AltGr key in addition to the left Alt. Nonetheless, Shift+AltGr+Space produces an ordinary breakable space here, and nothing tells me so. I actually had to run the `cat -v` command on the file to see what was really happening and to learn that, at least with my specific keyboard layout, it is actually Shift+Alt+Space.
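The kind of check done here with `cat -v` can also be sketched in a few lines of Python. This is an illustration only: the function name `find_nonbreaking_spaces` is hypothetical, and the sketch watches just the two code points discussed in this thread.

```python
import unicodedata

# The two characters under discussion.
WATCHED = {"\u00a0", "\u202f"}


def find_nonbreaking_spaces(text: str) -> list:
    """Return (line, column, code point, Unicode name) for each watched character."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in WATCHED:
                hits.append((lineno, col, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return hits


sample = "5\u00a0kg\n12\u202f345 m"
for hit in find_nonbreaking_spaces(sample):
    print(hit)
```

Running something like this over a Markdown source makes visible which of the otherwise identical-looking spaces are actually non-breaking, and of which kind.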

Conclusion

The above examples clearly demonstrate the need for a concise, human-readable and positively identifiable Markdown extension for coding the standard-width non-breaking space as well as the thin non-breaking space. To avoid disparity, Pandoc-style `\ ` and LaTeX/MathJax-style `\,` are proposed.

dbitouze commented 9 months ago

> > AFAICS, at least for an HTML output, it is enough to directly type in the MyST Markdown source the U+202F space

> That is not entirely correct. U+202F corresponds to a narrow, non-breaking thin space for use with units,

I don't see anything about narrow or thin spaces in The International System of Units (SI) brochure: only ordinary spaces are mentioned (see e.g. § “5.4.3 Formatting the value of a quantity”, p. 149 (151 of the PDF)).

> whereas U+00A0 yields the standard width non-breaking space.

You're right, my bad: AFAICS, it's U+00A0.

> However, this little mistake perfectly illustrates the issue. The whole idea behind Markdown is for it to be human readable, allowing one to avoid such mistakes when editing or proof-reading a Markdown source document.

> > PHILOSOPHY: Markdown is intended to be as easy-to-read and easy-to-write as is feasible. Source: https://daringfireball.net/projects/markdown/syntax#philosophy

Okay.

> When I type Ctrl+Shift+U followed by A0 or 202F, my text editor (Vim) will render in both cases an ordinary looking space, indistinguishable from a normal breaking space and hence not positively identifiable as any particular non-breaking space.

In my better editor (Emacs) :wink:, such a space is rendered as underlined, which is definitely distinguishable from an ordinary space.

> This is not the case with Pandoc-style `\ ` and `\,`, respectively. Moreover, both these codings are much shorter to type

Not much shorter than AltGr+Shift+SPC (especially on a French AZERTY keyboard).

> and there is no need to memorise any awkward UTF-8 codes.

No need to memorize any awkward UTF-8 codes with AltGr+Shift+SPC.