Open Simon-Laux opened 1 year ago
as we are currently not handling markdown styles at all, it may be okay to support a subset only - but it may still be regarded as a bug and ppl may be wondering.
are there other notable markdown implementations that skip underscore processing? is this maybe a usual thing meanwhile?
i am not too much for/in markdown at all, however, removing such basic thing as _italic_
seems questionable: underscores are esp. nice as italics/underlining are typographically interchangeable - and by _italic_
one has kind of wysiwyg in plaintext. i see that use of underscore in plaintext/markdown very often, it feels natural. many markdown toolbars also use that variant for italics.
and: hasn't sth. as width*height*deep
(not to speak about C dereferencing :) the same issue? usually programmers know they have to backtick their stuff, cmp. eg. our changelogs :)
maybe also the parser can maybe do better, eg. allowing _
on word boundaries only (which seems to be a general difference to the asterix - also here on github, inside words asterix is changing style - where underscore is not)
in general, if we do not want to rely on parsing rules as there are too many cornercases (markdown is only apparently simple, reminds me very much to yaml :) we could also leave the formatting characters in the resulting text.
i have seen this here and there, and think this not the worse compromise, also makes adaption easier and a bad rule does not break things badly:
imagine passwords sent over delta chat with markdown support: this may be close to impossible, at least unreliable, if we introduce markdown
but it may still be regarded as a bug and ppl may be wondering.
we define what we want to support in https://github.com/deltachat/message-parser/blob/master/spec.md
maybe also the parser can maybe do better, eg. allowing _ on word boundaries only (which seems to be a general difference to the asterix - also here on github, inside words asterix is changing style - where underscore is not)
sure that's an option, but also more complicated than just removing underscore or saying that only on wordboundries is allowed. a side benefit of only on word boundaries is that it could make the parsing slightly faster.
in general, if we do not want to rely on parsing rules as there are too many cornercases (markdown is only apparently simple, reminds me very much to yaml :) we could also leave the formatting characters in the resulting text.
also a valid angle, but does not look as good. Maybe rich formatting really needs a special editor that auto escapes stuff you paste into it, but that's complicated and even element doesn't get it right (I had problems with it).
imagine passwords sent over delta chat with markdown support: this may be close to impossible, at least unreliable, if we introduce markdown
I think element auto-escapes stuff you paste into it, another option would be to allow users to toggle markdown rendering in their messages... but right there are more open questions to the whole rich message topic, also wether we want to send messages as html or not.
another example I came across: #deltachat_desktop
becomes #deltachatdesktop
, with desktop being displayed in italics font.
another example I came across: #deltachat_desktop becomes #deltachatdesktop, with desktop being displayed in italics font.
the text emote -_-
can also falsely trigger underline. again I think we should just remove it from the spec.
just tested WhatsApp, it has _
for italics and *
for bold, but is more clever about it:
_hi_
- triggers italics
_hi_hi
- does not trigger italics
_hi_hi_
- triggers italics
_hi _hi_
- triggers italics
_hi _hi _
- does not trigger italics
so italics only works when started and ended outside of a word. Which could also be a solution for us here.
another bug in the implementation of _
is:
_[test](https://example)_
which should be displayed as:
is actually displayed as:
[test](https://example)_
(the [ is italic while the rest of the link is not parsed)
This is also how pandoc handles it:
pandoc -f markdown -t html
_foo_bar_baz_
<p><em>foo_bar_baz</em></p>
pandoc -f markdown -t html
foo_bar_baz
<p>foo_bar_baz</p>
CommonMark has a very precise definition: https://spec.commonmark.org/0.31.2/#emphasis-and-strong-emphasis
undercore
_
is used often in text. especially for programmers having things likeANDROID_NDK_ROOT
in a message is not a rare occurrence.currently
ANDROID_NDK_ROOT
parses intowhich is not desired in most cases.
Proposed solution
remove underscore variants from bold and italics, so its just
*italics*
and**bold**