Only escape `*` at the start of a line.

ncphillips commented 6 years ago

Since introducing the prosemirror editor to Forestry.io, we have been super happy. There has been a few illusive bugs around the * character. This PR introduces the fix for those bugs

Bug Fixed Examples

When in Text Node contains an HTML Table

This was the troublesome case that brought this bug to our attention:

{
  "type": "text",
  "text": "<table><tr><td>*test</td></tr></table>"
}

Output was:

<table><tr><td>\*test</td></tr></table>

\*test

Output now:

<table><tr><td>*test</td></tr></table>

*test

When marked up as `code`

{
  "type": "text",
  "text": "*foo"
},
{
  "type": "text",
  "marks": [
    {
      "type": "code"
    }
  ],
  "text": "*"
}

Output was:

\*foo`\*`

*foo\*

Output now:

\*foo`*`

*foo*

Non-Regression Examples

The following examples are changes in output from prosemirror-markdown, but not changes in intended behaviour for the output site.

Inside text node but after a new line

Prosemirror Node:

{
  "type": "text",
  "text": "Test\n\***"
}

Output was:

Test
\*\*\*

Test ***

Output Now:

Test
\***

Test ***

Bad Horizontal Rules

{
  "type": "text",
  "text": "--*"
}

Output was:

--\*

--*

Output now:

--*

--*

Problems with this solution

I believe this solution fixes more bugs then it creates. It does however create one bug for sure.

This is frustrating, but right now we think this is the more acceptable bug for Forestry.

* at end of strong text

{
  "type": "text",
  "marks": [
    {
      "type": "strong"
    }
  ],
  "text": "My Text *"
}

Output was:

**My Text \***

My Text *

Output now:

**My Text ***

**My Text ***

Future Work

Along the lines of #16, I think the ideal solution to the problems introduced by this PR would be to handle those edge cases by switching the opening and closing characters depending on the circumstance.

For example: If the bold text starts or ends with * then use __ instead.

I am currently trying to change our markdown tokens to work this way.

ncphillips commented 6 years ago

I have a fix that works, and that I will go with in my fork, but I'm not sure it's the right fix for the library.

It involves changing the signature of the open and close functions for marks to accept the node.

marijnh commented 6 years ago

* characters have special meaning in Markdown, even when not at the start of a line, so I don't see how not escaping them is an option. Also, the CommonMark parser should parse \* to become *, so the escaping should be harmless.

The issue where it escapes things in inline code was a different problem, which I've fixed in eff7162. I do not understand what the problem in tables is.

Could you start by describing the problem concretely, not just with examples? Can you show it breaking in this demo?

ncphillips commented 6 years ago

Markdown serializer's don't allow you to write Markdown inside of table related tags, so the asterisks do not need to be serializer.

Steps to reproduce in that demo:

Paste this <table><tr><td>Example *</td></tr></table> into the wysiwyg
Switch to Markdown
The output text will be: <table><tr><td>Example \*</td></tr></table>

The text output by the wysiwyg renders to:

Example \*

marijnh commented 6 years ago

Are you pasting that as HTML or as text? In any case, I'm seeing the markdown parsing unescape the asterisk for both, so the backslash doesn't show up in the wysiwyg.

ncphillips commented 6 years ago

I'm pasting the HTML string <table><tr><td>Example *</td></tr></table> into the wysiwyg editor, and the asterisks is escaped is in the raw editor:

table

The asterisk should not be escaped. When it is, the \ is visible in the built site.

This is the case for the markdown parsers for Github, Hugo, Jekyll, and VuePress.

marijnh commented 6 years ago

The asterisk should not be escaped.

I think that's debatable.

When it is, the \ is visible in the built site.

I think that counts as a bug (or at least, a CommonMark non-conformance) in your Markdown renderer.

ncphillips commented 6 years ago

I suppose that could be true. Unfortunately, I have not found a Markdown renderer that does conform to that practice, and our users won't accept that as an answer. I'm okay with closing this PR, as we have a fork of the serializer that handles the case.

marijnh commented 6 years ago

markdown-it is a good CommonMark implementation in JS (also drives the parser in prosemirror-markdown). Not sure about other languages. Going to close this, then.

ncphillips commented 6 years ago

Unfortunately I don't have control over which markdown parser our client's use. We just output the Markdown, and they use Jekyll/Hugo/VuePress/etc. to generate the HTML.

That said, markdown-it also appears to not want asterisks to be escaped when inside of table cells:

marijnh commented 6 years ago

Ahh, I see what's going on now. prosemirror-markdown doesn't support inline HTML, since there's no way to represent that in the basic schema, but standard Markdown/CommonMark will treat that asterisk as being inside of a tag, and thus treat it as HTML, not Markdown text.

The proper solution would, I suppose, be to add inline and block HTML tags to your schema, and extend your markdown serializer and parser to handle them. But that's not quite trivial, since you also have to find a solution for the way users will edit such nodes in the wysiwyg view (using a node view with an embedded editor, maybe).

ProseMirror / prosemirror-markdown