davidar / pandiff

Prose diffs for any document format supported by Pandoc
MIT License

Multiple adjacent inserted paragraphs smooshed together in HTML output #3

Closed ptgolden closed 5 years ago

ptgolden commented 5 years ago

When a diff produces two or more adjacent paragraph insertions, the HTML output renders them as one giant paragraph. (I imagine the same applies to two or more adjacent deletions.)

I've tracked it down to the point in postprocess where pandoc is run on the Markdown representation of the diffed output. It seems that pandoc does not treat a paragraph as a paragraph when it is completely wrapped in <ins> or <del> tags. This means that this Markdown:

<ins>Paragraph 1</ins>

<ins>Paragraph 2</ins>

When run through pandoc, it results in:

<ins>Paragraph 1</ins>
<ins>Paragraph 2</ins>

If, for example, spaces are inserted before the <ins> tags in the above markup, the correct output is produced:

<p><ins>Paragraph 1</ins></p>
<p><ins>Paragraph 2</ins></p>

As a quick workaround, I added this line in postrender before the final pandoc call. It produces the correct output, but I don't know if it's the best solution.

text = text.replace(/\n\n<ins>/g, '\n\n <ins>')
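
The one-liner above only covers `<ins>` tags that follow a blank line. A slightly broader sketch of the same idea, assuming `<del>` behaves the same way (which I haven't verified) and that the tag may also sit at the very start of the text, might look like this. The name `indentBlockLevelChanges` is made up for illustration and is not part of pandiff:

```javascript
// Hypothetical helper (not part of pandiff): generalises the one-line
// workaround to <del> as well as <ins>, and to a tag at the start of the text.
function indentBlockLevelChanges(text) {
  // Match <ins> or <del> at the very start of the text or right after a blank
  // line, and put a single space in front of it (same trick as the workaround).
  return text.replace(/(^|\n\n)(<(?:ins|del)>)/g, '$1 $2');
}

// Usage: run it on the Markdown just before the final pandoc call.
const sample = '<ins>Paragraph 1</ins>\n\n<del>Paragraph 2</del>';
console.log(JSON.stringify(indentBlockLevelChanges(sample)));
```

Tags in the middle of a line are left alone, since they are already parsed as inline content.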

Any thoughts?

(I'm running pandoc 2.2.1.)

davidar commented 5 years ago

It looks like they're being interpreted as block-level tags rather than inline tags inside a paragraph. The best solution would probably be to explicitly wrap those paragraphs in p tags.

Edit: though I think that markup is being generated by pandoc's markdown writer, so perhaps it's an upstream issue (html->md->html not round-tripping properly).
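
For reference, the explicit-wrapping idea could be sketched roughly as follows. This is only an illustration: `wrapChangedParagraphs` is a hypothetical name, not a pandiff function. It splits on blank lines without regard for fenced code blocks, and I haven't checked how pandoc treats Markdown syntax nested inside the added `<p>` tags.

```javascript
// Hypothetical helper (not part of pandiff): wraps any paragraph that consists
// solely of one <ins>...</ins> or <del>...</del> element in explicit <p> tags.
function wrapChangedParagraphs(markdown) {
  return markdown
    .split(/(\n{2,})/) // capturing group keeps the blank-line separators as-is
    .map(block => {
      const trimmed = block.trim();
      const m = /^<(ins|del)>([\s\S]*)<\/\1>$/.exec(trimmed);
      // Skip blocks holding several sibling elements,
      // e.g. "<ins>a</ins> b <ins>c</ins>".
      const single = m !== null && !m[2].includes(`</${m[1]}>`);
      // A function replacement avoids "$" patterns in the text being interpreted.
      return single ? block.replace(trimmed, () => `<p>${trimmed}</p>`) : block;
    })
    .join('');
}

const diffed = 'Intro\n\n<ins>Paragraph 1</ins>\n\n<ins>Paragraph 2</ins>';
console.log(wrapChangedParagraphs(diffed));
```

Paragraphs that mix changed and unchanged text are left untouched, since those already come out as proper paragraphs.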