beeminder / blog

3 stars 2 forks source link

Spurious newlines from html in markdown #376

Open dreeves opened 1 year ago

dreeves commented 1 year ago
### Desiderata
- [x] Diagnose why the markdown renderer does this (it's just what Marked does)
- [x] Decide if the workaround suffices (yes)
- [x] NIX: Find other instances where we we need to apply the workaround
- [x] Break the build if an html comment starts at the beginning of a line and the previous line isn't blank
- [ ] Dreev confirms that the build breaks when we do this

Replicata

  1. Put the following markdown in a source doc:
This is a sentence
<!-- with an html comment in the middle -->
that ends here and should all appear on one line.

Expectata

To see that rendered as:

This is a sentence that ends here and should all appear on one line.

Resultata

It's rendered as:

This is a sentence

that ends here and should all appear on one line.

Nota Nebulosa

Workaround: Put a space in front of the html:

This is a sentence
 <!-- with an html comment in the middle -->
that ends here and should all appear on one line.

Cognata

Verbata: markdown rendering, mixing html and markdown, html comments in the markdown source, spurious newlines,

narthur commented 1 year ago

My uneducated guess is that Marked strips the html comments as having no meaning to the parser, and so the raw markdown is left as so, resulting in two paragraphs.

This is a sentence

that ends here and should all appear on one line.

Out of curiosity, this is what GitHub does:

This is a sentence

that ends here and should all appear on one line.

Which makes sense I guess since Marked's goal is to follow GFM I think.

dreeves commented 1 year ago

Normally I'd say we should just follow GFM but in this case we're counting on a big difference from GFM: only inserting newlines if the markdown line ends in two spaces, per the original markdown spec.

In any case, I just tried the replicata at https://markdownlivepreview.com/ and it seems this is indeed standard behavior, not a bug we introduced.

So I think the right answer is to apply the workaround to all the source docs.

dreeves commented 1 year ago

Oof, this is pretty pervasive. Including in http://blog.bmndr.co/astroblog and we didn't notice 😬 I'm going through a gazillion old posts now applying the workaround but i'm not sure this is worth it compared to programmatically making it match the old behavior. It kind of seems like a bug in Marked to me.

narthur commented 1 year ago

@dreeves Ok. So maybe something like replacing all instances of \r?\n?\<\!\-\-.+?\-\-\>\r?\n? with an empty string? I wouldn't want to replace more than one new line on either side for fear of creating the opposite bugs, causing separate paragraphs to become a single paragraph.

dreeves commented 1 year ago

That sounds like dark magic, er, an egregious anti-magic violation. But a lot of markdown has that problem so I don't know.

Actually maybe dark magic is right: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

Like what if there's an html comment in a pre block? Your proposed patch would make it go poof.

This sucks. Stupid magic markdown.

dreeves commented 1 year ago

Oh! New idea: break the build if some regex like the one in your proposal matches. Then you're forced to do something like the following as a workaround:

Line 1.
 <!-- note the space at the beginning of this line which works around the dumb Marked bug -->
Line 2.

Breaking the build is always nice and safe because it's loud. If you try to apply a patch to handle html comments correctly and there's a subtle bug in it then you risk breaking everything silently. (Man, learning to live and breath the anti-magic principle continues to point the way to the truth and the light; I'm so enamored with it!)

What if the regex matches too many things and we don't want that much build-breaking? That's totally fine because we can then decide whether to keep applying the workaround to keep the build from breaking or to tweak the regex to make it break the build less.

PS: See new desideratum.

narthur commented 1 year ago

@dreeves I like this solution. 👍

narthur commented 1 year ago

Builds should now break as @dreeves described above: https://github.com/beeminder/blog/pull/409

dreeves commented 8 months ago

This doesn't work yet :(

If I comment out multiple lines like so:

<!---
stuff to comment out
another line being commented out
-->

then it renders a —&gt; on the blog. Adding spaces before the <!-- doesn't help.

PS: This doesn't replicate for me on the new etherpad, https://beetherpad-web.onrender.com/tmp -> https://beetherpad-web.onrender.com/tmp?public=true