Closed by fwkoch 1 week ago
There should be an easy first pass on this that just combines sequential HTML nodes together if they are only separated by whitespace...
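A minimal sketch of what such a first pass might look like, assuming a simplified node shape in which sibling `html` nodes are separated by whitespace-only `text` nodes. The `SimpleNode` type and the `mergeAdjacentHtml` helper are hypothetical names for illustration, not part of mystmd:

```typescript
// Hypothetical, simplified node shape; real mdast nodes carry more fields.
type SimpleNode = { type: string; value?: string };

// True for text nodes that contain nothing but whitespace.
function isWhitespaceText(node: SimpleNode): boolean {
  return node.type === 'text' && (node.value ?? '').trim() === '';
}

// Merge runs of html nodes that are separated only by whitespace text nodes.
function mergeAdjacentHtml(nodes: SimpleNode[]): SimpleNode[] {
  const out: SimpleNode[] = [];
  let i = 0;
  while (i < nodes.length) {
    const node = nodes[i];
    if (node.type !== 'html') {
      out.push(node);
      i++;
      continue;
    }
    let value = node.value ?? '';
    let next = i + 1;
    while (next < nodes.length) {
      // Collect any whitespace-only text nodes that follow.
      let k = next;
      let gap = '';
      while (k < nodes.length && isWhitespaceText(nodes[k])) {
        gap += nodes[k].value ?? '';
        k++;
      }
      // Only absorb the whitespace if another html node follows it.
      if (k < nodes.length && nodes[k].type === 'html') {
        value += gap + (nodes[k].value ?? '');
        next = k + 1;
      } else {
        break;
      }
    }
    out.push({ type: 'html', value });
    i = next;
  }
  return out;
}

// Illustrative input, made up for this sketch.
const merged = mergeAdjacentHtml([
  { type: 'html', value: '<table>' },
  { type: 'text', value: '\n' },
  { type: 'html', value: '<tr><td>a</td></tr>' },
  { type: 'text', value: '\n' },
  { type: 'html', value: '</table>' },
  { type: 'text', value: 'after' },
]);
```

Whitespace that is not followed by another `html` node is left in place, so trailing text is not swallowed.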
I can't actually reproduce this issue in any other case besides blank whitespace interruptions, so maybe we don't need more complicated html parsing... Once #1333 lands, I'll consider this closed - can always re-open if we encounter more issues!
Ok - it's still pretty easy to get html to fail, for example:
<b>blah
</b>
doesn't work - it just loses the bold tag. However, fixing that requires a deeper change to the actual parsing / html node creation. This issue just handles correct html that is split across mdast nodes...
The fix to resolve the specific example in the original issue has been released, so I am going to consider this closed.
As my previous comment suggests, there is potentially more work to be done for smarter html parsing, but that can wait until more real examples come up.
Description
Given this simple HTML:
MyST fails. Rather than creating a single `table`, it creates a `table` for the first line and a `tableRow` for the second line. MyST attempts to handle this with the `reconstructHtml` transform - https://github.com/executablebooks/mystmd/blob/main/packages/myst-transforms/src/html.ts#L243 - but that function makes major assumptions about the shape of the HTML which are easily violated.
Proposed solution
We probably need an actual HTML parser in the `reconstructHtml` transform so we know if the block of HTML is entirely closed or if it still has open elements.
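To illustrate the "entirely closed vs. still open" check, here is a rough sketch that tracks open elements with a stack. It uses a regex tag scanner as a stand-in for a real HTML parser, so it will misread things like `>` inside attribute values, comments, or `<script>` content. The `openElements` helper is a hypothetical name, not an existing mystmd function:

```typescript
// Void elements never take a closing tag, so they never stay "open".
const VOID_ELEMENTS = [
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr',
];

// Return the names of elements that are opened but not closed in `html`.
// An empty result means the fragment is entirely closed.
function openElements(html: string): string[] {
  const stack: string[] = [];
  // Matches opening, closing, and self-closing tags (assumption: no '>' in attributes).
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9-]*)\b[^>]*?(\/?)>/g;
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(html)) !== null) {
    const closing = match[1];
    const name = match[2].toLowerCase();
    const selfClosing = match[3];
    if (VOID_ELEMENTS.indexOf(name) !== -1 || selfClosing) continue;
    if (!closing) {
      stack.push(name);
      continue;
    }
    // On a closing tag, pop back to the most recent matching open element.
    const idx = stack.lastIndexOf(name);
    if (idx !== -1) stack.length = idx;
  }
  return stack;
}

// Illustrative inputs, made up for this sketch.
const stillOpen = openElements('<table>\n<tr><td>a</td></tr>');
const fullyClosed = openElements('<b>blah\n</b>');
```

A transform could keep appending following nodes to the current HTML block while `openElements` returns a non-empty list, and emit the block once the list is empty. A real parser would be needed to make that decision reliably.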