bevacqua / domador

:smirk_cat: Dependency-free and lean DOM parser that outputs Markdown
https://ponyfoo.com
MIT License
86 stars 6 forks source link

spans (from rich text) inside cells cause newlines #7

Open jywarren opened 8 years ago

jywarren commented 8 years ago

Pasting content from an external page into a table in a Woofmark instance (like this one: https://publiclab.github.io/PublicLab.Editor/examples/) often results in spans inside tables.

screenshot 2016-07-29 at 11 55 15 am

That becomes:

| col0 | col1 | col2 |
|------|------|------|
| cell | through the steps  
 | cell |
| cell | cell | cell |
| cell | cell | cell |
| cell | cell | cell |

However, tds may often contain spans, and it seems we could figure out how to parse them gracefully at least. Here it seems like that is almost happening properly.

The span tags are removed. I was going to suggest just that the extra newline could be stripped too. I'm happy to write a test and attempt this change?

bevacqua commented 8 years ago

Please do!

jywarren commented 8 years ago

That's odd -- putting a span into the 'tables with complex content still get proper padding' test doesn't trigger the issue. Digging a bit.

jywarren commented 8 years ago

OK -- pasting in even a single word (on a recent Chrome, in ChromeOS) appends a <br> to the table cell content:

<td><span style="color: rgb(170, 170, 170);">steps</span><br></td>

Trying to think through if we should strip that out, strip trailing <br>s out, or what. It definitely doesn't survive the markdown conversion because a newline mid-row breaks the table.

I think it's appropriate to strip <br> tags in this case. Thoughts or counter-cases?

bevacqua commented 8 years ago

Are you copying a part of a page and getting this rich format? Where are you pasting? It may be more of a contenteditable issue than something specific to domador I think.

I'm fine with stripping <\s*br\s*\/?\s*> from start/end of cell, in any case.

jywarren commented 8 years ago

Yes, copying a single word (mid-sentence). I'd agree that it's contenteditable but it's also an extremely common use case for Woofmark (actually reported by a user of one of my projects). Perhaps it could be a configurable option?

Though if a <br> results in an unparseable markdown table, it seems like an issue regardless of where the
came from, but I guess that's a question for the GFM table spec.

bevacqua commented 8 years ago

I think the issue specific to domador is that <br> in table cells should be kept as-is, and not as \n. There should, furthermore, be a fix to woofmark where we strip <br> from start/end of pasted content. Thoughts?

jywarren commented 8 years ago

That sounds reasonable, yeah. I didn't find tables represented in Commonmark, but is the GFM tables behavior supposed to be that HTML may be nested in tables, but Markdown may not be?

For example, an unordered list can be placed inside a table in woofmark but it doesn't survive conversion to valid markdown via domador. I tend to think it's unreasonable (impossible?) to expect multiline Markdown to be parseable inside a GFM table cell (see below), but just curious if there's a well-established spec for that.

| col0 | col1 |
|------|------|
| cell | 

- one
- two |
| cell | cell |
jywarren commented 8 years ago

And for good measure, this works:

col0 col1
cell
Hi!
cell cell
| col0 | col1 |
|------|------|
| cell | <br> Hi! <br> |
| cell | cell |
bevacqua commented 8 years ago

Yeah, I'd avoid newlines being produced by domador in general when creating Markdown. How about a flag where we tell domador to preserve certain HTML tags if in a cell?

bevacqua commented 8 years ago

As in:

| col0 | <ul><li>one</li><li>two</li></ul> |
|------|------|
| cell | <ul><li>one</li><li>two</li></ul> |
| cell | cell |

Which would render properly:

col0
  • one
  • two
cell
  • one
  • two
cell cell
bevacqua commented 8 years ago

I'm pretty sure we need to preserve all block elements and <br>, any others you can think of?

bevacqua commented 8 years ago

By the way if you want to join ponyfoo.com/slack that'll make chatting about this stuff easier :)

jywarren commented 8 years ago

Joined, thanks. Created a test for this in #8, will take a look at coding in a bit.