Alir3z4 / html2text

Convert HTML to Markdown-formatted text.
alir3z4.github.io/html2text/
GNU General Public License v3.0

Error using --pad-tables option with nested tables #147

Open toshism opened 8 years ago

toshism commented 8 years ago

When using the --pad-tables option, html2text throws an IndexError if the HTML contains a nested table.

html2text 2016.9.19

echo '<html><body><table><tr><td><table><tr><td></td></tr></table></td></tr></table></body></html>' | html2text --pad-tables

Python 2.7.12

theSage21 commented 8 years ago

I shall get on it as soon as I can.

theSage21 commented 8 years ago

So, the way I've implemented this right now is as a post-processing step: once the Markdown is generated, the tables in it are beautified. I currently have no idea how to do this recursively, so I can't implement anything to handle nested tables yet. If something does come to mind, I'll put it in there.
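For readers following along, here is a minimal sketch of what padding a table as a post-processing step could look like. It is not html2text's actual code: `pad_table` is a hypothetical helper, and it assumes each table row is a single pipe-delimited line. It sizes every row to the widest row before indexing columns, which is one way a padding pass can avoid out-of-range lookups when rows have different cell counts.

```python
def pad_table(lines):
    """Pad the cells of a pipe-delimited Markdown table so columns line up.

    Hypothetical helper for illustration; not taken from html2text.
    """
    if not lines:
        return []
    rows = [[cell.strip() for cell in line.split("|")] for line in lines]
    # Extend short rows to the widest row so column indexing never overruns.
    n_cols = max(len(row) for row in rows)
    rows = [row + [""] * (n_cols - len(row)) for row in rows]
    widths = [max(len(row[i]) for row in rows) for i in range(n_cols)]
    padded = []
    for row in rows:
        cells = []
        for cell, width in zip(row, widths):
            if cell and set(cell) == {"-"}:
                # A cell made only of dashes is a header separator: stretch it.
                cells.append("-" * width)
            else:
                cells.append(cell.ljust(width))
        padded.append(" | ".join(cells).rstrip())
    return padded


table = ["A | B", "---|---", "long cell | x", "short"]
print("\n".join(pad_table(table)))
```

Because this works line by line on already-generated text, it has no notion of a table inside a cell, which is consistent with the difficulty of handling nesting recursively described above.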

theSage21 commented 7 years ago

I've done some digging around. https://github.com/vmg/redcarpet/issues/394 explains why nested tables are not a good idea in Markdown. I'll fix the IndexError, but I won't add any extra support for nested tables.

toshism commented 7 years ago

That's fair. Thanks for looking into it. How do you plan on handling nested tables? I would like to use html2text in my mailcap file to convert HTML emails, so I obviously don't have control over the HTML it will be processing, but if it could at least handle nested tables gracefully in some way, that would be great.
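Until nested tables are handled, one possible stopgap for uncontrolled input such as email is to check for nesting before deciding whether to pass --pad-tables. The sketch below uses only Python's standard `html.parser`; `TableDepthParser` and `has_nested_tables` are hypothetical names for illustration and are not part of html2text.

```python
from html.parser import HTMLParser


class TableDepthParser(HTMLParser):
    """Record the deepest <table> nesting seen (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "table":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        # Ignore stray closing tags so malformed HTML cannot go negative.
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def has_nested_tables(html):
    """Return True if any <table> appears inside another <table>."""
    parser = TableDepthParser()
    parser.feed(html)
    parser.close()
    return parser.max_depth > 1


sample = ("<html><body><table><tr><td><table><tr><td></td></tr></table>"
          "</td></tr></table></body></html>")
print(has_nested_tables(sample))
```

A wrapper script could call `has_nested_tables` first and fall back to running html2text without padding when it returns True. This only detects nesting; it does not change how the tables are rendered.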

theSage21 commented 7 years ago

Nested tables end up being ambiguous if plain Markdown table syntax is used. To avoid that, I'm thinking of something along the lines of:

A  | B
---|------
...|[C | D]
...|---|---
...|x  | y
...|x  | y
...|___|___
z  | q
z  | q

Alir3z4 commented 7 years ago

@theSage21 That's really good. I agree with the approach.