aaronsw / html2text

Convert HTML to Markdown-formatted text.
http://www.aaronsw.com/2002/html2text/
GNU General Public License v3.0

Fix initial crowded <pre> output #63

Closed: wking closed this 11 years ago

wking commented 11 years ago

html2text produces crowded output when the HTML to parse starts with:

<pre>stuff...

It works fine with:

<pre>
stuff...
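The HTML specification says a single newline directly after a <pre> start tag is ignored, so the two inputs above mean the same thing to a browser. Until the converter handles the first form itself, a caller might sidestep the problem by normalizing the input before converting it. This is a minimal sketch under that assumption: normalize_pre is a hypothetical helper, not part of html2text or of this pull request, and its regex is naive (it does not skip comments, CDATA sections, or a ">" inside attribute values).

```python
import re

# Matches a <pre> start tag (any case, with or without attributes) that is
# NOT already followed by a newline. The \b keeps tags like <prefix> out.
_PRE_START = re.compile(r"(<pre\b[^>]*>)(?!\r?\n)", re.IGNORECASE)


def normalize_pre(html):
    """Hypothetical helper: insert a newline right after each <pre> start tag.

    HTML parsers drop a single newline directly after <pre>, so this should
    not change what a browser renders; it only rewrites input into the
    "newline after the tag" form that converts cleanly.
    """
    return _PRE_START.sub(r"\1\n", html)


if __name__ == "__main__":
    print(repr(normalize_pre("<pre>stuff...")))
```

The result would then be passed to the converter as usual, e.g. html2text.html2text(normalize_pre(html)).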

This problem was acknowledged in #9 https://github.com/aaronsw/html2text/issues/9#issuecomment-8735046

html2text's parsing procedure is a bit opaque to me, so this may not be the cleanest fix, but it does work.

wking commented 11 years ago

I think a proper fix for this issue would be to restructure the whole output framework to be more line-based. That would make it easier to figure out where preceding whitespace comes from, and easier to strip trailing whitespace. It's too big a task for me to commit to at the moment, though.
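A rough sketch of what a line-based output layer might look like follows. The LineBuffer class and its methods are hypothetical and are not part of html2text. The idea, assuming output is kept as a list of lines, is that a block element can check whether the previous line is already blank before adding a separator, and trailing whitespace can be stripped in one place at the end.

```python
class LineBuffer:
    """Hypothetical line-based output buffer (not part of html2text)."""

    def __init__(self):
        # Each entry is one output line without its "\n";
        # the last entry is the line currently being written.
        self.lines = [""]

    def write(self, text):
        # Split on newlines so every output line stays a separate entry.
        parts = text.split("\n")
        self.lines[-1] += parts[0]
        self.lines.extend(parts[1:])

    def ensure_blank_line(self):
        # Finish the current line if it has content; otherwise drop any
        # whitespace-only text on it.
        if self.lines[-1].strip():
            self.lines.append("")
        else:
            self.lines[-1] = ""
        # At the very start of the output there is nothing to separate from,
        # so no leading blank line is added.
        if len(self.lines) == 1:
            return
        # Add one blank separator unless the previous line is already blank,
        # so repeated calls never stack up extra blank lines.
        if self.lines[-2].strip():
            self.lines.append("")

    def getvalue(self):
        # Trailing whitespace is stripped from every line in one place.
        return "\n".join(line.rstrip() for line in self.lines)


if __name__ == "__main__":
    buf = LineBuffer()
    buf.ensure_blank_line()          # e.g. a <pre> at the start of the document
    buf.write("    stuff...  \n")
    buf.ensure_blank_line()
    buf.write("after")
    print(buf.getvalue())
```

With this shape, a block handler at the start of the document gets no leading blank line, while the same call mid-document yields exactly one separator however many times it is made.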