aaronsw / html2text

Convert HTML to Markdown-formatted text.
http://www.aaronsw.com/2002/html2text/
GNU General Public License v3.0

processing of <pre> element results in double-spaced text #9

Open mcepl opened 13 years ago

mcepl commented 13 years ago

When running this example script:

#!/usr/bin/python

import html2text

inStr = """
<pre class="wiki">"addnoresponse": {
    "name": "NoRespns",
    "position": "topRow",
    "commentIdx": "noResponseString",
    "status": "CLOSED",
    "resolution": "INSUFFICIENT_DATA"
},
</pre>
"""
print(html2text.html2text(inStr))

I get this:

bradford:~ $ python test-PRE-bug.py 
"addnoresponse": {

        "name": "NoRespns",

        "position": "topRow",

        "commentIdx": "noResponseString",

        "status": "CLOSED",

        "resolution": "INSUFFICIENT_DATA"

    },

bradford:~ $ 

I mean this is pretty awful. I understand that you want to make this into Markdown, but shouldn’t html2text produce something at least a bit readable? Or could html2text get a parameter (e.g. prettyParse=true) that would avoid this?

aaronsw commented 13 years ago

I assume it's just a bug. That's not even the right Markdown.

mcepl commented 13 years ago

Glad to hear it is not intentional. Thanks.

aaronsw commented 13 years ago

Looks like it's a bug in the line-wrapping. If you turn that off, it should work.

mcepl commented 13 years ago

How should I do it? Lack of any reasonable documentation for html2text is another bug (or maybe I am stupid, and I just haven't found it).

fmarier commented 12 years ago

@mcepl you can turn off line wrapping like this: https://github.com/fmarier/blogger2ikiwiki/commit/d352a4655185640642fd8550bcd6c9740f915540

aaronsw commented 11 years ago

You can turn it off by setting body_width to zero, e.g. by passing -b 0 on the command line.
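
For anyone calling html2text as a library rather than from the command line, here is a minimal sketch of the same setting. It assumes the installed version exposes the HTML2Text class with a per-instance body_width attribute and a handle() method; older single-file releases may only have a module-level default instead. The names convert_unwrapped and SAMPLE are made up for illustration, and this is not taken from the project's documentation.

```python
def convert_unwrapped(html):
    """Convert HTML to Markdown-formatted text with line wrapping disabled."""
    # Imported inside the function so this sketch can be loaded even where
    # html2text is not installed; calling it still requires the package.
    import html2text

    converter = html2text.HTML2Text()
    # Assumption: this version supports the per-instance body_width attribute.
    # A value of 0 means "do not wrap", matching -b 0 on the command line.
    converter.body_width = 0
    return converter.handle(html)


# Hypothetical sample, shortened from the <pre> block in the original report.
SAMPLE = """<pre class="wiki">"addnoresponse": {
    "name": "NoRespns",
    "status": "CLOSED"
},
</pre>"""
```

Calling convert_unwrapped(SAMPLE) should return the block without the wrapper touching it. Whether the first line still comes out oddly depends on the version in use.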

aaronsw commented 11 years ago

This seems to be a little better in the latest version, but it is still confused by the first line.