aaronsw / html2text

Convert HTML to Markdown-formatted text.
http://www.aaronsw.com/2002/html2text/
GNU General Public License v3.0
2.61k stars, 412 forks

Long list lines do not wrap #13

Open scumop opened 13 years ago

scumop commented 13 years ago

I noticed that an

<li>with 200 characters</li>

outputs a 200-character-long line. I found this irritating, so I added some code to optwrap(text) in v3.02.

Just a fragment

WAS:

for para in text.split("\n"):
    if len(para) > 0:
        if para[0] != ' ' and para[0] != '-' and para[0] != '*':
            for line in wrap(para, BODY_WIDTH):
                result += line + "\n"
            result += "\n"
            newlines = 2
        else:
            if not onlywhite(para):
                result += para + "\n"
                newlines = 1
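
A minimal, self-contained sketch of why the old branch leaves list items unwrapped. The names old_optwrap and the simplified onlywhite stand-in are assumptions for illustration, and BODY_WIDTH = 78 is an assumed value; only the branching mirrors the fragment above.

```python
from textwrap import wrap

BODY_WIDTH = 78  # assumed wrap width, for illustration only


def onlywhite(line):
    """Simplified stand-in: True if the line holds nothing but spaces."""
    return all(c == ' ' for c in line)


def old_optwrap(text):
    """Hypothetical reduction of the WAS fragment to a standalone function."""
    result = ""
    for para in text.split("\n"):
        if len(para) > 0:
            if para[0] != ' ' and para[0] != '-' and para[0] != '*':
                # Ordinary paragraph: wrapped to BODY_WIDTH.
                for line in wrap(para, BODY_WIDTH):
                    result += line + "\n"
                result += "\n"
            elif not onlywhite(para):
                # List item (starts with ' ', '-' or '*'): copied through as-is.
                result += para + "\n"
    return result


# A list item of roughly 200 characters, as in the report.
item = ("  * " + "word " * 40).rstrip()
out = old_optwrap(item)
longest = max(len(line) for line in out.split("\n"))
print(longest > BODY_WIDTH)  # the list line is never wrapped
```

Any line starting with a space, '-' or '*' takes the second branch, so its length is never checked against BODY_WIDTH.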

IS:

reList = re.compile(r'(^[ ]+[0-9]+\. )|(^[ ]+\* )')
for para in text.split("\n"):
    if len(para) > 0:
        if para[0] != ' ' and para[0] != '-' and para[0] != '*':
            for line in wrap(para, BODY_WIDTH):
                result += line + "\n"
            result += "\n"
            newlines = 2
        else:
            # Handle list item: wrap it, indenting continuation lines under the text.
            if reList.match(para):
                indent_spaces = None
                for line in wrap(para, BODY_WIDTH - 6):  # allowance for indentation pad
                    if indent_spaces is None:
                        # First line: emit as-is, then measure the list marker
                        # so continuation lines align with the item text.
                        result += line + "\n"
                        indent_spaces = ' ' * len(reList.match(line).group())
                    else:
                        result += indent_spaces + line + "\n"
                result += "\n"
                newlines = 1
            elif not onlywhite(para):
                result += para + "\n"
                newlines = 1
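
The same hanging indent can probably be had with less bookkeeping by passing subsequent_indent to textwrap.wrap. This is only a sketch, not the patch above: wrap_list_item and LIST_MARKER are hypothetical names, BODY_WIDTH = 78 is an assumed value, and the pattern only covers the two marker shapes that reList matches.

```python
import re
from textwrap import wrap

BODY_WIDTH = 78  # assumed wrap width, for illustration only

# Leading spaces, then "N." or "*", then one space (same shapes as reList).
LIST_MARKER = re.compile(r'^[ ]+(?:[0-9]+\.|\*) ')


def wrap_list_item(para, width=BODY_WIDTH):
    """Wrap one list-item line, aligning continuation lines under its text.

    Hypothetical helper: a line that is not a list item is returned unchanged.
    """
    marker = LIST_MARKER.match(para)
    if marker is None:
        return [para]
    pad = ' ' * len(marker.group())
    # subsequent_indent is counted against width, so no manual allowance is needed.
    return wrap(para, width, subsequent_indent=pad)


item = ("  1. " + "word " * 40).rstrip()
for line in wrap_list_item(item, 40):
    print(line)
```

Because textwrap counts the indent as part of each line's width, the "- 6" allowance is not needed here, and markers wider than six characters still stay within the limit.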

aaronsw commented 13 years ago

Can you submit this as a pull request?