Alir3z4 / html2text

Convert HTML to Markdown-formatted text.
alir3z4.github.io/html2text/
GNU General Public License v3.0

When span and p are nested, multiple line breaks appear #290

Closed chen-xiao-dong closed 4 years ago

chen-xiao-dong commented 4 years ago
    import html2text

    h = html2text.HTML2Text()
    h.body_width = 0  # disable line wrapping
    h.ignore_emphasis = True
    # Triple single quotes let the HTML span several lines and contain double quotes.
    print(h.handle('''<html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    </head>
    <body lang="ES-TRAD" link="#0563C1" vlink="#954F72">
    <div class="WordSection1">
    <p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif; color:#1F497D">Hi Mayank,</span></p>
    <p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif; color:#1F497D">&nbsp;</span></p>
    <p class="MsoNormal"><span lang="EN-GB" style="font-family:&quot;Arial&quot;,sans-serif; color:#1F497D">May you share the Release Notes please?</span></p>'''))

jdufresne commented 4 years ago

The extra newline is from the line:

    <p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif; color:#1F497D">&nbsp;</span></p>

This looks to be working as expected to me.
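
If the extra blank paragraph is unwanted, one option is to post-process the converted text rather than change the converter. The following is a minimal sketch using only the standard library. It assumes the `&nbsp;`-only paragraph shows up in the output as a whitespace-only line between blank lines. `collapse_blank_lines` is a hypothetical helper, not part of html2text, and the sample string is illustrative rather than captured output.

```python
import re


def collapse_blank_lines(text: str) -> str:
    """Collapse runs of empty or whitespace-only lines into one blank line."""
    # str.strip() also removes non-breaking spaces, so a line holding only
    # spaces, tabs, or U+00A0 is treated as empty.
    lines = ["" if not line.strip() else line for line in text.split("\n")]
    # Three or more consecutive newlines mean two or more blank lines in a row.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines))


# Illustrative input (an assumption, not verified html2text output).
sample = "Hi Mayank,\n\n \n\nMay you share the Release Notes please?\n"
print(collapse_blank_lines(sample))
```

This would be applied to the string returned by `h.handle(...)`. It removes every whitespace-only paragraph, so it may not suit documents where such spacing is intentional.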