Open feiglex opened 7 years ago
This is really an html2markdown.
XD We're running into the same problem...
And the related code is right here:
if hn(tag):
    self.p()
    if start:
        self.inheader = True
        self.o(hn(tag) * "#" + ' ')
    else:
        self.inheader = False
        return  # prevent redundant emphasis marks on headers
Hmm...
So you want to keep everything the same and only remove the # from the output?
Yeah, I just want to keep the text I see on the website. Is there any configuration option to remove the #?
No, there's no option to disable the formatting. We aim to generate text that can at least be converted back to its original format, so removing all formatting wouldn't be helpful.
However, if the goal is just to remove all the HTML tags from the text, you can use lxml or BeautifulSoup to strip the tags.
There's a similar feature request: https://github.com/Alir3z4/html2text/issues/170
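For anyone landing here, a minimal sketch of the tag-stripping idea. Note the assumptions: it uses Python 3's standard-library html.parser instead of lxml or BeautifulSoup (so it runs without extra installs), and the names TextOnly and strip_tags are made up for illustration; they are not part of html2text.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Hypothetical helper: keeps text nodes and drops every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for the text between tags; tags themselves are ignored.
        self.parts.append(data)


def strip_tags(html):
    """Return the text content of an HTML fragment with all tags removed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(strip_tags("<p>hello, this is <em>html2text</em></p>"))
```

This only concatenates text nodes, so block elements such as headings and paragraphs are not separated by newlines; that would need extra handling.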
this should be called html2markdown
@mj-dd
"this should be called html2markdown"
Well said. This is a con: it cannot perform what its name implies.
html2text --2016.9.19
python --2.7.13
import html2text

html = "<p>hello, this is <em>html2text</em></p><strong>it is strong label</strong>"
h = html2text.HTML2Text()
print(h.handle(html))

h.ignore_emphasis = True
print(h.handle(html))

html = "<h6>This is title</h6>"
print(h.handle(html))  # value: ###### This is title
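Since there is no option to turn the heading marks off, one possible workaround is to post-process the output string. This is only a sketch: strip_heading_marks is a hypothetical helper, not an html2text feature, and it assumes headings come out as one to six # characters followed by a space at the start of a line.

```python
import re

# One to six '#' characters followed by a space, at the start of any line.
_HEADING_PREFIX = re.compile(r"^#{1,6} ", flags=re.MULTILINE)


def strip_heading_marks(markdown_text):
    """Remove leading heading marks, leaving the heading text in place."""
    return _HEADING_PREFIX.sub("", markdown_text)


# Stand-in for the string returned by h.handle("<h6>This is title</h6>")
sample = "###### This is title\n\n"
print(strip_heading_marks(sample))
```

Be aware that this also touches any other line that starts with "# ", for example a comment inside a converted code block.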