Alir3z4 / html2text

Convert HTML to Markdown-formatted text.
alir3z4.github.io/html2text/
GNU General Public License v3.0

HTML to Markdown: generated data appears twice #428

Open caoshl opened 1 month ago

caoshl commented 1 month ago

I fetched the HTML from https://www.stats.gov.cn/sj/zxfb/202409/t20240914_1956486.html. In the downloaded HTML file each data value appears only once, but in the Markdown output the data appears twice.

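Script:

    from html2text import HTML2Text

    def convert_html_to_markdown(html_text: str, base_url: str) -> str:
        # baseurl is used to resolve relative links in the page
        h = HTML2Text(baseurl=base_url)
        h.ignore_links = False  # keep hyperlinks in the Markdown output
        markdown_text = h.handle(html_text)
        return markdown_text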
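To narrow this down, one option is to count how often a specific value appears in the page's raw text nodes versus in the converted Markdown. A browser does not necessarily display every copy of some content, so a value that looks unique in the rendered page may still occur twice in the source. The sketch below is not part of html2text. It uses only the standard library, `duplicated_values` and `_TextCollector` are hypothetical helper names, and the values you check (for example one figure from the statistics table) are your choice.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style> contents."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so entities are decoded
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def duplicated_values(
    html_text: str, markdown_text: str, values: list[str]
) -> dict[str, tuple[int, int]]:
    """Hypothetical diagnostic: return {value: (count_in_html_text, count_in_markdown)}
    for every value that occurs more often in the Markdown than in the HTML text nodes."""
    collector = _TextCollector()
    collector.feed(html_text)
    collector.close()
    html_plain = "".join(collector.chunks)

    result: dict[str, tuple[int, int]] = {}
    for value in values:
        in_html = html_plain.count(value)
        in_md = markdown_text.count(value)
        if in_md > in_html:
            result[value] = (in_html, in_md)
    return result


if __name__ == "__main__":
    html = "<table><tr><td>5.6%</td></tr></table>"
    md = "| 5.6% |\n| 5.6% |"
    print(duplicated_values(html, md, ["5.6%"]))  # {'5.6%': (1, 2)}
```

If a value is not reported, it occurs at least as often in the page's own text as in the Markdown, which would point at the source HTML rather than the conversion. Plain substring counting is approximate: html2text may wrap long lines or escape characters, so a value can be split or altered in the Markdown and undercounted.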