Open caoshl opened 1 month ago
html2text --version
I fetched the HTML from
https://www.stats.gov.cn/sj/zxfb/202409/t20240914_1956486.html
In the downloaded HTML file the data appears only once, but in the Markdown text it appears twice.
Script:
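from html2text import HTML2Text


def convert_html_to_markdown(html_text: str, base_url: str) -> str:
    # Relative links are resolved against base_url
    h = HTML2Text(baseurl=base_url)
    h.ignore_links = False  # keep links in the Markdown output
    markdown_text = h.handle(html_text)
    return markdown_text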
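
To narrow down which part of the output is duplicated, a small standard-library helper along these lines might be useful. It is only a diagnostic sketch and not part of html2text: `find_repeated_lines`, `count_in_both`, and the `min_length` threshold are hypothetical names chosen for illustration. It also assumes the duplicated data shows up as identical lines in the Markdown text.

```python
from collections import Counter


def find_repeated_lines(markdown_text: str, min_length: int = 20) -> dict[str, int]:
    """Return stripped lines that occur more than once, mapped to their counts.

    Lines shorter than min_length (hypothetical threshold) are ignored so that
    separators and short table cells do not drown out the real duplicates.
    """
    counts = Counter(
        line.strip()
        for line in markdown_text.splitlines()
        if len(line.strip()) >= min_length
    )
    return {line: n for line, n in counts.items() if n > 1}


def count_in_both(html_text: str, markdown_text: str, snippet: str) -> tuple[int, int]:
    """Count plain-substring occurrences of snippet in the HTML and in the Markdown."""
    return html_text.count(snippet), markdown_text.count(snippet)
```

Passing the result of `convert_html_to_markdown(html_text, base_url)` to `find_repeated_lines` would list candidate duplicates. `count_in_both` can then confirm whether a given snippet really occurs once in the source HTML, provided the snippet is not split by tags there.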