Open john-hewitt opened 7 years ago
A large number of non-English web pages I'm working with break html2text. Attached is an HTML file that replicates this issue.
import html2text
html2text.html2text(open('breaking.txt', 'r').read().decode('utf-8'))
breaking.txt
I am not able to reproduce this on the latest master. Perhaps this has been fixed and can be closed.
html2text version: 2016.9.19
Python version: 2.7.13
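For anyone retrying this on a newer interpreter, here is a minimal sketch of the same reproduction that should run on both Python 2 and Python 3. It assumes `breaking.txt` is UTF-8 encoded; `read_html`, `convert`, and the `to_text` parameter are hypothetical helper names for illustration, not part of html2text. Only `html2text.html2text` is the library's own call.

```python
import io


def read_html(path, encoding="utf-8"):
    # io.open decodes while reading, so no separate .decode() call is needed
    # (str has no .decode() on Python 3).
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


def convert(path, to_text=None):
    # to_text is a hypothetical hook so the file-reading step can be checked
    # without html2text installed; by default the real converter is used.
    if to_text is None:
        import html2text
        to_text = html2text.html2text
    return to_text(read_html(path))
```

Calling `convert('breaking.txt')` exercises the same code path as the original snippet. If reading the file raises `UnicodeDecodeError`, the problem is in the file's encoding rather than in html2text.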