Open glendeni opened 4 years ago
Thanks for the report.
Python 3.4 is end of life and no longer supported by the project. However, I tested anyway. The following command works for me using the latest commit on master:
$ curl https://www.co.monterey.ca.us/government/departments-i-z/resource-management-agency/public-works/road-conditions-closures | python -m html2text
Are you using the same or something else? Can you retest using the master branch?
Thanks for your reply. I'm using version 2020.1.16, which is what I obtained by running 'pip install html2text' yesterday, so I assume it is the latest version. Using the curl command with that html2text, à la your example, I still get the same error. [FWIW, I do not get that error for other webpages.]
If it works for you, the problem must be on my end, so you might as well move on. I'm not a Python user, so I don't have the knowledge to figure out what is wrong on my end. I have moved on to using the Debian html2text program instead (which, despite the same name, I assume has a different source).
Using the version downloaded today with Python 3.4.3, I get
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2019' in position 59766: ordinal not in range(256)
from
https://www.co.monterey.ca.us/government/departments-i-z/resource-management-agency/public-works/road-conditions-closures
Adding --decode-errors=ignore gives the same result.
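For anyone hitting the same message: the error is an *encode* error, so it is raised while text is being written out, not while the page is being read. That would explain why an option concerned with decode errors makes no difference. Below is a minimal sketch that reproduces the same kind of failure outside html2text. It assumes (the thread does not confirm this) that the output stream in the failing environment uses the latin-1 encoding. `write_text` and `sample` are hypothetical names made up for the illustration and are not part of html2text.

```python
import io


def write_text(text, encoding, errors="strict"):
    """Write text through a text stream with the given encoding and
    return the bytes that reached the underlying buffer.

    Hypothetical helper: it stands in for a stdout that uses `encoding`.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # keep `raw` from being closed along with the wrapper
    return data


# U+2019 (right single quotation mark) has no latin-1 representation.
sample = "road\u2019s closed"

# Assumed failing case: a latin-1 output stream in strict mode.
try:
    write_text(sample, "latin-1")
    strict_failed = False
except UnicodeEncodeError as exc:
    strict_failed = True
    print("strict latin-1 failed:", exc)

# A UTF-8 stream can represent the character.
utf8_bytes = write_text(sample, "utf-8")

# A latin-1 stream with errors="replace" substitutes "?" instead of failing.
replaced = write_text(sample, "latin-1", errors="replace")

print(utf8_bytes)
print(replaced)
```

If that assumption holds, one thing worth trying is setting the standard Python environment variable `PYTHONIOENCODING=utf-8` for the `python -m html2text` process, which makes stdout use UTF-8 regardless of the locale. Whether that resolves this particular report is untested.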