Closed by sebma 4 years ago
I get the same error. The problem is how reading from stdin is handled under Python 3.
$ curl --silent https://opensource.org/licenses/MIT | html2text - | head
Traceback (most recent call last):
File "/home/iwana/.local/bin/html2text", line 10, in <module>
sys.exit(main())
File "/home/iwana/.local/lib/python3.7/site-packages/html2text/cli.py", line 262, in main
data = data.decode(args.encoding, args.decode_errors)
AttributeError: 'str' object has no attribute 'decode'
Workaround for now:
$ curl --silent https://opensource.org/licenses/MIT | html2text /dev/stdin | head
Skip to main content
* [Home](/)
* [From the Board](/blog)
* [Contact](/contact)
* [Donate](/civicrm/contribute/transact?reset=1&id=2)
* [Login](/user/login)
## Search form
Note that it works fine under Python 2.
@aucampla Thanks, I've switched html2text to Python 2.
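For anyone hitting the same traceback: under Python 3, a text-mode stdin already yields `str`, so calling `.decode()` on it fails, while a binary stream yields `bytes` that still need decoding. Below is a minimal sketch of a reader that tolerates both cases. The helper name `read_html` and its parameters are hypothetical and are not taken from html2text's source; this is not necessarily how the project fixed it.

```python
import io


def read_html(stream, encoding="utf-8", errors="strict"):
    """Return the stream's content as str, decoding only when it is bytes.

    Hypothetical helper for illustration; not part of html2text.
    """
    raw = stream.read()
    if isinstance(raw, bytes):
        # Binary streams (e.g. sys.stdin.buffer) give bytes: decode them.
        return raw.decode(encoding, errors)
    # Text streams (e.g. sys.stdin on Python 3) already give str.
    return raw


# Both kinds of stream produce the same text.
from_bytes = read_html(io.BytesIO("caf\u00e9".encode("utf-8")))
from_text = read_html(io.StringIO("caf\u00e9"))
```

On Python 3, passing `sys.stdin.buffer` to such a helper would read the raw bytes and decode them with the requested encoding instead of relying on the locale default.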
Thanks for reporting. This has been fixed on the master branch and will be in the next release.
@jdufresne Could you tell me which commit contained the fix, please?
I believe it is b361467894fb277563b4547ec9d4df49f5e0c6e3
Hi, html2text-2019.9.26 fails one test on CentOS 7 (1 failed, 165 passed in 6.34 seconds):
test_command[/builddir/build/BUILD/html2text-2019.9.26/test/bodywidth_newline.html-cmdline_args10]
/usr/lib64/python3.6/subprocess.py:438: CalledProcessError
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/builddir/build/BUILD/html2text-2019.9.26/html2text/__main__.py", line 3, in <module>
main()
File "/builddir/build/BUILD/html2text-2019.9.26/html2text/cli.py", line 306, in main
sys.stdout.write(h.handle(data))
UnicodeEncodeError: 'ascii' codec can't encode character '\u2032' in position 224: ordinal not in range(128)
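This error usually means stdout was opened with an ASCII encoding (for example, a build environment running under the C locale), so a character such as U+2032 cannot be written. Below is a minimal sketch of the failure and one way around it, assuming the output is meant to be UTF-8. The helper `write_text` is hypothetical and is not part of html2text.

```python
import io


def write_text(text, binary_stream, encoding="utf-8"):
    """Encode text explicitly and write it to a binary stream.

    Hypothetical helper for illustration; not part of html2text.
    """
    binary_stream.write(text.encode(encoding))


sample = "5\u2032"  # contains U+2032 PRIME, as in the traceback above

# Encoding to ASCII reproduces the failure mode.
try:
    sample.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Writing explicitly encoded bytes avoids depending on the locale.
buf = io.BytesIO()
write_text(sample, buf)
```

In a real script the binary stream could be `sys.stdout.buffer`. Alternatively, setting the environment variable `PYTHONIOENCODING=utf-8` before running the tests makes Python use UTF-8 for stdout without any code change.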
Hi,
I'm using:
The URL I want to convert to text is http://www.peter-adam.com/jpv/JPV_Titles.php, which contains Chinese text encoded as UTF-8:
So I tried it with http://www.google.fr, but that does not work either:
Can you help me?