aaronsw / html2text

Convert HTML to Markdown-formatted text.
http://www.aaronsw.com/2002/html2text/
GNU General Public License v3.0
2.57k stars, 410 forks

UnicodeDecode Error #96

Closed: amritkrs closed this issue 9 years ago

amritkrs commented 9 years ago

import requests
import html2text

r = requests.get('http://en.wikipedia.org/wiki/Monty_Python')
print html2text.html2text(r.content)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "html2text.py", line 812, in html2text
    return h.handle(html)
  File "html2text.py", line 254, in handle
    return self.optwrap(self.close())
  File "html2text.py", line 266, in close
    self.outtext = self.outtext.join(self.outtextlist)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4: ordinal not in range(128)
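The byte 0xe2 in the traceback is the lead byte of a multi-byte UTF-8 sequence (for example, U+2013 EN DASH encodes as e2 80 93). In Python 2, joining byte strings with unicode strings makes the interpreter implicitly decode the bytes with the default ASCII codec, which most likely explains the failure inside close(). The sketch below is written for Python 3 and decodes explicitly to reproduce the same message; the sample string is illustrative (not the actual Wikipedia content) and the helper name first_non_ascii_error is hypothetical.

```python
def first_non_ascii_error(raw: bytes):
    """Return the UnicodeDecodeError that ASCII-decoding `raw` raises, or None.

    Python 2 performed this ASCII decode implicitly when mixing str and unicode;
    here it is done explicitly so the error is visible.
    """
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc
    return None


# Illustrative UTF-8 text: the en dash starts at index 13 and encodes as e2 80 93.
raw = "Monty Python \u2013 comedy".encode("utf-8")
err = first_non_ascii_error(raw)
print(err)  # 'ascii' codec can't decode byte 0xe2 in position 13: ordinal not in range(128)
```

The reported position (4 or 62 in the tracebacks) is simply the offset of the first non-ASCII byte within whichever chunk was being decoded.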

mcepl commented 9 years ago
matej@mitmanek: ~$ html2text http://en.wikipedia.org/wiki/Monty_Python>/dev/null ; echo $?
0
matej@mitmanek: ~$

Get the updated code from https://github.com/Alir3z4/html2text or from my repository http://luther.ceplovi.cz/git/html2text.git. This repository is literally dead, because its author is.

amritkrs commented 9 years ago

@mcepl Thanks bro.

amritkrs commented 9 years ago

@mcepl In spite of using html2text from https://github.com/Alir3z4/html2text, I still face the same problem:

r = requests.get("http://en.wikipedia.org/wiki/Python_%28programming_language%29")
print html2text.html2text(r.content)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "html2text/__init__.py", line 750, in html2text
    return h.handle(html)
  File "html2text/__init__.py", line 121, in handle
    return self.optwrap(self.close())
  File "html2text/__init__.py", line 139, in close
    outtext = nochr.join(self.outtextlist)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 62: ordinal not in range(128)
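A likely workaround, assuming the error comes from passing raw bytes (r.content) into html2text under Python 2, is to decode the response body to text before converting it. The helper decode_body below is hypothetical and written for Python 3; falling back to UTF-8 when no charset is declared is an assumption, not html2text behaviour.

```python
from typing import Optional


def decode_body(body: bytes, declared_encoding: Optional[str] = None) -> str:
    """Decode an HTTP response body to text before handing it to html2text.

    Assumption: fall back to UTF-8 when the server declared no charset.
    """
    return body.decode(declared_encoding or "utf-8")


# Illustrative UTF-8 body containing an en dash (lead byte 0xe2 once encoded).
body = "<p>Python \u2013 programming language</p>".encode("utf-8")
text = decode_body(body, "utf-8")
print(type(text).__name__)  # str
```

With requests, Response.text already performs this decoding using Response.encoding (guessing one if the server sent none), so passing r.text instead of r.content, which gives a unicode object in Python 2, is the simplest thing to try: html2text.html2text(r.text).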

mcepl commented 9 years ago

File a bug with @Alir3z4 then.

amritkrs commented 9 years ago

OK