Jorl17 closed this issue 8 years ago
I'm not sure this is very useful, since 90% of users probably use requests, which already uses chardet (or an alternative). So I don't really see the benefit. But if you need this, pull requests with tests are always welcome.
Perhaps I misused the library, but here's a snippet of "offending" code:
page = PyQuery(url=...)
page_html = page.html()
Line 2 of this code blew up with encoding errors in etree (Python 3) on a Raspbian distro. This did not happen on a Mac OS X system. Unfortunately, I do not have the offending URL (and cannot find it again), but I think I fixed this by manually doing
f = urllib.request.urlopen(url=...)
page_html = PyQuery(str(f.read()).encode('utf8')).html()
Which sounds hacky, dirty, and to be honest, since I can't test it, I'm not even sure if it's really correct. It was with this in mind that I thought of modifying PyQuery to include support for chardet. Do you think I was doing something wrong? Thanks!
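A note on why that workaround is suspect: in Python 3, calling `str()` on a bytes object returns its repr (the `b'...'` literal text) rather than decoding it, so encoding the result back to UTF-8 never raises but also never recovers the original text. The snippet below is a minimal stdlib-only sketch of this; the sample string and the Latin-1 encoding are assumptions chosen for illustration, not the page from the report.

```python
# Assumed sample: bytes as a server might send them in a non-UTF-8 encoding.
raw = "café".encode("latin-1")        # b'caf\xe9'

# str() on bytes does not decode; it returns the repr of the bytes object.
as_repr = str(raw)                    # the text  b'caf\xe9'  including the b and quotes
roundtrip = as_repr.encode("utf8")    # pure ASCII now, so this never fails

# Decoding as UTF-8 fails because 0xE9 alone is not a valid UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the real encoding recovers the text.
decoded = raw.decode("latin-1")
```

So the workaround probably avoided the exception by feeding the parser escaped ASCII, not by handling the encoding.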
Just try with requests installed. PyQuery will use it. See https://github.com/gawel/pyquery/blob/master/pyquery/openers.py#L69
And since requests' resp.text is unicode, everything should be fine.
You can also use PyQuery(url=my_url, encoding='utf-8')
PyQuery(url=my_url, encoding='utf-8')
will not work because the offending encoding is not utf-8 compatible. That is the issue here. Requests was not an option on our project. We resorted to using chardet internally.
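For readers in the same situation, here is a rough sketch of what decoding with chardet before handing the text to PyQuery could look like. This is not the code from our project: the helper name `detect_and_decode` and the fallback list are hypothetical, and chardet is treated as optional so the function still works without it.

```python
def detect_and_decode(raw, fallbacks=("utf-8", "cp1252", "latin-1")):
    """Decode raw bytes to str, guessing the encoding (hypothetical helper)."""
    try:
        import chardet  # optional third-party dependency
    except ImportError:
        chardet = None

    if chardet is not None:
        # chardet.detect returns a dict with an 'encoding' key (may be None).
        guess = chardet.detect(raw).get("encoding")
        if guess:
            try:
                return raw.decode(guess)
            except (UnicodeDecodeError, LookupError):
                pass  # bad guess or unknown codec name: fall through

    # Assumed fallback order; adjust to the encodings you expect to see.
    for enc in fallbacks:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue

    # Last resort: never raise, replace undecodable bytes.
    return raw.decode("utf-8", errors="replace")


html_text = detect_and_decode(b"<p>hello</p>")
```

The resulting `str` can then be passed to `PyQuery(...)` directly, so the parser never has to guess the encoding itself. Detection is a heuristic and can be wrong on short or ambiguous input.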
Nevertheless, I understand why this is a "non-issue" and agree that it makes sense to close it.
Recently I ran into some issues where the output of the html() method wasn't in the expected encoding (which I see is utf-8 by default).
My application could not, a priori, know what the encoding was going to be, and this was a nuisance.
Perhaps using the chardet library could help alleviate this issue? It might also be the case that I simply did not correctly understand how html() works. If that was the case, then I'm sorry.
If indeed you think this fix should be applied, I can chip in and add it myself. Also, thanks for the awesome work!