Open leesei opened 12 years ago
You should be able to use install_opener to do this.
html2text is using urllib currently, so install_opener is not effective.
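For reference, a minimal sketch of what install_opener does, written against Python 3's urllib.request (the successor of urllib2) rather than the Python 2.7 used in this thread. The installed opener only affects urlopen() from urllib2 / urllib.request, not the legacy urllib.urlopen(). The USER_AGENT value and the make_opener helper are made up for illustration and are not part of html2text.

```python
import urllib.request

# Hypothetical UA string for illustration; any browser-like value would do.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) html2text-example"


def make_opener(user_agent):
    """Build an opener whose default headers carry the given User-Agent."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs sent with every request
    # made through this opener; replacing it drops the default Python UA.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


opener = make_opener(USER_AGENT)
# From here on, urllib.request.urlopen() goes through `opener`.
urllib.request.install_opener(opener)
```

Code that fetches pages through the old urllib.urlopen() never consults this opener, which matches the behaviour described above.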
My quick solution:
import urllib
urllib.URLopener.version = 'Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_0 like Mac OS X; en-us) AppleWebKit/528.18 (KHTML, like Gecko) Version/4.0 Mobile/7A341 Safari/528.16'
import html2text
html2text.main()
Thanks both.
Actually, I'm wondering why html2text uses urllib instead of urllib2. For backward compatibility reasons? (I'm not sure when urllib2 was added to Python.) I changed my copy to use urllib2 so that I can specify a timeout value for the connection.
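A rough sketch of that kind of change, assuming Python 3's urllib.request in place of urllib2. The build_request and fetch helpers are hypothetical names, not html2text functions, and the 10-second default is an arbitrary choice.

```python
import urllib.request


def build_request(url, user_agent):
    """Create a Request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent, timeout=10.0):
    """Fetch `url` and return the raw response body as bytes.

    `timeout` is in seconds and applies to blocking socket operations
    such as the connection attempt.
    """
    req = build_request(url, user_agent)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


# Example call (needs network access, so it is left commented out):
# body = fetch("http://example.com/", "Mozilla/5.0 (example)", timeout=5.0)
```

Setting the header per request avoids changing global state, unlike install_opener or patching URLopener.version.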
@aaronsw If I added a ua option, would you care to merge it into the master branch ^^?
I'm new to Python and glad to have found this module, which lets me parse webpages. I would like to suggest adding support for spoofing the user agent for HTTP sources. Some webpages return 401 when fetched with urlopen(), e.g. http://www.google.com/patents/US5255452. Currently I'm using another Python (2.7) script that spoofs the user agent and dumps the output for html2text: