google-code-export / feedparser

Automatically exported from code.google.com/p/feedparser

HTTP redirect resource exhaustion #395

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
I opened issue #394, "ENH: urllib3 for HTTP", regarding urllib3 support.

It appears that issue #394 has been deleted without explanation.

Problem: Feedparser is slow because it wastes sockets when handling HTTP 
redirects.

1. Connections resulting from HTTP redirects open additional sockets
2.

Partial solution: use urllib3

Original issue reported on code.google.com by wes.tur...@gmail.com on 17 Mar 2013 at 4:39

GoogleCodeExporter commented 9 years ago
urllib2.HTTPRedirectHandler does make a best effort to detect infinite 
redirects:

http://hg.python.org/cpython/file/v2.7.1/Lib/urllib2.py#l563
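A simplified, offline sketch of the loop check in the linked CPython 2.7 source: each new request carries a dictionary counting visits to every redirect target, and the handler gives up once any single URL has been seen `max_repeats` (4) times or once `max_redirections` (10) distinct targets have been recorded. To keep it runnable without a server, a hypothetical `redirect_map` dict stands in for the `Location` headers, and `RuntimeError` stands in for the `HTTPError` that urllib2 actually raises.

```python
# Simplified, offline sketch of the redirect-loop check in urllib2's
# HTTPRedirectHandler. The limits copy its class attributes.
MAX_REPEATS = 4        # max visits to any single redirect target
MAX_REDIRECTIONS = 10  # max distinct redirect targets before giving up


def follow_redirects(start_url, redirect_map):
    """Follow redirects through a dict {url: location} and return the final URL.

    redirect_map is a hypothetical stand-in for the Location headers a server
    would send; a URL absent from the map is treated as a normal (non-3xx)
    response. RuntimeError stands in for the HTTPError urllib2 raises.
    """
    visited = {}  # target URL -> times seen, like urllib2's req.redirect_dict
    url = start_url
    while url in redirect_map:
        new_url = redirect_map[url]
        if (visited.get(new_url, 0) >= MAX_REPEATS
                or len(visited) >= MAX_REDIRECTIONS):
            raise RuntimeError("redirect loop detected at " + new_url)
        visited[new_url] = visited.get(new_url, 0) + 1
        url = new_url
    return url
```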

urllib3 would be more resource-efficient in these cases.

I realize that external dependencies are best kept to a minimum.

Original comment by wes.tur...@gmail.com on 17 Mar 2013 at 5:04

GoogleCodeExporter commented 9 years ago
It would be great if there were a specific exception for this issue.
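The standard library has no dedicated exception for this case: `urllib.request.HTTPRedirectHandler` (the Python 3 successor of urllib2's handler) raises a plain `HTTPError` whose message begins with the class attribute `inf_msg`. A minimal sketch of how a caller could translate that into a specific exception, assuming the message prefix stays stable (it is a CPython implementation detail, not a documented contract). `RedirectLoopError` and `open_with_loop_error` are hypothetical names, not part of urllib or feedparser.

```python
from urllib.error import HTTPError
from urllib.request import HTTPRedirectHandler


class RedirectLoopError(Exception):
    """Hypothetical exception for a redirect loop (not part of urllib or feedparser)."""

    def __init__(self, url, code):
        super().__init__("redirect loop while fetching %s (last status %d)" % (url, code))
        self.url = url
        self.code = code


def open_with_loop_error(opener, url):
    """Open url with an OpenerDirector-like object, re-raising loop errors specifically.

    Relies on CPython prefixing the loop error message with
    HTTPRedirectHandler.inf_msg; any other HTTPError is re-raised unchanged.
    """
    try:
        return opener.open(url)
    except HTTPError as err:
        if str(err.msg).startswith(HTTPRedirectHandler.inf_msg):
            raise RedirectLoopError(url, err.code) from err
        raise
```

Because `RedirectLoopError` does not subclass `HTTPError`, existing `except HTTPError` handlers will no longer see loop failures; subclassing `HTTPError` instead would keep them backward compatible.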

Original comment by wes.tur...@gmail.com on 17 Mar 2013 at 5:05

GoogleCodeExporter commented 9 years ago
I have no idea who could have deleted issue 394, but thanks for bringing that 
to my attention.

I don't consider the HTTP features of feedparser to be a priority, so I won't 
introduce an external dependency or even a recommended dependency. I recommend 
that developers use a library such as urllib3 or requests (as examples) to 
download the feeds and then pass those to feedparser to be parsed.
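A minimal sketch of the download step under that approach. It swaps in the standard library's `urllib.request` in place of requests or urllib3 so that it needs no extra dependency; `StrictRedirectHandler` and `fetch_feed_bytes` are hypothetical names, and the tighter redirect limits are illustrative assumptions rather than recommended values.

```python
import urllib.request


class StrictRedirectHandler(urllib.request.HTTPRedirectHandler):
    # Tighter than the defaults (max_repeats = 4, max_redirections = 10);
    # these values are illustrative assumptions, not recommendations.
    max_repeats = 1
    max_redirections = 5


def fetch_feed_bytes(url, timeout=10.0):
    """Download a feed body and return it as bytes.

    build_opener() replaces the default HTTPRedirectHandler with the stricter
    subclass, and the with-block closes the response (and its socket) as soon
    as the body has been read.
    """
    opener = urllib.request.build_opener(StrictRedirectHandler)
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

The downloaded document can then be passed to feedparser, for example `feedparser.parse(fetch_feed_bytes("https://example.com/feed.xml"))`, since `feedparser.parse` accepts the raw feed content in place of a URL. Feedparser then only parses; redirects and sockets are handled entirely by the download step.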

Original comment by kurtmckee on 27 Apr 2013 at 6:54

GoogleCodeExporter commented 9 years ago
Thanks.

* http://docs.python-requests.org/en/latest/
* http://urllib3.readthedocs.org/en/latest/

Original comment by wes.tur...@gmail.com on 29 Apr 2013 at 9:04