gesomax / httplib2

Automatically exported from code.google.com/p/httplib2

Sockets left in CLOSE_WAIT after requests, eventually run out of file handles #41

Closed by GoogleCodeExporter 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Configure Sam Ruby's Planet Venus to spider 1000+ feeds in an OPML file
2. Set spider_threads to about 12 or more
3. Run venus/spider.py

What is the expected output? What do you see instead?

The spidering run should complete without errors, apart from problems in
the feeds themselves. Instead, I start seeing "Too many open files" errors
about two-thirds of the way through.

What version of the product are you using? On what operating system?

An SVN checkout (r274), used in place of the copy bundled with Venus, on Mac OS X.

Please provide any additional information below.

After running a large set of feeds through Planet Venus a few times, I
finally ran lsof when the "Too many open files" errors started. It showed
a socket in the CLOSE_WAIT state for almost every feed polled so far. So I
dug through the code and found that inserting conn.close() in a few spots
in _conn_request() solved the problem. Attached is a patch showing what I did.

What I wonder now is whether these connections were left open for a reason,
and whether this patch breaks HTTP persistent connections.

Original issue reported on code.google.com by l.m.orch...@gmail.com on 2 Nov 2008 at 4:33

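For context on the lsof output: CLOSE_WAIT means the remote host has closed its half of the TCP connection (sent a FIN), but the local process has not yet called close() on its socket, so the file descriptor stays allocated and keeps counting against the per-process open-file limit. The minimal sketch below reproduces that state on the loopback interface using only Python's standard socket module, not httplib2; the function name make_close_wait_socket() is hypothetical and exists only for illustration.

```python
import socket


def make_close_wait_socket():
    """Hypothetical helper: return a client socket whose peer has closed it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    server_side.close()  # the peer sends FIN; the client side enters CLOSE_WAIT
    listener.close()

    eof = client.recv(1)  # b"" signals that the peer has closed its end
    return client, eof


client, eof = make_close_wait_socket()
print(eof)                   # b''
print(client.fileno() >= 0)  # True: the descriptor is still held open
client.close()               # only now is the descriptor released
print(client.fileno())       # -1
```

Running `lsof -i` before client.close() should list the client socket in CLOSE_WAIT, much like the unclosed feed connections in the report.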

GoogleCodeExporter commented 8 years ago
This issue affected us as well. We're working around it by manually closing each
connection in http.connections after every request, but we're not 100% certain
this is the right fix (it seems to work, though). Here's our idiom:

import httplib2
http = httplib2.Http()
response, content = http.request("http://example.com/")  # placeholder URL
# Workaround for http://code.google.com/p/httplib2/issues/detail?id=41
for conn in http.connections.values():
    conn.close()

Original comment by simon%si...@gtempaccount.com on 1 Sep 2009 at 3:50
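The workaround above can be packaged as a small wrapper so that no call site forgets it. This is a hypothetical sketch: request_and_close() is not part of httplib2, and it only assumes an object with a request() method and a connections dict whose values have a close() method, which is how httplib2.Http is used in the idiom above.

```python
def request_and_close(http, uri, method="GET", **kwargs):
    """Hypothetical helper: issue one request, then close every pooled connection.

    Assumes `http` behaves like httplib2.Http: a request() method plus a
    `connections` dict of connection objects that each have close().
    """
    try:
        return http.request(uri, method, **kwargs)
    finally:
        # Runs even if request() raised, so a failed request does not leak
        # its socket either. list() snapshots the values before closing.
        for conn in list(http.connections.values()):
            conn.close()
```

With httplib2 installed, a call would look like `response, content = request_and_close(httplib2.Http(), url)`. Closing every pooled connection after each request gives up keep-alive reuse for those connections, which is the trade-off the original report asks about.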

GoogleCodeExporter commented 8 years ago
Mailing list thread:
http://groups.google.com/group/httplib2-dev/browse_thread/thread/55cafd03850d895

Original comment by simon%si...@gtempaccount.com on 1 Sep 2009 at 3:50

GoogleCodeExporter commented 8 years ago
If you want the connection to be closed after the response is returned,
add a "Connection: close" header to the request:

 import httplib2
 http = httplib2.Http()
 uri = "http://www.google.com/"
 (response, content) = http.request(uri, "GET", headers={"connection": "close"})

Original comment by joe.gregorio@gmail.com on 1 Sep 2009 at 5:00
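The "Connection: close" suggestion relies on ordinary HTTP/1.1 semantics: the client asks the server to close the connection once the response is sent, so the socket is not kept around for reuse. The sketch below illustrates that exchange with Python's standard http.client and http.server modules against a throwaway local server, not with httplib2 itself. The handler class and fetch_connection_header() are hypothetical names for illustration; the handler echoes "Connection: close" back explicitly so the client can observe that the server agreed to close.

```python
import http.client
import http.server
import threading


class EchoCloseHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep-alive unless the client asks otherwise

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        if self.close_connection:  # set by parse_request() on "Connection: close"
            self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


def fetch_connection_header(send_close):
    """Make one GET to a local server and return its Connection header, if any."""
    server = http.server.HTTPServer(("127.0.0.1", 0), EchoCloseHandler)
    worker = threading.Thread(target=server.handle_request)
    worker.start()
    conn = http.client.HTTPConnection(*server.server_address)
    headers = {"Connection": "close"} if send_close else {}
    conn.request("GET", "/", headers=headers)
    response = conn.getresponse()
    response.read()
    value = response.getheader("Connection")
    conn.close()  # lets a keep-alive handler see EOF and return
    worker.join()
    server.server_close()
    return value


print(fetch_connection_header(True))   # close
print(fetch_connection_header(False))  # None
```

Without the header, an HTTP/1.1 server keeps the connection open for further requests, which is what persistent connections buy; with it, the socket is torn down after one response instead of lingering.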