scottwallacesh opened 2 years ago
I've looked into this and have come up with a hypothesis: the test suite is built ahead of time and each test is populated with an instance of `httpclient.get_http(...)`, which itself is an instance of `Http`, which inherits from `urllib3.PoolManager`. This means each test uses a connection pool once, but the pool is then left in memory with an open socket that is never used again. After around 1021 tests, we run out of file handles.
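To illustrate the hypothesis, here is a minimal sketch (assuming `urllib3` is installed; `example.com` is just a placeholder host, and no network request is actually made) showing that a `PoolManager` keeps its connection pools alive until `clear()` is called:

```python
import urllib3

# Each gabbi test gets its own PoolManager; its pools persist until clear().
pm = urllib3.PoolManager()

# connection_from_host() creates (or reuses) a pool without making a request.
pm.connection_from_host('example.com', 80, scheme='http')
print(len(pm.pools))  # one pool is now being tracked

# Without clear(), that pool (and any sockets it holds) stays in memory.
pm.clear()
print(len(pm.pools))  # no pools remain after clearing
```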
A potential, working solution could be the following, but it feels more like a hack than anything else.
```diff
--- a/gabbi/httpclient.py
+++ b/gabbi/httpclient.py
@@ -51,6 +51,8 @@ class Http(urllib3.PoolManager):
         headers = response.headers
         headers['status'] = str(status)
         headers['reason'] = reason
+
+        self.clear()
         return headers, content
```
Good find.
The main reason pools aren't reused is so that each test can set its own requirements for hostname, verbosity, and cert validation.
And the reason they don't go out of scope is that every test in a single YAML file is kept in memory for the duration of the suite, since that is what allows the back references (`$RESPONSE`, `$HISTORY`, etc.). So if you're oriented towards big test suites, keep that in mind, as it might create memory issues for you. The original design of gabbi (before `gabbi-run` was a thing) was oriented towards being able to parallelise suites, with each suite being a short "arc" of an API interaction, so not much attention was paid to what would happen with a single big YAML file.
Also, using PoolManager was a late addition. Earlier versions used httplib2, which is much simpler, but it looked like it was no longer being maintained, so a switch was made.
`self.clear()` is probably a pretty good solution in this case, since there isn't a simple way to reuse the pools while preserving the goal of tests being able to define the inputs to `get_http` individually. If we wanted to be clever we could memoize calls to `get_http`, but that hardly seems worth it here.
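If memoization were pursued, a sketch might look like the following. This is hypothetical: the `Http` stand-in and the `get_http` parameters shown here are illustrative only, not gabbi's actual signatures. The idea is that tests with identical settings would share one client (and its pools), while tests with different settings still get their own:

```python
import functools


class Http:
    """Stand-in for gabbi's Http; the real class wraps urllib3.PoolManager."""
    def __init__(self, verbose, ca_certs):
        self.verbose = verbose
        self.ca_certs = ca_certs


@functools.lru_cache(maxsize=None)
def get_http(verbose=False, ca_certs=None):
    # lru_cache keys on the argument values, so identical configurations
    # return the same Http instance instead of building a new pool manager.
    return Http(verbose, ca_certs)


assert get_http() is get_http()                   # same config: shared instance
assert get_http() is not get_http(verbose=True)   # different config: new one
```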
Thoughts?
We could clear the pool, as I've done above, as a short-term solution whilst thinking about a better long-term one. My initial thought is to utilise a shared HTTP pool somehow.
I have a YAML file with nearly 2000 tests in it. When invoked from the command line, I run out of open file handles due to the large number of sockets left open.
By default, a Linux user is limited to 1024 open file handles.
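The limit can be checked from within Python using the stdlib `resource` module (Unix only; the exact numbers depend on your shell's `ulimit` settings):

```python
import resource

# RLIMIT_NOFILE is the per-process cap on open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('soft limit:', soft)  # commonly 1024 on Linux
print('hard limit:', hard)
```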
Inspecting the open file handles confirms that sockets are being left open.
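One way to inspect the open descriptors from inside the process is to list `/proc/self/fd` (Linux only; elsewhere a tool such as `lsof -p <pid>` would be used instead). Leaked sockets show up as entries like `socket:[12345]`:

```python
import os

# Each entry in /proc/self/fd is one open file descriptor.
fds = os.listdir('/proc/self/fd')
print('open file descriptors:', len(fds))
for fd in sorted(fds, key=int)[:5]:
    try:
        print(fd, '->', os.readlink('/proc/self/fd/%s' % fd))
    except OSError:
        pass  # the fd may have closed between listdir and readlink
```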