Socket leak with large YAML test files

scottwallacesh commented 2 years ago

I have a YAML file with nearly 2000 tests in it. When invoked from the command line, I run out of open file handles due to large amounts of sockets left open:

ERROR: gabbi-runner.input_/foo/bar/__test_l__
    [Errno 24] Too many open files

By default a Linux user has 1024 file handles:

$ ulimit -n
1024

Inspecting the open file handles:

$ ls -l /proc/$(pgrep gabbi-run)/fd | awk '{print $NF}' | cut -f1 -d: | sort | uniq -c
      1 0
      2 /path/to/a/file.txt
      1 /path/to/another/file.yaml
   1021 socket

scottwallacesh commented 2 years ago

I've looked into this and have come up with a hypothesis: the test suite is being built ahead of time and each test is populated with an instance of httpclient.get_http(...) which itself is an instance of Http which inherits from urllib3.PoolManager. This means each test uses a connection pool once, but then the connection pool is left in memory with an open socket that isn't used again. After around 1021 tests, we run out of file handles.

A potential, working solution for this could be the following but it feels more like a hack than anything else.

--- a/gabbi/httpclient.py
+++ b/gabbi/httpclient.py
@@ -51,6 +51,8 @@ class Http(urllib3.PoolManager):
         headers = response.headers
         headers['status'] = str(status)
         headers['reason'] = reason
+
+        self.clear()
         return headers, content

cdent commented 2 years ago

Good find.

The main reason pool's aren't reused is so that each test can set its own requirements for hostname, verbosity, and cert validation.

And the reason they don't go out of scope is because every test in a single yaml file is kept in memory for the duration of the suite, as this allows the back references ($RESPONSE, $HISTORY, etc). So if you're oriented towards big test suites you should keep that in mind as it might create some memory issues for you. The original design of gabbi (before gabbi-run was a thing) was oriented towards being able to parallelise suites, with each suite being a short "arc" of an API interaction. So not much attention was paid to what would happen with a single big yaml file.

Also, using PoolManager was a late addition. Earlier versions used httplib2, which is much simpler, but it looked like it was no longer being maintained, so a switch was made.

self.clear() is probably a pretty good solution in this case, since there isn't a simple way to reuse the Pools while preserving the goal of tests being able to define the inputs to get_http individually. If we wanted to be clever we could memoize calls to get_http, but that hardly seems worth it here.

Thoughts?

scottwallacesh commented 2 years ago

We could clear the pool, as I've done above, for a short-term solution whilst thinking about a better long-term solution. My initial thoughts are to utilise a shared HTTP pool somehow.

cdent / gabbi

Socket leak with large YAML test files #313