Closed: jpallari closed this pull request 1 year ago
More insight on how pool usage did not work correctly prior to this PR.
When the `pool` parameter is set in the load test configuration, the parameter is passed to the request library. In their documentation, request says the following:

> Note that if you are sending multiple requests in a loop and creating multiple new pool objects, maxSockets will not work as intended. To work around this, either use request.defaults with your pool options or create the pool object with the maxSockets property outside of the loop.
Therefore, even when using the `pool` parameter, each request ends up using its own pool.
It's not obvious how the `pool` parameter in request should be used. To get around this, the fix skips the `pool` parameter and uses a custom agent instead.
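To illustrate the idea, here's a rough sketch of the approach (not the actual engine code; the names and numbers are made up):

```js
var http = require('http');
var request = require('request');

// Created once, e.g. when the engine is initialised, and shared by every
// scenario instead of letting request build a new pool per request.
var sharedAgent = new http.Agent({
  keepAlive: true, // keep connections open so they can be reused
  maxSockets: 100  // upper bound on concurrent connections
});

function makeRequest(url, callback) {
  // Passing an explicit `agent` bypasses request's own `pool` handling.
  request({ url: url, agent: sharedAgent }, callback);
}
```

HTTPS scenarios would presumably need a matching `https.Agent` configured the same way.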
👍
Thanks for the PR. I don't have the time to try the patch right now, but I'll give it a go as soon as I can. A couple of comments:
- If you re-run your example script with a realistic pool setting, e.g. 100, and run `netstat` while Artillery is running, you should see the test complete with no errors and the number of open connections at 100 throughout the test, even though multiple agents will be created (one per scenario). Your test as-is does not necessarily demonstrate a bug since you're allowing Artillery to open up to 1,000,000 concurrent connections when 64k is the absolute possible maximum.
- Try removing the `pool` setting and rerunning the test, and you should see the number of open connections shoot up quickly. You might see a `too many open files` error very quickly as well, depending on what `ulimit -n` reports on your system.
- As to correct usage of `pool`: that particular sentence from their docs is not very clear, however as far as I understand, `pool` refers to the pool of agents rather than sockets, and each agent is currently only allowed 1 socket, hence the total number of open connections when `config.http.pool` is set should not exceed that number (see the example config below).
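For example (the target and phase values below are placeholders, not taken from your script):

```yaml
config:
  target: "http://localhost:3000"  # placeholder target
  phases:
    - duration: 60
      arrivalRate: 100             # placeholder arrival rate
  http:
    pool: 100                      # cap open connections at 100
scenarios:
  - flow:
      - get:
          url: "/"
```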
Thanks for the quick response!
> If you re-run your example script with a realistic pool setting, e.g. 100 and run `netstat` while Artillery is running, you should see the test complete with no errors and the number of open connections at 100 throughout the test even though multiple agents will be created (one per scenario).
Yes, I managed to run the test with an arrival rate of 100 per second. However, increasing the rate beyond that started producing errors.
> Your test as-is does not necessarily demonstrate a bug since you're allowing Artillery to open up to 1,000,000 concurrent connections when 64k is the absolute possible maximum.
Yeah, I didn't think it'd ever reach that limit. I provided a ridiculously huge number in order to allow it to run uncapped. This is similar to the default Node.js HTTP client behaviour. I should have been more explicit about that.
> Each scenario creates a new TCP connection by design (by default) to mimic real-world user behavior.
Ahh, I think that explains quite a bit. I guess my goals are different from Artillery's goals, which is why my approach here is so different. :)
Not an arrival rate of 100 but a pool of 100, to limit the maximum number of open concurrent connections. Re the 1,000,000 limit: of course you will reach your system's limit quickly if you're opening 1000 connections per second. 64k is the maximum theoretical limit, and it's likely to be much lower on your actual system (check `ulimit -n`); that's why you're seeing those errors. Setting the pool to 1,000,000 makes no sense when you have one client and one target, both running on the same host.
I just tried your YAML script but with `pool` set to 100, and the test run completes and the number of open sockets remains constant at 100.
The reason the `pool` setting is there is for use-cases like yours, when you don't want a new socket for each scenario. I'm not sure if there's a bug in the current implementation though.
AFAIK, the pool just sets the upper bound on how many sockets a single agent can have open.
It seems the implementation has changed enough that the comparison isn't fair. I'll update the test and results later, since I'm continuing the tests on a different machine. Interestingly, I'm seeing better results on the other machine.
Yes, `pool` is the upper bound, but if you set the upper bound to 1 million, you're going to run out of file descriptors or ports on your system very quickly if you're creating 1000 of them every second. Again, the maximum theoretical number of open TCP connections is 64k. Do you see what the problem with setting `pool` = 1,000,000 is? It's impossible to have 1M open connections from a single client to a single server. You are not in fact testing anything at all with the original test script. There's no difference between setting a value that large and not setting a value at all. Does that make sense?
Edit: I believe `pool` is the pool of agents, as I'd mentioned earlier, and each agent has a `maxSockets` setting which sets the limit on the sockets a single agent can open. The current implementation sets that to always equal 1 and creates an agent per scenario, OR lets Request.js manage a pool of agents of some maximum size.
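For illustration, the two workarounds described in the request docs look roughly like this (not Artillery's code; the URL and numbers are placeholders):

```js
var request = require('request');

// Workaround 1: create the pool object (with maxSockets) once, outside the
// loop, so every call shares the same pool instead of getting a fresh one.
var pool = { maxSockets: 100 };
for (var i = 0; i < 1000; i++) {
  request({ url: 'http://localhost:3000/', pool: pool }, function () {});
}

// Workaround 2: bake the pool options into a request.defaults() instance
// and use that instance for all requests.
var client = request.defaults({ pool: { maxSockets: 100 } });
client({ url: 'http://localhost:3000/' }, function () {});
```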
If you reuse the connections as is done in this PR, you don't necessarily get to see the TCP connections get maxed out.
In the original version, you get a lot of connections established and not reused, which means that there will be a large buildup of scrapped connections in the `TIME_WAIT` state.
The connections ARE being re-used already if you use the `config.http.pool` setting correctly; that's the entire point of having that setting. I don't know how else to explain that your test case does not test what you seem to think it's testing. Please take the time to re-read my messages. Artillery is behaving exactly as expected, given your input and the operating system limits, when you set `pool` = 1,000,000. With pool set to a sane value, e.g. something LESS than the max number of fds on your system or the maximum theoretical number of TCP connections, everything works as expected. Set `pool` to 100 and you'll have a maximum of 100 connections.
I updated the test and the results to use a pool of at most 100 sockets. These tests were run on OS X.
Yesterday, I also ran the same tests on a Dell XPS 9630 (i5, 8GB) with Fedora Linux 25 as the OS. The original code did not produce the errors reported here, but I could see the connection count rise without stopping. When running the same test with the code from this PR, the connection count did not rise above a certain level.
Previously, each new scenario established a new request agent, which resulted in inefficient use of HTTP connections. The same issue could be replicated with both the custom and default pool settings.
This fix establishes a single request agent during engine creation, which is shared between all HTTP/HTTPS scenarios.
I wrote a crude test to verify the improvement:
Results WITHOUT the fix:
Results WITH the fix in place:
These tests were run on a Mid-2015 MacBook Pro (2.2 GHz Intel Core i7, 16 GB RAM) with OS X El Capitan as the operating system.
As you can see, the first run produces a considerable number of errors, while the second run doesn't produce any.