artilleryio / artillery-core

artillery-core - deprecated
Mozilla Public License 2.0

Establish a single agent for all the scenarios #183

Closed jpallari closed 1 year ago

jpallari commented 7 years ago

Previously, each new scenario established a new request agent, which resulted in inefficient use of HTTP connections. The same issue could be replicated with both the custom and default pool settings.

This fix establishes a single request agent during the engine creation, which is shared between all HTTP/HTTPS scenarios.
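For illustration, here's a minimal sketch of the idea (not the exact code from this PR; the function name and option values are made up for the example): one keep-alive agent is created when the engine is created, and every scenario's requests go through a client pre-configured with that agent.

const http = require('http');
const https = require('https');
const request = require('request');

// Hypothetical engine setup: one agent shared by all scenarios, instead of a
// new agent per scenario. maxSockets caps the number of concurrent sockets.
function createEngine(config) {
  const agentOpts = { keepAlive: true, maxSockets: (config.http && config.http.pool) || 100 };
  const agent = config.target.startsWith('https')
    ? new https.Agent(agentOpts)
    : new http.Agent(agentOpts);

  // All scenarios share this client, so completed sockets are returned to
  // the agent's pool and reused rather than being torn down.
  return request.defaults({ agent });
}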

I wrote a crude test to verify the improvement:

config:
  target: 'http://localhost:8080'
  http:
    timeout: 100
    pool: 100
  phases:
    - duration: 60
      arrivalRate: 1000
scenarios:
  - flow:
    - get:
        url: "/"

Results WITHOUT the fix:

Complete report @ 2017-06-29T12:28:17.784Z
  Scenarios launched:  60000
  Scenarios completed: 44221
  Requests completed:  44221
  RPS sent: 654.16
  Request latency:
    min: 0.6
    max: 2472
    median: 13.1
    p95: 28.6
    p99: 466.6
  Scenario duration:
    min: 0.9
    max: 2473.4
    median: 14
    p95: 30.7
    p99: 467.7
  Scenario counts:
    0: 60000 (100%)
  Codes:
    200: 43293
    404: 456
    500: 472
  Errors:
    EADDRNOTAVAIL: 15779

Results WITH the fix in place:

Complete report @ 2017-06-29T12:30:00.571Z
  Scenarios launched:  60000
  Scenarios completed: 60000
  Requests completed:  60000
  RPS sent: 984.09
  Request latency:
    min: 0.1
    max: 57.3
    median: 10.5
    p95: 19.8
    p99: 27.4
  Scenario duration:
    min: 0.3
    max: 57.6
    median: 10.9
    p95: 20.4
    p99: 28.1
  Scenario counts:
    0: 60000 (100%)
  Codes:
    200: 58750
    404: 656
    500: 594

These tests were run on a Mid-2015 MacBook Pro (2.2 GHz Intel Core i7, 16GB RAM) running OS X El Capitan.

As you can see, the first run produces a considerable number of errors, while the second run produces none.

jpallari commented 7 years ago

Some more insight into why the pool is not used effectively prior to this PR.

When the pool parameter is set in the load test configuration, it is passed to the request library. The request documentation says the following:

Note that if you are sending multiple requests in a loop and creating multiple new pool objects, maxSockets will not work as intended. To work around this, either use request.defaults with your pool options or create the pool object with the maxSockets property outside of the loop.

Therefore, even when using the pool parameter, each request ends up using its own pool.

It's not obvious how the pool parameter in request should be used. To get around this, the fix skips the pool parameter and uses a custom agent instead.
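To illustrate the distinction the request docs are making (a sketch, not Artillery's code): passing a fresh pool object literal on every call creates a new pool each time, so maxSockets never limits anything, whereas sharing one pool object (or baking it into request.defaults) makes the limit effective.

const request = require('request');

// Ineffective: a brand-new pool object is created per call, so each request
// effectively gets its own pool and maxSockets never acts as a real limit.
function fetchWithFreshPool(url, cb) {
  request({ url, pool: { maxSockets: 100 } }, cb);
}

// Effective: one pool object created outside the loop and shared by all calls,
// as the request docs recommend (request.defaults achieves the same thing).
const sharedPool = { maxSockets: 100 };
function fetchWithSharedPool(url, cb) {
  request({ url, pool: sharedPool }, cb);
}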

cornelf commented 7 years ago

👍

hassy commented 7 years ago

Thanks for the PR. I don't have the time to try the patch right now, but I'll give it a go as soon as I can. A couple of comments:

  1. Each scenario creates a new TCP connection by design (by default) to mimic real-world user behavior.
  2. If you re-run your example script with a realistic pool setting, e.g. 100, and run netstat while Artillery is running, you should see the test complete with no errors and the number of open connections stay at 100 throughout the test, even though multiple agents will be created (one per scenario). Your test as-is does not necessarily demonstrate a bug, since you're allowing Artillery to open up to 1,000,000 concurrent connections when 64k is the absolute possible maximum.
  3. Try commenting out the pool setting and rerunning the test, and you should see the number of open connections shoot up quickly. You might also see a "too many open files" error very quickly, depending on what ulimit -n reports on your system.

hassy commented 7 years ago

As to the correct usage of pool: that particular sentence from their docs is not very clear. However, as far as I understand, pool refers to the pool of agents rather than sockets, and each agent is currently only allowed 1 socket, so the total number of open connections when config.http.pool is set should not exceed that number.

jpallari commented 7 years ago

Thanks for the quick response!

If you re-run your example script with a realistic pool setting, e.g. 100 and run netstat while Artillery is running, you should see the test complete with no errors and the number of open connections at 100 throughout the test even though multiple agents will be created (one per scenario).

Yes, I managed to run the test with an arrival rate of 100 per second. However, increasing the rate beyond that started producing errors.

Your test as-is does not necessarily demonstrate a bug since you're allowing Artillery to open up to 1,000,000 concurrent connections when 64k is the absolute possible maximum.

Yeah, I didn't think it'd ever reach that limit. I provided a ridiculously huge number in order to let it run uncapped, which is similar to the default Node.js HTTP client behaviour. I should have been more explicit about that.
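For reference, Node's default global HTTP agent places no cap on concurrent sockets in recent versions, which is the behaviour the huge pool value was meant to approximate:

const http = require('http');

// Since Node.js 0.12, the default global agent has no socket cap.
console.log(http.globalAgent.maxSockets); // Infinity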

Each scenario creates a new TCP connection by design (by default) to mimic real-world user behavior.

Ahh, I think that explains quite a bit. I guess my goals are different from Artillery's goals, which is why my approach here is so different. :)

hassy commented 7 years ago

Not an arrival rate of 100, but a pool of 100 to limit the maximum number of open concurrent connections. Re the 1,000,000 limit: of course you will reach your system's limit quickly if you're opening 1000 connections per second. 64k is the maximum theoretical limit, and it's likely to be much lower on your actual system (check ulimit -n); that's why you're seeing those errors. Setting the pool to 1,000,000 makes no sense when you have one client and one target, both running on the same host.

I just tried your YAML script but with pool set to 100, and the test run completes and the number of open sockets remains constant at 100.

The reason the pool setting is there is for use-cases like yours, when you don't want a new socket for each scenario. I'm not sure if there's a bug in the current implementation though.

jpallari commented 7 years ago

AFAIK, the pool just sets the upper bound on how many sockets a single agent can have open.

It seems the implementation has been altered enough that the comparison isn't fair. I'll update the test and results later, since I'm continuing the tests on a different machine. Interestingly, I'm seeing better results on that machine.

hassy commented 7 years ago

Yes, pool is the upper bound, but if you set the upper bound to 1 million, you're going to run out of file descriptors or ports on your system very quickly if you're creating 1000 of them every second. Again, the maximum theoretical number of open TCP connections is 64k. Do you see what the problem with setting pool = 1,000,000 is? It's impossible to have 1M open connections from a single client to a single server. You are not in fact testing anything at all with the original test script. There's no difference between setting a value that large and not setting a value at all. Does that make sense?

Edit: I believe pool is the pool of agents as I'd mentioned earlier, and each agent has a maxSockets setting which sets the limit on sockets a single agent can open. The current implementation sets that to always equal 1 and creates an agent per scenario OR lets Request.js manage a pool of agents of some maximum size.

jpallari commented 7 years ago

If you reuse connections, as is done in this PR, the TCP connection count doesn't necessarily get maxed out.

In the original version, a lot of connections are established and never reused, which means there is a large buildup of discarded connections in the TIME_WAIT state.
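A small illustration of the difference using Node's http module (the host and port are taken from the test config above; the agent settings are illustrative):

const http = require('http');

// keepAlive: false — every request opens a socket and closes it afterwards;
// the closed sockets linger in TIME_WAIT and can exhaust ephemeral ports.
const throwawayAgent = new http.Agent({ keepAlive: false });

// keepAlive: true — finished sockets go back into the agent's free pool and
// are reused by later requests, so the connection count stays bounded.
const reusableAgent = new http.Agent({ keepAlive: true, maxSockets: 100 });

http.get({ host: 'localhost', port: 8080, path: '/', agent: reusableAgent }, res => {
  res.resume(); // drain the response so the socket can be returned to the pool
});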

hassy commented 7 years ago

The connections ARE being re-used already if you use the config.http.pool setting correctly, that's the entire point of having that setting. I don't know how else to explain that your test case does not test what you seem to think it's testing. Please take the time to re-read my messages. Artillery is behaving exactly as expected given your input and the operating system limits when you set pool = 1,000,000. With pool set to a sane value, e.g. something LESS than the max number of fds on your system or the maximum theoretical number of TCP connections, everything works as expected. Set pool to 100 and you'll have a maximum of 100 connections.

jpallari commented 7 years ago

I updated the test and the results to use a pool of at most 100 sockets. These tests were run on OS X.

Yesterday, I also ran the same tests on a Dell XPS 9630 (i5, 8GB) with Fedora Linux 25 as the OS. The original code did not produce the errors reported here, but I could see the connection count rise without stopping. When running the same test with the code from this PR, the connection count did not rise above a certain level.