artilleryio / artillery

The complete load testing platform. Everything you need for production-grade load tests. Serverless & distributed. Load test with Playwright. Load test HTTP APIs, GraphQL, WebSocket, and more. Use any Node.js module.
https://www.artillery.io
Mozilla Public License 2.0

WS Connections Fluctuate too Wildly #286

Open phantomvivek opened 7 years ago

phantomvivek commented 7 years ago

Hello,

I've been using Artillery for load testing my websocket app and I have noticed strange behaviour related to heavy load. The connections made to the application fluctuate too wildly.

Sample Artillery Log:

Report for the previous 10s @ 2017-03-30T12:55:37.198Z
  Scenarios launched:  5284
  Scenarios completed: 5860
  Requests completed:  9520
  Concurrent users:    28159
  RPS sent: 848.48
  ...

Report for the previous 10s @ 2017-03-30T12:55:47.254Z
  Scenarios launched:  4898
  Scenarios completed: 4669
  Requests completed:  71
  Concurrent users:    29896
  RPS sent: 6.82
  ...

Report for the previous 10s @ 2017-03-30T12:55:57.274Z
  Scenarios launched:  5098
  Scenarios completed: 3283
  Requests completed:  3962
  Concurrent users:    32438
  RPS sent: 363.15
  ...

Report for the previous 10s @ 2017-03-30T12:56:07.302Z
  Scenarios launched:  5052
  Scenarios completed: 1674
  Requests completed:  1612
  Concurrent users:    33581
  RPS sent: 150.23
  ...

Report for the previous 10s @ 2017-03-30T12:56:17.316Z
  Scenarios launched:  2418
  Scenarios completed: 2839
  Requests completed:  573
  Concurrent users:    34067
  RPS sent: 53.85
  ...

Report for the previous 10s @ 2017-03-30T12:56:27.342Z
  Scenarios launched:  2922
  Scenarios completed: 10463
  Requests completed:  7026
  Concurrent users:    31390
  RPS sent: 470.28
  ...

Report for the previous 10s @ 2017-03-30T12:56:37.425Z
  Scenarios launched:  6724
  Scenarios completed: 150
  Requests completed:  9447
  Concurrent users:    32193
  RPS sent: 806.75
  ...

Sample configuration:

{
  "config": {
      "target": "ws://IP:PORT",
      "phases": [
        {"duration": 120, "arrivalRate": 1, "rampTo":350},
        {"duration": 300, "arrivalRate": 350}
      ],
      "ws": {
        "rejectUnauthorized": false
      }
  },
  "scenarios": [
    {
      "engine": "ws",
      "flow": [
        {"send": "{}"},
        {"think": 50}
      ]
    }
  ]
}

This seems like weird behaviour to me. Can you please tell me if I am doing something incorrectly? I am running two instances of Artillery on different machines, both hitting the ws app.

The ws app does nothing with the socket connection: it accepts the connection and keeps it alive until Artillery closes it.

hassy commented 7 years ago

Is the application able to accept connections at the rate that Artillery creates them?

phantomvivek commented 7 years ago

Yes, the application is able to accept them. I was testing out the new uws module to replace the existing ws module (might I suggest that it would be a valuable addition to Artillery as well). I think this could have something to do with the OS limits: after increasing them, the fluctuations still occur, but the gaps are not as large as those seen above. I shall post those results in a few hours.

hassy commented 7 years ago

You should see EMFILE errors if the OS is running out of file descriptors. What's the CPU usage of Artillery's processes when the test is running? Artillery itself could be maxing out on the given hardware.
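
One way to check the descriptor limit before a run is sketched below. It is a minimal example, assuming a Unix-like machine and Python's standard `resource` module; the helper names `fd_limits` and `fd_headroom` and the figure of 30,000 sockets (roughly the "Concurrent users" value in the reports above) are illustrative assumptions, not part of Artillery.

```python
import resource  # Unix-only standard library module


def fd_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fd_headroom(required, soft_limit):
    """Descriptors left after opening `required` sockets.

    A negative result means the process would hit the soft limit first,
    which is when EMFILE errors would be expected.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return float("inf")
    return soft_limit - required


if __name__ == "__main__":
    soft, hard = fd_limits()
    required = 30_000  # hypothetical: about the concurrent users seen per instance
    print(f"soft={soft} hard={hard} headroom={fd_headroom(required, soft)}")
```

The limit reported here applies to the process that runs the script, so it would need to be run from the same shell and user that launch the load generator (and likewise on the server side) for the numbers to be comparable.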

phantomvivek commented 7 years ago

I think this has something to do with the OS socket and file limits, since after increasing them the test was able to more or less stabilize the RPS. The configuration was the same as above, except that rampTo and arrivalRate were changed to 500 on both servers (which is still a bit weird, since none of the logs show an RPS very close to 500).

I have also noticed that if the application being tested does anything after accepting the incoming connection (like making a couple of HTTP calls), the requests start to fluctuate. I understand that this is a problem with the application itself (it may not be able to accept connections, or it may be making Artillery wait), but can this be handled or reported in a way that gives a warning? Something that says the ws connection took too long if it could not be established within X seconds. Just thinking out loud here, since increasing OS limits to identify issues could be misleading.
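
To make the idea concrete, here is a rough sketch of what such a check could look like outside Artillery. It is not an existing Artillery feature. It times only the TCP connect, not the WebSocket upgrade handshake, and the names `timed_connect` and `classify` and the default thresholds are assumptions chosen for illustration.

```python
import asyncio
import time


def classify(elapsed_s, warn_after_s):
    """Label a successful connection attempt by how long it took."""
    return "slow" if elapsed_s > warn_after_s else "ok"


async def timed_connect(host, port, warn_after_s=1.0, timeout_s=5.0):
    """Open a TCP connection and report how long the connect took.

    Returns a dict with a status of "ok", "slow" or "failed".
    """
    start = time.monotonic()
    try:
        _reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout=timeout_s
        )
    except (asyncio.TimeoutError, OSError) as exc:
        # Timed out or refused: report it instead of waiting silently.
        return {
            "status": "failed",
            "elapsed_s": time.monotonic() - start,
            "error": repr(exc),
        }
    elapsed = time.monotonic() - start
    writer.close()
    await writer.wait_closed()
    return {
        "status": classify(elapsed, warn_after_s),
        "elapsed_s": elapsed,
        "error": None,
    }
```

A call such as `asyncio.run(timed_connect("127.0.0.1", 8080))` (host and port are placeholders) would return a "slow" or "failed" status when the target is slow to accept connections, which is the kind of signal described above.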

Artillery log:

Report for the previous 10s @ 2017-03-30T14:01:10.871Z
  Scenarios launched:  5031
  Scenarios completed: 3700
  Requests completed:  4627
  Concurrent users:    25130
  RPS sent: 428.43
  ...

Report for the previous 10s @ 2017-03-30T14:01:20.902Z
  Scenarios launched:  4754
  Scenarios completed: 5054
  Requests completed:  4692
  Concurrent users:    25389
  RPS sent: 443.48
  ...

Report for the previous 10s @ 2017-03-30T14:01:30.931Z
  Scenarios launched:  4930
  Scenarios completed: 4387
  Requests completed:  5157
  Concurrent users:    25441
  RPS sent: 488.82
  ...

Report for the previous 10s @ 2017-03-30T14:01:40.958Z
  Scenarios launched:  5284
  Scenarios completed: 5562
  Requests completed:  4949
  Concurrent users:    25491
  RPS sent: 432.6
  ...

Report for the previous 10s @ 2017-03-30T14:01:50.993Z
  Scenarios launched:  5357
  Scenarios completed: 4647
  Requests completed:  4668
  Concurrent users:    25805
  RPS sent: 458.1
  ...

Report for the previous 10s @ 2017-03-30T14:02:01.028Z
  Scenarios launched:  4517
  Scenarios completed: 4620
  Requests completed:  4806
  Concurrent users:    26066
  RPS sent: 469.34
  ...

Report for the previous 10s @ 2017-03-30T14:02:11.062Z
  Scenarios launched:  5155
  Scenarios completed: 4692
  Requests completed:  4254
  Concurrent users:    26165
  RPS sent: 379.14
  ...

Report for the previous 10s @ 2017-03-30T14:02:21.121Z
  Scenarios launched:  4766
  Scenarios completed: 5160
  Requests completed:  4758
  Concurrent users:    26162
  RPS sent: 433.33
  ...

Report for the previous 10s @ 2017-03-30T14:02:31.160Z
  Scenarios launched:  5238
  Scenarios completed: 4946
  Requests completed:  4415
  Concurrent users:    26488
  RPS sent: 384.58
  ...

Report for the previous 10s @ 2017-03-30T14:02:41.193Z
  Scenarios launched:  4540
  Scenarios completed: 4405
  Requests completed:  4388
  Concurrent users:    26570
  RPS sent: 397.1
  ...

Report for the previous 10s @ 2017-03-30T14:02:51.240Z
  Scenarios launched:  5699
  Scenarios completed: 5086
  Requests completed:  4646
  Concurrent users:    26811
  RPS sent: 414.82
  ...

Report for the previous 10s @ 2017-03-30T14:03:01.277Z
  Scenarios launched:  4118
  Scenarios completed: 4125
  Requests completed:  4665
  Concurrent users:    27168
  RPS sent: 438.03
  ...

Report for the previous 10s @ 2017-03-30T14:03:11.314Z
  Scenarios launched:  4163
  Scenarios completed: 5246
  Requests completed:  4537
  Concurrent users:    26093
  RPS sent: 389.11
  ...

Report for the previous 10s @ 2017-03-30T14:03:21.353Z
  Scenarios launched:  4750
  Scenarios completed: 4088
  Requests completed:  8194
  Concurrent users:    28552
  RPS sent: 816.95
  ...

Report for the previous 10s @ 2017-03-30T14:03:31.393Z
  Scenarios launched:  1476
  Scenarios completed: 4541
  Requests completed:  1478
  Concurrent users:    27685
  RPS sent: 137.87

... followed by a lot of reports with NaN as their RPS ...

Final one once the test completed:
Complete report @ 2017-03-30T14:04:21.769Z
  Scenarios launched:  120226
  Scenarios completed: 120226
  Requests completed:  120226
  RPS sent: 307.87

ghost commented 7 years ago

I was testing out the new uws module to replace the existing ws module

Please do not test uws using Artillery.io. It is unreliable and reports invalid results due to choking itself to death at over 225% CPU time while the test subject is at 0% CPU time. It reports invalid latency results due to having itself stutter through GC freezes all the time, which further taints the results. In fact I cannot even stress ws fully with Artillery.io. Something is seriously wrong in Artillery.io, use something better.

phantomvivek commented 7 years ago

Hey @alexhultman, thanks for the heads up. We've been looking for a solution to load test our Node websocket apps. Do you have any suggestions? We did try JMeter, but it has its own set of problems and is not very friendly for setting up quick tests and modifications.

hassy commented 7 years ago

You could try reducing the arrivalRate and using a loop instead to send more messages over the same connection. Your original example opens 350 new TCP connections per second for 5 minutes in the second phase (with 17.5k connections open at once after the first 50 seconds). At the moment you're testing the networking stack more than the actual application. Another option is to use more instances of Artillery to generate load on the target application.
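
The arithmetic behind those numbers can be sketched as follows. This is a back-of-the-envelope estimate that counts only think time and ignores connect and send latency. The looped flow is an illustrative assumption: the arrival rate of 7, the 1-second think and the count of 50 are made-up values, and the `loop`/`count` step is written the way Artillery's loop syntax is generally documented, so it should be checked against the docs for the version in use.

```python
import json


def session_seconds(flow):
    """Total think time of one scenario run, expanding loop steps."""
    total = 0
    for step in flow:
        if "think" in step:
            total += step["think"]
        elif "loop" in step:
            total += step["count"] * session_seconds(step["loop"])
    return total


def sends_per_session(flow):
    """Number of messages one scenario run sends, expanding loop steps."""
    total = 0
    for step in flow:
        if "send" in step:
            total += 1
        elif "loop" in step:
            total += step["count"] * sends_per_session(step["loop"])
    return total


def steady_state(arrival_rate, flow):
    """Estimate open connections and message rate once arrivals balance departures."""
    return {
        "open_connections": arrival_rate * session_seconds(flow),
        "messages_per_second": arrival_rate * sends_per_session(flow),
    }


# Flow from the original report: one message, then hold the socket for 50 s.
original_flow = [{"send": "{}"}, {"think": 50}]

# Hypothetical alternative: fewer arrivals, each sending 50 messages 1 s apart.
looped_flow = [{"loop": [{"send": "{}"}, {"think": 1}], "count": 50}]

print(steady_state(350, original_flow))  # {'open_connections': 17500, 'messages_per_second': 350}
print(steady_state(7, looped_flow))      # {'open_connections': 350, 'messages_per_second': 350}

# The scenario entry that would replace the one in the original config.
print(json.dumps({"engine": "ws", "flow": looped_flow}, indent=2))
```

Under these assumptions both flows send about 350 messages per second, but the looped one holds roughly 350 sockets open instead of 17,500, so more of the load falls on the application's message handling and less on connection setup.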

(There's LOTS of room for optimising Artillery's performance on a single node, but that's not been a huge priority since it's good enough for most use cases and it's so easy to run a distributed test from e.g. AWS EC2 these days.)