decred / dcrdata

Decred block explorer, with packages and apps for data collection and storage. Written in Go.
https://dcrdata.decred.org/
ISC License

stress testing the APIs and WebSocket hub #913

Open chappjc opened 5 years ago

chappjc commented 5 years ago

This complements the Insight socket.io tests in https://github.com/decred/dcrdata/issues/912.

To support many non-browser client connections, stress testing needs to be performed on the following components:

  1. The dcrdata API HTTP endpoints
  2. The Insight API HTTP endpoints
  3. The Insight WebSocket system (socket.io)

This issue refers to 1. and 2., while https://github.com/decred/dcrdata/issues/912 refers to 3.

Since @buck54321 is interested in this, the issue is tentatively assigned to him. The first step should be to get a proof-of-concept stress test client running that the devs can hack on.

Some ideas from a Slack discussion:

buck54321 [4:57 AM] A topic that seems to be on people's minds is stress testing. https://github.com/decred/dcrdata/issues/60.

chappjc [5:00 AM] We'll likely need to look beyond go test to do proper stress testing.
There's an Apache tool I started looking into, but we can also make a Go HTTP client and client pool on our own.
A standalone tool to start N connections per second.
Similar for websocket testing, but N persistent connections.
That's just my thought on the matter, though.

buck54321 [5:04 AM] Do you cycle all endpoints or focus on certain ones?

chappjc [5:05 AM] I figure that would be a parameter of the testing tool.
There could be a couple of different predefined cycles, or certain endpoints could be tested.
Perhaps a JSON file could list endpoints to hit.
My thought was to gather statistics on response times and response codes.

szpasztor commented 5 years ago

I've done a couple of stress tests using Locust for REST APIs. With Locust, endpoints are configured in a Python script, and the tool can be run locally or even in a master/slave setup (I think I used one m4.large EC2 instance as the master and several r5.large or Heroku slaves, depending on the use case, for real-world tests). The larger tests simulated ~400K simultaneous logged-in users, each interacting every couple of seconds with a statically defined endpoint list.

I'm not sure what test size is needed here, but the bottleneck I eventually ran into beyond 500K users was RAM usage, as Locust creates individual threads/processes for each user; hence the memory-optimized EC2 instances.

Locust doesn't natively support WebSocket as far as I know, but there seem to be ways to add it by modifying the source.

I can take this issue if needed.

chappjc commented 5 years ago

Partially resolved by https://github.com/decred/dcrdata/pull/924, although the WebSocket stress test is still needed.