For the tests below I used the stock (upstream) CouchDB, version 3.3.3, obtained from the official CouchDB site and from Docker Hub [1].
### CouchDB initialization
curl -X PUT http://admin:password@127.0.0.1:5984/_users
curl -X PUT http://admin:password@127.0.0.1:5984/_replicator
curl -X PUT http://admin:password@127.0.0.1:5984/_global_changes
curl -X PUT http://admin:password@127.0.0.1:5984/test
### Load/stress tests
# inject the first document
curl -X POST http://admin:password@127.0.0.1:5984/test -H "Content-Type: application/json" -d@/wma/vk/CouchDB/wm.json
{"ok":true,"id":"e05f6d85f7fe72b02709decb2a000c58","rev":"1-be003cb86351088da81258cd77a55ea4"}
/afs/cern.ch/user/v/valya/public/hey_linux -n 200 -c 50 -m POST -H "Content-Type: application/json" -D /wma/vk/CouchDB/wm.json -disable-keepalive -disable-compression http://admin:password@localhost:5984/test > log 2>&1
Results are the following:
69 requests done.
182 requests done.
All requests done.

Summary:
  Total: 1.0729 secs
  Slowest: 0.4308 secs
  Fastest: 0.0258 secs
  Average: 0.2296 secs
  Requests/sec: 186.4051
  Total data: 19000 bytes
  Size/request: 95 bytes

Status code distribution:
  [201] 200 responses

Response time histogram:
  0.026 [1] |∎
  0.066 [31] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.107 [0] |
  0.147 [12] |∎∎∎∎∎∎∎∎∎∎∎
  0.188 [8] |∎∎∎∎∎∎∎
  0.228 [44] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.269 [36] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.309 [17] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.350 [17] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.390 [27] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.431 [7] |∎∎∎∎∎∎

Latency distribution:
  10% in 0.0424 secs
  25% in 0.1867 secs
  50% in 0.2357 secs
  75% in 0.3330 secs
  90% in 0.3828 secs
  95% in 0.3859 secs
  99% in 0.4307 secs
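A quick way to sanity-check the summary: assuming hey reports Requests/sec as the number of completed requests divided by the total wall-clock time (consistent with the numbers above, 200 / 1.0729 s ≈ 186.4), and Size/request as total data divided by the number of requests, both values can be recomputed with a small sketch like this (the helper names are hypothetical, not part of hey):

```shell
#!/bin/bash
# Hypothetical helpers to recompute two values of the hey summary.
# Assumption: Requests/sec = requests / total seconds,
#             Size/request = total bytes / requests.

# rps NREQ TOTAL_SECS: requests per second, one decimal
rps() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f\n", n / t }'
}

# size_per_request TOTAL_BYTES NREQ: bytes per request (truncated)
size_per_request() {
    awk -v b="$1" -v n="$2" 'BEGIN { printf "%d\n", int(b / n) }'
}

rps 200 1.0729              # -n 200 and "Total: 1.0729 secs": prints 186.4
size_per_request 19000 200  # "Total data: 19000 bytes": prints 95
```

Because Total is printed rounded to four decimals, the recomputed rate can differ slightly from the reported one; for the local run below, 200 / 0.0543 gives about 3683.2 versus the reported 3683.9.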
### Local CouchDB setup
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://couchdb.apache.org/repo/couchdb.repo
sudo dnf config-manager --set-enabled crb
sudo dnf install epel-release epel-next-release
sudo yum install -y mozjs78
sudo yum install -y couchdb
sudo vim /opt/couchdb/etc/local.ini
sudo -i -u couchdb /opt/couchdb/bin/couchdb
At this point I performed the `CouchDB initialization` steps to set up `_users` and the other databases.
Finally, I repeated the steps listed in `Load/stress tests` and got the following results:
All requests done.

Summary:
  Total: 0.0543 secs
  Slowest: 0.0227 secs
  Fastest: 0.0045 secs
  Average: 0.0123 secs
  Requests/sec: 3683.9033
  Total data: 19000 bytes
  Size/request: 95 bytes

Status code distribution:
  [201] 200 responses

Response time histogram:
  0.004 [1] |∎
  0.006 [9] |∎∎∎∎∎∎∎∎∎
  0.008 [20] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.010 [32] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.012 [37] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.014 [18] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.015 [42] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.017 [28] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.019 [2] |∎∎
  0.021 [3] |∎∎∎
  0.023 [8] |∎∎∎∎∎∎∎∎

Latency distribution:
  10% in 0.0077 secs
  25% in 0.0093 secs
  50% in 0.0118 secs
  75% in 0.0151 secs
  90% in 0.0163 secs
  95% in 0.0206 secs
  99% in 0.0225 secs
### Results comparison
As can be seen from the benchmark results above:
- local CouchDB
  - almost 3700 requests per second for POST HTTP calls
- docker CouchDB
  - below 200 requests per second for POST HTTP calls
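To express the gap as a single factor, here is a minimal sketch (the `ratio` helper is hypothetical) that divides the two Requests/sec values from the hey summaries above. Each number comes from a single run, so the result is indicative only:

```shell
#!/bin/bash
# ratio A B: print A / B with one decimal (hypothetical helper)
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Requests/sec from the two hey summaries above (one run each)
local_rps=3683.9033   # local CouchDB 3.3.3 installed from the RPM repository
docker_rps=186.4051   # stock CouchDB 3.3.3 docker image

echo "local/docker throughput ratio: $(ratio "$local_rps" "$docker_rps")"  # prints 19.8
```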
### TODO
- set up both instances, i.e. local CouchDB and docker CouchDB, and run the tests from outside the node, i.e. run the hey tool from another node to include network latency
- try different HTTP methods, e.g. POST and GET
- vary the number of concurrent calls
### References
[1] https://hub.docker.com/_/couchdb/
[2] https://github.com/rakyll/hey, or the vkuznet fork patched to support X509: https://github.com/vkuznet/hey
Scary results! However, I would highly recommend using the products that we will actually be using instead of the upstream ones.
That said, I would suggest testing the current COMP RPM CouchDB package against the wmagent-couchdb image in CMSKubernetes. Ideally, these tests should be performed in the very same environment as well (including the same node); otherwise the comparison is hard to interpret.
I set up CouchDB (3.2.2) on one of my VMs using COMP RPMs; in fact, I simply used `rsync` to copy the `/data/srv` area from one of the WMAgent CERN nodes. Then I reran the `hey` test against the local CouchDB and got 2318 req/sec. Then, using the wmagent-couchdb image, I got a similar result of 2379 req/sec.
To sum up: at this point I do not know whether the slowness of the stock CouchDB image (3.3.3) is due to running it on RH9 or to the image content itself. But I'm happy to see comparable performance on CC7 between the local CouchDB installed via RPM and our wmagent-couchdb docker image.
Another test was performed with the wmagent-couchdb image on an RH9 node.
1. docker image with the default container network:
# CouchDB server
docker run -it -e COUCHDB_USER=admin -e COUCHDB_PASSWORD=password -p 5984:5984 --volume /wma/vk/CouchDB/data:/opt/couchdb/data -v /wma/vk/secrets:/data/admin/wmagent registry.cern.ch/cmsweb/wmagent-couchdb
/afs/cern.ch/user/v/valya/public/hey_linux -n 200 -c 50 -m POST -H "Content-Type: application/json" -D /wma/vk/CouchDB/wm.json -disable-keepalive -disable-compression http://admin:password@localhost:5984/test
Results (3 iterations of the hey client):
- 1282 req/sec
- 1620 req/sec
- 1640 req/sec
Average: 1514 req/sec
2. docker image with host network (use the `--network=host` option):
docker run --network=host -it -e COUCHDB_USER=admin -e COUCHDB_PASSWORD=password -p 5984:5984 --volume /wma/vk/CouchDB/data:/opt/couchdb/data -v /wma/vk/secrets:/data/admin/wmagent registry.cern.ch/cmsweb/wmagent-couchdb
/afs/cern.ch/user/v/valya/public/hey_linux -n 200 -c 50 -m POST -H "Content-Type: application/json" -D /wma/vk/CouchDB/wm.json -disable-keepalive -disable-compression http://admin:password@localhost:5984/test
Results (3 iterations of the hey client):
- 1657 req/sec
- 1953 req/sec
- 1832 req/sec
Average: 1814 req/sec
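The averages above are plain arithmetic means of the three hey iterations. A minimal bash sketch of that calculation (the `avg` helper is hypothetical, not part of hey; bash arithmetic is integer-only, so the mean is truncated):

```shell
#!/bin/bash
# avg N1 N2 ...: print the mean of integer arguments, truncated (hypothetical helper)
avg() {
    if [ "$#" -eq 0 ]; then
        echo "avg: no values given" >&2
        return 1
    fi
    local sum=0 v
    for v in "$@"; do
        sum=$((sum + v))
    done
    echo $((sum / $#))
}

avg 1282 1620 1640   # container network: prints 1514
avg 1657 1953 1832   # host network: prints 1814
```

The difference of the two averages, 1814 - 1514 = 300 req/sec, is the figure quoted in the summary that follows.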
### Summary
There is a slight performance degradation (based on HTTP POST requests) when using the container network instead of the host network; the latter provides more throughput. In both cases, however, the performance is quite decent (above 1500 req/sec), and I doubt the difference (around 300 req/sec, roughly 20%, higher with the host network) will have any impact on WM operations.
@amaltaro do you have any other suggestions for testing, based on the provided results?
That is very good, thanks Valentin! Do I understand it right that you tested both GET and POST calls to CouchDB?
I think it's important to keep track of this evaluation and its results in our wmcore-docs repository. Please also be explicit about the:
@vkuznet I would suggest keeping an eye on the progress of issue #11635 to check whether it has an impact on the tests documented in this issue.
Here is another summary in table format; the concurrency setting `-n 200 -c 50` means 200 requests in total sent by 50 concurrent clients:
iteration | Couch setup | Linux OS | deployment | Test method | concurrency | Req/sec |
---|---|---|---|---|---|---|
round 1 | No host | RH9 | image | POST | -n 200 -c 50 | 1198 |
round 1 | No host | RH9 | image | GET | -n 200 -c 50 | 1601 |
round 2 | No host | RH9 | image | POST | -n 200 -c 50 | 858 |
round 2 | No host | RH9 | image | GET | -n 200 -c 50 | 1688 |
round 3 | No host | RH9 | image | POST | -n 200 -c 50 | 1019 |
round 3 | No host | RH9 | image | GET | -n 200 -c 50 | 1751 |
--- | --- | --- | --- | --- | --- | --- |
average | No host | RH9 | image | POST | -n 200 -c 50 | 1025 |
average | No host | RH9 | image | GET | -n 200 -c 50 | 1680 |
--- | --- | --- | --- | --- | --- | --- |
round 1 | host network | RH9 | image | POST | -n 200 -c 50 | 1636 |
round 1 | host network | RH9 | image | GET | -n 200 -c 50 | 3013 |
round 2 | host network | RH9 | image | POST | -n 200 -c 50 | 2355 |
round 2 | host network | RH9 | image | GET | -n 200 -c 50 | 3597 |
round 3 | host network | RH9 | image | POST | -n 200 -c 50 | 2390 |
round 3 | host network | RH9 | image | GET | -n 200 -c 50 | 2908 |
--- | --- | --- | --- | --- | --- | --- |
average | host network | RH9 | image | POST | -n 200 -c 50 | 2127 |
average | host network | RH9 | image | GET | -n 200 -c 50 | 3172 |
iteration | Couch setup | Linux OS | deployment | Test method | concurrency | Req/sec |
---|---|---|---|---|---|---|
round 1 | host | CC7 | RPM | POST | -n 200 -c 50 | 2131 |
round 1 | host | CC7 | RPM | GET | -n 200 -c 50 | 2545 |
round 2 | host | CC7 | RPM | POST | -n 200 -c 50 | 2246 |
round 2 | host | CC7 | RPM | GET | -n 200 -c 50 | 3343 |
round 3 | host | CC7 | RPM | POST | -n 200 -c 50 | 2715 |
round 3 | host | CC7 | RPM | GET | -n 200 -c 50 | 3593 |
--- | --- | --- | --- | --- | --- | --- |
average | host | CC7 | RPM | POST | -n 200 -c 50 | 2364 |
average | host | CC7 | RPM | GET | -n 200 -c 50 | 3160 |
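The `average` rows in both tables can be reproduced from the `round` rows. A sketch with awk, assuming the rows are pasted verbatim (the `table_averages` helper is hypothetical; header, separator and `average` rows are skipped because only lines starting with `round` are used). The averages appear to be truncated rather than rounded: 3160.33 gives 3160 here, and the host-network GET rows give 3172.67, reported as 3172.

```shell
#!/bin/bash
# table_averages: read markdown table rows of the form
#   round N | setup | OS | deployment | method | concurrency | req/sec |
# from stdin and print the truncated mean req/sec per (setup, OS, deployment, method).
table_averages() {
    awk -F'|' '
        function trim(s) { sub(/^[ \t]+/, "", s); sub(/[ \t]+$/, "", s); return s }
        trim($1) ~ /^round/ {
            key = trim($2) " | " trim($3) " | " trim($4) " | " trim($5)
            if (!(key in count)) order[++n] = key   # remember first-seen order
            sum[key] += trim($7)
            count[key]++
        }
        END {
            for (i = 1; i <= n; i++) {
                k = order[i]
                printf "%s | %d\n", k, int(sum[k] / count[k])
            }
        }'
}

# CC7 / RPM rows copied from the table above
table_averages <<'EOF'
round 1 | host | CC7 | RPM | POST | -n 200 -c 50 | 2131 |
round 1 | host | CC7 | RPM | GET | -n 200 -c 50 | 2545 |
round 2 | host | CC7 | RPM | POST | -n 200 -c 50 | 2246 |
round 2 | host | CC7 | RPM | GET | -n 200 -c 50 | 3343 |
round 3 | host | CC7 | RPM | POST | -n 200 -c 50 | 2715 |
round 3 | host | CC7 | RPM | GET | -n 200 -c 50 | 3593 |
EOF
```

This prints `host | CC7 | RPM | POST | 2364` and `host | CC7 | RPM | GET | 3160`, matching the `average` rows.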
Based on the provided results, we see little difference in the average numbers between the RPM deployment and the docker image deployment using the host network. However, running the docker image without the host network degrades the performance of both POST and GET requests by roughly a factor of 2 on the RH9 host.
Here is the shell script used to generate all the tests above:
#!/bin/bash
curl -X DELETE http://login:password@127.0.0.1:5984/test
curl -X PUT http://login:password@127.0.0.1:5984/test
file=/afs/cern.ch/user/v/valya/public/wm.json
# insert one document and get its document id
# jq -r prints the raw id (without JSON quotes) so it can be used in the GET URL below
did=$(curl -s -X POST http://login:password@127.0.0.1:5984/test -H "Content-Type: application/json" -d @"$file" | jq -r '.id')
# perform POST tests
echo "POST test"
/afs/cern.ch/user/v/valya/public/hey_linux \
-n 200 -c 50 -m POST \
-H "Content-Type: application/json" \
-D $file \
-disable-keepalive \
-disable-compression \
http://login:password@localhost:5984/test > couch-test-post.log 2>&1
grep "Requests/sec" couch-test-post.log
# perform get tests
echo ""
echo "GET test with id=$did"
/afs/cern.ch/user/v/valya/public/hey_linux \
-n 200 -c 50 -m GET \
-disable-keepalive \
-disable-compression \
"http://login:password@localhost:5984/test/$did" > couch-test-get.log 2>&1
grep "Requests/sec" couch-test-get.log
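When the script above is run several times, the throughput numbers can be collected from the log files instead of being copied by hand. A sketch, assuming the hey summary line has the form `Requests/sec: <number>` as in the outputs above, and that each iteration's log is kept under its own name; the file names and the `couch-test.sh` script name in the example are hypothetical:

```shell
#!/bin/bash
# extract_rps LOG...: print the Requests/sec value found in each hey log, one per line
extract_rps() {
    # default awk field splitting: $1 is "Requests/sec:", $2 is the number
    awk '/Requests\/sec:/ { print $2 }' "$@"
}

# mean_rps LOG...: truncated mean of the Requests/sec values across the given logs
mean_rps() {
    extract_rps "$@" | awk '{ s += $1; n++ } END { if (n > 0) printf "%d\n", int(s / n) }'
}

# Example (hypothetical names, one log per iteration):
#   for i in 1 2 3; do ./couch-test.sh; mv couch-test-post.log "couch-test-post-$i.log"; done
#   mean_rps couch-test-post-*.log
```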
Alan, I addressed all your requests in the tables above; do you need anything else here?
Thanks Valentin.
As mentioned in this comment: https://github.com/dmwm/WMCore/issues/11567#issuecomment-2058072321, let us persist this evaluation in the official wmcore-docs documentation so that this precious information does not get lost in GitHub tickets.
Done, please see https://gitlab.cern.ch/dmwm/wmcore-docs/-/merge_requests/30
From the tables above, the performance difference is close to 0 for GET requests, while the RPM-based deployment has around a 10% speed-up for POST requests. My conclusion is that containerized CouchDB will have no performance impact. Thank you for the documentation and evaluation, Valentin. Closing this out.
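The percentages can be checked from the `average` rows. A minimal sketch (the `pct_diff` helper is hypothetical) comparing the RPM deployment on CC7 with the wmagent-couchdb image using the host network on RH9; the two setups also differ in OS, so this is an indicative comparison rather than a controlled one:

```shell
#!/bin/bash
# pct_diff A B: print (A - B) / B * 100 with one decimal (hypothetical helper)
pct_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / b * 100 }'
}

# "average" rows: RPM on CC7 (A) vs wmagent-couchdb image with host network on RH9 (B)
echo "POST: $(pct_diff 2364 2127)%"   # prints POST: 11.1%
echo "GET:  $(pct_diff 3160 3172)%"   # prints GET:  -0.4%
```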
Impact of the new feature
WMAgent
Is your feature request related to a problem? Please describe.
As part of running WMAgent in a container environment, composed with database containers as well, we need to perform load/stress tests to evaluate the performance of the CouchDB container.
Describe the solution you'd like
Come up with a reliable and meaningful setup to evaluate the performance (latency, throughput, etc.) of CouchDB in two deployment modes:
To be provided with this issue:
Describe alternatives you've considered
None
Additional context
Depends on: https://github.com/dmwm/WMCore/issues/11312
Part of the following meta issue: https://github.com/dmwm/WMCore/issues/11314