Open JohnMcLear opened 4 years ago
Day 2. I did some reading this morning: SocketIO essentially has a hard limit of about 10k messages per second.
http://drewww.github.io/socket.io-benchmarking/
the max messages-sent-per-second rate is around 9,000–10,000 depending on the concurrency level
I included a value showing we are hitting a SocketIO limit.
All tests done w/ 1GB RAM on a VM w/ 2.7GHz CPU.
Load Test Metrics -- Target Pad http://192.168.1.48:9001/p/kRBzLAeSfI
Total Clients Connected: 157
Local Clients Connected: 156
Authors Connected: 39
Lurkers Connected: 117
Sent Append messages: 3913
Commits accepted by server: 3812
Commits sent from Server to Client: 393671
Current rate per second of Commits sent from Server to Client: 0
Mean(per second) of # of Commits sent from Server to Client: 1832
Max(per second) of # of Messages (SocketIO has cap of 10k): 10415
Number of commits not yet replied as ACCEPT_COMMIT from server 101
Things to note:
So now that we know what we need to know, we can start thinking about how Etherpad can be changed to meet these restrictions.
Before I propose changes I think it's worth stating just how little 10k messages per second is. Every edit is sent to every client on the pad, so with 100 lurkers it's only about 100 edits per second. That is nothing. We're testing 39 authors with 117 lurkers, and the peak in the metrics above (10415 messages per second) is already beyond the theoretical limit.
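A rough sketch of that arithmetic, assuming every accepted edit is broadcast once to each connected client (authors and lurkers alike). The function names and the `SOCKETIO_CAP` constant are made up for illustration; only the 10k figure comes from the benchmark quoted above.

```javascript
// Assumption: outgoing messages = edits * connected clients (pure fan-out).
function outgoingMessagesPerSecond(editsPerSecond, connectedClients) {
  return editsPerSecond * connectedClients;
}

// Largest whole number of edits per second that stays within the message cap.
function editBudgetPerSecond(messageCap, connectedClients) {
  if (connectedClients <= 0) {
    throw new RangeError('connectedClients must be positive');
  }
  return Math.floor(messageCap / connectedClients);
}

const SOCKETIO_CAP = 10000; // messages per second, from the benchmark above

// Edit budget for a pad with 100 clients, and for 39 authors + 117 lurkers.
console.log(editBudgetPerSecond(SOCKETIO_CAP, 100));
console.log(editBudgetPerSecond(SOCKETIO_CAP, 39 + 117));
```

The point of the sketch is that the per-pad edit budget shrinks linearly with the number of connected clients, so lurkers are not free.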
One limit you could set is max users at 10k, because theoretically, if each user was on their own pad writing one edit per second, Etherpad can support this. You can probably halve this number to 5k users just for sanity.
Now we could say that for each pad we should limit total users to 100 (which is super high), but actually that is kinda dumb because we know we can support 1:400...
The best thing might be to say "once we're at X amount of messages being sent from SocketIO per second, reject new connections". I'd say a safe limit is somewhere in the region of ~5k, and this can be modified/adjusted by an admin...
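One way the "reject new connections above X messages per second" idea could look, as a minimal sketch. `ConnectionGate` and its methods are hypothetical names, not anything in Etherpad or Socket.IO, and the 5000 default is just the ballpark suggested above.

```javascript
// Tracks messages sent during the trailing one-second window and refuses new
// connections while the count is at or above a configurable limit.
class ConnectionGate {
  constructor(maxMessagesPerSecond = 5000, now = () => Date.now()) {
    this.maxMessagesPerSecond = maxMessagesPerSecond; // admin-adjustable
    this.now = now;     // injectable clock, handy for tests
    this.events = [];   // [{ at: ms timestamp, count: messages sent }]
    this.total = 0;     // messages currently inside the window
  }

  // Forget everything sent more than one second ago.
  prune() {
    const cutoff = this.now() - 1000;
    while (this.events.length > 0 && this.events[0].at <= cutoff) {
      this.total -= this.events.shift().count;
    }
  }

  // Call whenever the server emits `count` messages.
  recordSent(count = 1) {
    this.events.push({ at: this.now(), count });
    this.total += count;
  }

  currentRate() {
    this.prune();
    return this.total;
  }

  // Call when a client tries to connect.
  shouldAccept() {
    return this.currentRate() < this.maxMessagesPerSecond;
  }
}

const gate = new ConnectionGate(5000);
gate.recordSent(5200);            // pretend one broadcast burst just went out
console.log(gate.shouldAccept()); // false while the burst is inside the window
```

With Socket.IO this check would most naturally sit in a connection middleware (`io.use((socket, next) => ...)`), calling `next(new Error(...))` when `shouldAccept()` is false. Existing clients are untouched; only new ones are turned away.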
Another option is to look at replacing SocketIO with ws, which IMHO doesn't really solve the problem, as it's probable that 5k per SocketIO thread is fine.
TODO
On my windows machine
Load Test Metrics -- Target Pad http://127.0.0.1:9001/p/ExUrRM3KNP
Local Clients Connected: 120
Authors Connected: 30
Lurkers Connected: 90
Sent Append messages: 1399
Commits accepted by server: 1383
Commits sent from Server to Client: 164489
Current rate per second of Commits sent from Server to Client: 0
Mean(per second) of # of Commits sent from Server to Client: 8354
Max(per second) of # of Messages (SocketIO has cap of 10k): **22515**
Number of commits not yet replied as ACCEPT_COMMIT from server 16
Seconds test has been running for: 21
This seems to be the maximum I can achieve as far as messages per second on Windows. Interestingly I can accomplish ~1K lurkers to 1 author on this machine.
An update on TODO:
1. Still to do.
I ran three tests. Performance is roughly documented here for easy comparison.
So TLDR; A server can be easily overloaded w/ 500 lurkers & 1 author.
A hyperactive author is meant to replicate a really active author pushing 4 characters a second. This is rare in Etherpad (*needs citation).
Summary: Etherpad is holding up to ~40 hyperactive authors & ~120 lurkers per pad. At this point things get too slow to really make sense. A safe balance might be ~30 hyperactive authors.
I am going to re-run test #3 because I think something went wrong. After running it again and getting similar results I need to ponder why this is the case..
1. Local client to Local Server (Same VM)
Total = 600k commits
Load Test Metrics -- Target Pad http://192.168.1.48:9001/p/tQktzHcrN8chPwyOVab_
Total Clients Connected: 182
Local Clients Connected: 182
Authors Connected: 45
Lurkers Connected: 137
Sent Append messages: 5090
Commits accepted by server: 4989
Commits sent from Server to Client: 592417
Number of commits not yet replied as ACCEPT_COMMIT from server 101
2. Local Client to Remote Server
Total = 300k commits
Load Test Metrics -- Target Pad https://embed.etherpad.com:9002/p/Ya0dkFALmk
Total Clients Connected: 146
Local Clients Connected: 148
Authors Connected: 37
Lurkers Connected: 111
Sent Append messages: 3195
Commits accepted by server: 3094
Commits sent from Server to Client: 287995
Number of commits not yet replied as ACCEPT_COMMIT from server 101
3. 2x Client To Server (One client local to server)
Total == ~200k commits
Load Test Metrics -- Target Pad https://embed.etherpad.com:9002/p/foo
Total Clients Connected: 135
Local Clients Connected: 71
Authors Connected: 18
Lurkers Connected: 53
Sent Append messages: 793
Commits accepted by server: 692
Commits sent from Server to Client: 57945
Number of commits not yet replied as ACCEPT_COMMIT from server 101
Server --> Load Test Metrics -- Target Pad https://embed.etherpad.com:9002/p/foo
Clients Connected: 99
Authors Connected: 25
Lurkers Connected: 74
Sent Append messages: 1699
Commits accepted by server: 1598
Commits sent from Server to Client: 137569
Number of commits not yet replied as ACCEPT_COMMIT from server 101
Test 3, run again
Total Revs = 260k
Load Test Metrics -- Target Pad https://embed.etherpad.com:9002/p/foonew
*note Server client
Clients Connected: 117
Authors Connected: 30
Lurkers Connected: 87
Sent Append messages: 2275
Commits accepted by server: 2174
Commits sent from Server to Client: 204316
Number of commits not yet replied as ACCEPT_COMMIT from server 101
Load Test Metrics -- Target Pad https://embed.etherpad.com:9002/p/foonew
*note john's laptop VM.
Total Clients Connected: 141
Local Clients Connected: 72
Authors Connected: 18
Lurkers Connected: 54
Sent Append messages: 807
Commits accepted by server: 706
Commits sent from Server to Client: 65414
Number of commits not yet replied as ACCEPT_COMMIT from server 101
4. Local Client to Remote server, -a 1 -l 200
Load Test Metrics -- Target Pad https://embed.etherpad.com:9002/p/bsxHnSvB0u
Total Clients Connected: 151
Local Clients Connected: 201
Authors Connected: 1
Lurkers Connected: 200
Sent Append messages: 197
Commits accepted by server: 196
Commits sent from Server to Client: 39200
Seconds test has been running for: 213
Similar findings @ -l 300, @ -l 400 hitting about 10% CPU.
Changing -l to 500 significantly changes things. Server CPU jumps to 114% and connectivity begins failing.
-l @ 450 has the same experience. 420 hits 100% CPU but goes back to being "stable", with a notable lag. Note that this is essentially 420 commits per second in traffic, plus other overheads.
Load Test Metrics -- Target Pad https://embed.etherpad.com:9002/p/rogTaX22Ny
Total Clients Connected: 421
Local Clients Connected: 471
Authors Connected: 1
Lurkers Connected: 470
Sent Append messages: 53
Commits accepted by server: 45
Commits sent from Server to Client: 18453
Number of commits not yet replied as ACCEPT_COMMIT from server 8
Seconds test has been running for: 85
This is an important piece of information because a theoretical limit / restriction of 1:420 appears to exist.
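To make the single-author runs easier to compare, here is a small sketch that derives fan-out and mean outgoing rate from the reported counters. The object shape and function names are invented for illustration; the numbers are copied from the -l 200 and 470-lurker metrics above.

```javascript
// Counters copied from the two single-author runs reported above.
const run200 = { lurkers: 200, commitsAccepted: 196, commitsSent: 39200, seconds: 213 };
const run470 = { lurkers: 470, commitsAccepted: 45, commitsSent: 18453, seconds: 85 };

// Average number of clients each accepted commit was delivered to.
function fanOut(run) {
  return run.commitsSent / run.commitsAccepted;
}

// Mean server-to-client messages per second over the whole run.
function meanOutgoingRate(run) {
  return run.commitsSent / run.seconds;
}

for (const [name, run] of Object.entries({ run200, run470 })) {
  console.log(
    name,
    'fan-out:', fanOut(run).toFixed(1),
    'mean msgs/sec:', meanOutgoingRate(run).toFixed(1)
  );
}
```

When the server keeps up, fan-out should equal the lurker count (as it does for the 200-lurker run). A fan-out well below the lurker count is a hint that deliveries are lagging or clients are dropping.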
Profiling @ 1:100
I ran
This created a 16MB profile dump. I processed this with:
node --prof-process isolate-0x2cba060-v8.log > processed.txt
which threw up a bunch of errors but also created a profile report. I decided at this point to switch from dirty to MySQL. I reran the test, dumped the contents and processed them to processed2.txt. During this I also re-ran the 1:420 test: CPU dropped from 100% to 10% with no latency. So the database was the restriction, meaning the file system was the restriction. But WHY? Isn't UeberDB supposed to be caching these values in memory?
Load Test Metrics -- Target Pad https://embed.etherpad.com:9002/p/S1JaAoFBWH
Total Clients Connected: 420
Local Clients Connected: 421
Authors Connected: 1
Lurkers Connected: 420
Sent Append messages: 65
Commits accepted by server: 64
Commits sent from Server to Client: 24148
Seconds test has been running for: 85
New dumps looked like this:
So I guess I should redo all my tests with MySQL enabled... :P
Single device to Server test.
2 re-run w/ MySQL
Total = 218754
Load Test Metrics -- Target Pad https://embed.etherpad.com:9002/p/R033xllA4M
Total Clients Connected: 129
Local Clients Connected: 130
Authors Connected: 33
Lurkers Connected: 97
Sent Append messages: 2680
Commits accepted by server: 2579
Commits sent from Server to Client: 218754
Number of commits not yet replied as ACCEPT_COMMIT from server 101
1 Local Client to Local Server w/ MySQL - NODE_ENV == development
TOTAL = 262355
Load Test Metrics -- Target Pad https://embed.etherpad.com:9002/p/ucJKBQ2MqN
Clients Connected: 130
Authors Connected: 33
Lurkers Connected: 97
Sent Append messages: 3076
Commits accepted by server: 2975
Commits sent from Server to Client: 262355
Number of commits not yet replied as ACCEPT_COMMIT from server 101
1 Local Client to Local Server w/ MySQL - NODE_ENV == production
Load Test Metrics -- Target Pad https://embed.etherpad.com:9002/p/kdN2SMfDy3
Clients Connected: 139
Authors Connected: 35
Lurkers Connected: 104
Sent Append messages: 3319
Commits accepted by server: 3218
Commits sent from Server to Client: 299389
Number of commits not yet replied as ACCEPT_COMMIT from server 10
300k... wtf.
Switched back to Dirty, expecting ~600k commits.
Load Test Metrics -- Target Pad https://embed.etherpad.com:9002/p/dqoSh0sQ7V
Clients Connected: 135
Authors Connected: 33
Lurkers Connected: 102
Sent Append messages: 3336
Commits accepted by server: 3235
Commits sent from Server to Client: 300189
Number of commits not yet replied as ACCEPT_COMMIT from server 101
No, same, about 300k...
On my 1GB VM I'm getting ~600k with Redis as backend..
Running same test with dirty..
550k & 512k with dirty. So avg of 530k.. Poss due to new DB?
Switching back to redis.. Redis on a new DB.
Redis reporting ~500k, so no change.. Dirty doesn't appear to be restricting..
I'm scratching my head a bit, might need to sleep on this.
Running the client on my laptop (not in a VM; Windows 10, Node 12.6, pointing it at the 1GB VM) I get:
Load Test Metrics -- Target Pad http://192.168.1.48:9001/p/PL7L3fuhDO
Local Clients Connected: 132
Authors Connected: 33
Lurkers Connected: 99
Sent Append messages: 2787
Commits accepted by server: 2686
Commits sent from Server to Client: 233141
Number of commits not yet replied as ACCEPT_COMMIT from server 101
So is the client the limiting factor here? I'm going to target two users at the same pad..
I'm taking a break for today. It's been a bit of a confusing set of numbers, but overall things look fairly okay, except that a single client, if not rate limited, can do a DoS by simulating 400 lurkers... 1.9 will implement rate limiting, so I guess now we can figure out how many msgs per second seems fair per IP address?
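For discussion, a minimal sketch of what a per-IP message limit could look like. This is a plain fixed-window counter, not the implementation planned for 1.9; `PerIpRateLimiter`, `points` and `durationMs` are hypothetical names, and the default of 10 messages per second is a placeholder rather than a recommendation.

```javascript
// Fixed-window limiter: each IP address may send at most `points` messages
// per `durationMs` window. Names and defaults are illustrative only.
class PerIpRateLimiter {
  constructor(points = 10, durationMs = 1000, now = () => Date.now()) {
    this.points = points;
    this.durationMs = durationMs;
    this.now = now;           // injectable clock, handy for tests
    this.windows = new Map(); // ip -> { startedAt, used }
  }

  // Returns true if the message is allowed, false if it should be dropped.
  consume(ip) {
    const t = this.now();
    let w = this.windows.get(ip);
    if (!w || t - w.startedAt >= this.durationMs) {
      // First message from this IP, or the previous window has expired.
      w = { startedAt: t, used: 0 };
      this.windows.set(ip, w);
    }
    if (w.used >= this.points) return false;
    w.used += 1;
    return true;
  }
}

const limiter = new PerIpRateLimiter(10, 1000);
let allowed = 0;
for (let i = 0; i < 25; i++) {
  if (limiter.consume('192.168.1.48')) allowed += 1;
}
console.log(`allowed ${allowed} of 25 messages in this burst`);
```

A real deployment would also need to evict idle entries from the map and decide what to do about many users behind one NAT address, which is exactly why picking the number is the hard part.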