babl-ws / babl

Low-latency WebSocket Server
https://babl.ws
Apache License 2.0

Performance degrades when broadcast payload size increases #110

Closed: ccnlui closed this issue 2 years ago

ccnlui commented 2 years ago

Issue: I'm trying to understand why performance drops so much when I increase the broadcast payload size from 30 bytes to 100 bytes.

The babl server is very simple: it polls an external Aeron subscription in the additionalWork agent (application work pattern) and broadcasts every message it receives to 1 topic. There is 1 websocket client listening on the topic.
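For reference, a minimal sketch of the pattern described above, assuming an Agrona Agent that polls an external Aeron subscription and forwards each fragment to a broadcast callback. The MessageBroadcaster interface, topic id, channel and stream id are placeholders; they stand in for babl's actual broadcast API and the real configuration.

import io.aeron.Aeron;
import io.aeron.Subscription;
import io.aeron.logbuffer.FragmentHandler;
import org.agrona.DirectBuffer;
import org.agrona.concurrent.Agent;

// Hypothetical stand-in for babl's broadcast API: sends a message to every
// session registered on the given topic.
interface MessageBroadcaster
{
    void sendToTopic(int topicId, DirectBuffer buffer, int offset, int length);
}

// Additional-work agent: polls an external Aeron subscription and
// re-broadcasts every fragment it receives to a single topic.
// (A FragmentAssembler would be needed for messages larger than the MTU;
// omitted for brevity.)
final class BroadcastForwardingAgent implements Agent
{
    private static final int TOPIC_ID = 1;          // placeholder topic id
    private static final int FRAGMENT_LIMIT = 200;  // matches the poll limits configured below

    private final Subscription subscription;
    private final FragmentHandler handler;

    BroadcastForwardingAgent(final Aeron aeron, final MessageBroadcaster broadcaster)
    {
        // placeholder channel/stream id; the real values come from the application
        this.subscription = aeron.addSubscription("aeron:ipc", 1001);
        this.handler = (buffer, offset, length, header) ->
            broadcaster.sendToTopic(TOPIC_ID, buffer, offset, length);
    }

    @Override
    public int doWork()
    {
        // the returned work count drives the configured idle strategy
        return subscription.poll(handler, FRAGMENT_LIMIT);
    }

    @Override
    public String roleName()
    {
        return "broadcast-forwarder";
    }
}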

Why do I think it's the babl server? I suspect I'm misconfiguring babl, but I'm not sure.

I tried testing with the same websocket client against another websocket library (java-websocket), and I was able to measure ~1ms latency with both 30-byte and 100-byte payloads.

I also tried testing the same babl server and websocket client on two different servers on Google Cloud, and got very similar results.

Test results below, I'd appreciate any feedback. Thank you.

Benchmark setup:
- 1 websocket client
- 15,000 msg/sec
- every message broadcast to 1 topic

Results - payload size: 30 bytes

2022/05/18 20:47:36 warm up 20 seconds...
trades: 5000 quotes: 10000 quotes latency us (p50 p95 p99): 1088 1140 1187 trades latency us (p50 p95 p99): 1084 1136 1194
trades: 5000 quotes: 10001 quotes latency us (p50 p95 p99): 1088 1140 1183 trades latency us (p50 p95 p99): 1084 1136 1189
trades: 5001 quotes: 10001 quotes latency us (p50 p95 p99): 1088 1142 1350 trades latency us (p50 p95 p99): 1084 1137 1355
(...)

Results - payload size: 100 bytes

2022/05/18 20:52:43 warm up 20 seconds...
trades: 4995 quotes: 9990 quotes latency us (p50 p95 p99): 12703 22559 23439 trades latency us (p50 p95 p99): 12671 22527 23455
trades: 4966 quotes: 9932 quotes latency us (p50 p95 p99): 12703 22559 23439 trades latency us (p50 p95 p99): 12671 22527 23455
trades: 4968 quotes: 9936 quotes latency us (p50 p95 p99): 12695 22559 23439 trades latency us (p50 p95 p99): 12663 22527 23439
(...)
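For context, percentiles like the ones above are typically produced by recording one round-trip sample per received message. Below is a minimal client-side sketch using HdrHistogram; this is an assumption about the approach, since the actual benchmark client is not part of this issue.

import org.HdrHistogram.Histogram;

// Sketch of client-side latency measurement: record one round-trip sample
// per received message, then report p50/p95/p99 once per interval.
final class LatencyRecorder
{
    // track values up to 10 seconds with 3 significant digits
    private final Histogram histogram = new Histogram(10_000_000_000L, 3);

    void onMessageReceived(final long sendTimestampNanos)
    {
        histogram.recordValue(System.nanoTime() - sendTimestampNanos);
    }

    void report()
    {
        System.out.printf("latency us (p50 p95 p99): %d %d %d%n",
            histogram.getValueAtPercentile(50.0) / 1000,
            histogram.getValueAtPercentile(95.0) / 1000,
            histogram.getValueAtPercentile(99.0) / 1000);
        histogram.reset();
    }
}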

babl properties:

babl.server.bind.address=0.0.0.0
babl.server.connection.backlog=20
babl.server.deployment.mode=DETACHED
babl.server.directory=/dev/shm/babl
babl.server.instances=1
babl.server.listen.port=8080
babl.server.poll.mode.enabled=false
babl.server.poll.mode.session.limit=5
babl.server.session.monitoring.entry.count=4096
babl.server.session.poll.limit=200
babl.server.validation.timeout=10000000000
babl.server.validation.validator=com.aitusoftware.babl.websocket.AlwaysValidConnectionValidator
babl.performance.mode=HIGH
babl.session.buffer.decode.max.size=131072
babl.session.buffer.decode.size=1024
babl.session.buffer.max.size=16777216
babl.session.buffer.receive.size=1024
babl.session.buffer.send.size=65536
babl.session.frame.max.size=65536
babl.session.ping.interval=5000000000
babl.session.pong.response.timeout=30000000000
babl.socket.receive.buffer.size=65536
babl.socket.send.buffer.size=65536
babl.socket.tcpNoDelay.enabled=false
babl.proxy.application.adapter.poll.limit=200
babl.proxy.application.stream.base.id=5000
babl.proxy.back.pressure.policy=CLOSE_SESSION
babl.proxy.driver.dir=/dev/shm/aeron
babl.proxy.driver.launch=false
babl.proxy.server.adapter.poll.limit=150
babl.proxy.server.stream.base.id=6000
babl.server.idle.strategy=BACK_OFF
babl.application.idle.strategy=BUSY_SPIN
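For completeness, a minimal launcher sketch that makes the file above visible to the server, assuming babl picks its settings up from system properties (the key naming suggests this, but the configuration docs at https://babl.ws/configuration.html are the authority); babl's actual entry point is not shown here.

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Sketch only: load the properties file above and expose each entry as a
// system property before the server is started. Whether babl reads system
// properties (as opposed to an explicit config object) is an assumption.
final class BablPropertiesLoader
{
    static void loadInto(final Path propertiesFile) throws Exception
    {
        final Properties properties = new Properties();
        try (InputStream in = Files.newInputStream(propertiesFile))
        {
            properties.load(in);
        }
        properties.stringPropertyNames()
            .forEach(key -> System.setProperty(key, properties.getProperty(key)));
    }

    public static void main(final String[] args) throws Exception
    {
        loadInto(Path.of(args[0]));
        // ...then launch the babl server via its documented entry point
    }
}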
ccnlui commented 2 years ago

I did some more testing on google cloud.

My theory is that babl as a library consumes more resources than other libraries like java-websocket. As a result, it reaches the system's limits faster as the payload size increases.

epickrram commented 2 years ago

Hi @ccnlui. Without knowing your test setup and the machines you're running on, it's almost impossible to say what the cause is. Babl certainly does use more system resources than other websocket libraries, but that is why it is much faster.

As with your previous issue in the benchmark repository, you need to make sure that the systems you are running tests on have adequate resources to execute the program.

Even simple tools such as top should tell you whether you are maxing out the CPU on a machine. If you are, then you cannot expect reliable, repeatable results, especially in the cloud, where misbehaving instances may be throttled, further degrading system performance.

That said, Babl attempts to expose useful metrics during its operation. Please refer to the Monitoring section of the documentation:

https://babl.ws/monitoring.html

All classes ending in Printer in the com.aitusoftware.babl.monitoring package will dump various counters to stdout. These may give you some insight into where the problem lies.
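As a convenience, a small reflective wrapper for running one of those Printer classes by name is sketched below; it assumes each Printer exposes a standard main(String[]) that takes the babl server/monitoring directory, which should be confirmed against the monitoring docs above.

// Hedged sketch: invoke one of babl's monitoring *Printer classes by name, e.g.
//   java RunPrinter com.aitusoftware.babl.monitoring.<SomePrinter> /dev/shm/babl
// Assumes each Printer has a main(String[]) taking the babl server directory.
final class RunPrinter
{
    public static void main(final String[] args) throws Exception
    {
        final Class<?> printerClass = Class.forName(args[0]);
        printerClass.getMethod("main", String[].class)
            .invoke(null, (Object) new String[] {args[1]});
    }
}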

Please also spend some time reading the Configuration section:

https://babl.ws/configuration.html

to understand what your configuration file is actually doing.

ccnlui commented 2 years ago

Thank you @epickrram. I'm fairly certain I'm getting the above results because the system was running out of resources (local machine, not very powerful cloud instances, ...). I tried again on some dedicated, more powerful cloud instances, and I was able to measure consistent performance, ~1ms in both cases.

I'm still exploring the limits of this library, so thank you for your advice. I'll dig into the rest of the documentation.

soylomass commented 2 years ago

@epickrram "Babl certainly does use more system resources than other websocket libraries, but that is why it is much faster"

What resources does babl use more than other ws libraries? CPU or memory?

I'm evaluating babl as a replacement for Undertow as the WebSocket server for a browser-based game. What do you think?

Thanks in advance

epickrram commented 2 years ago

Hi @soylomass. Babl achieves very low latency by burning CPU time (i.e. busy-spinning poll loops). You can configure different idle strategies to trade off CPU against latency. @isolgpus has built a browser-based game using babl as the back-end, so may have more useful input (I originally designed the library for trading systems, which have slightly different operating parameters).
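To make that trade-off concrete, here is a short sketch of the Agrona idle strategies that the babl.server.idle.strategy and babl.application.idle.strategy settings in the configuration above refer to (BACK_OFF and BUSY_SPIN); the constructor parameters shown are illustrative values, not babl's defaults.

import java.util.function.IntSupplier;

import org.agrona.concurrent.BackoffIdleStrategy;
import org.agrona.concurrent.BusySpinIdleStrategy;
import org.agrona.concurrent.IdleStrategy;
import org.agrona.concurrent.SleepingMillisIdleStrategy;

// Illustration of the CPU vs. latency trade-off. An agent loop calls
// idleStrategy.idle(workCount) after each doWork() pass: busy-spin never
// yields the core (lowest latency, 100% CPU), while back-off and sleeping
// strategies progressively give the CPU back when no work is found.
final class IdleStrategies
{
    // burns a core spinning: lowest latency, highest CPU cost
    static final IdleStrategy BUSY_SPIN = new BusySpinIdleStrategy();

    // spins, then yields, then parks for increasing periods when idle
    // (parameter values are illustrative, not babl's defaults)
    static final IdleStrategy BACK_OFF =
        new BackoffIdleStrategy(100, 10, 1_000, 1_000_000);

    // cheapest on CPU, highest latency: sleeps 1 ms whenever there is no work
    static final IdleStrategy SLEEPING = new SleepingMillisIdleStrategy(1);

    static void dutyCycle(final IdleStrategy idleStrategy, final IntSupplier doWork)
    {
        while (!Thread.currentThread().isInterrupted())
        {
            // idle() resets its back-off state when work was done,
            // and idles progressively longer otherwise
            idleStrategy.idle(doWork.getAsInt());
        }
    }
}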

isolgpus commented 2 years ago

@soylomass While our conditions may differ, I can say that switching to babl was probably the best single performance improvement I ever got.

Some things you may want to take into account though -

soylomass commented 2 years ago

Thanks for your helpful comments! I'll give it a try