avito-tech / bioyino

High performance and high-precision multithreaded StatsD server

Question about settings for Best Performance #74

Status: Closed (closed by syatihoko 9 months ago)

syatihoko commented 1 year ago

Hello. We have started using multimessage = true. Are there any best practices from you for the settings in this block?

Until today we had the following settings (we changed them in an attempt to fix the Docker container restarts that had started appearing):

[network]
listen = "0.0.0.0:8126"        # address for incoming StatsD metrics (UDP)
peer-listen = "0.0.0.0:8136"   # address for the inter-node peer protocol
mgmt-listen = "0.0.0.0:8137"   # address for the management interface
bufsize = 50000                # receive buffer size per packet, bytes
multimessage = true            # read packets in batches via recvmmsg
mm-packets = 100               # packets received per recvmmsg batch
mm-async = false               # use blocking batch receive
buffer-flush-time = 5000       # flush incomplete buffers after this many ms
buffer-flush-length = 255536   # flush a buffer when it reaches this many bytes
greens = 7                     # green threads (single-message mode)
async-sockets = 7              # async sockets (single-message mode)
nodes = []                     # peers to replicate snapshots to
snapshot-interval = 1000       # interval between snapshots to peers, ms

I then took these values from the example in the repository:

[network]
listen = "0.0.0.0:8126"
peer-listen = "0.0.0.0:8136"
mgmt-listen = "0.0.0.0:8137"
bufsize = 1500
multimessage = true
mm-packets = 1000
mm-async = false
buffer-flush-time = 10000
buffer-flush-length = 655360
greens = 7
async-sockets = 7
nodes = []
snapshot-interval = 1000

The server has 8 cores:
n-threads = 7    # network threads
w-threads = 7    # counting (worker) threads

It is currently running in Docker, as a single node. Previously the CPU was at 100%; now it is at 50% or lower.

The parse-error metric shows approximately 3000 (the developer is working on fixing the sender), and the ingress-metric counter is about 5M and will keep growing.

It is unclear whether it is because of the growing number of metrics, but the Docker container running the service sometimes restarts (for stability I am thinking of moving it out of Docker and building a cluster in the future).

Could you tell me which bioyino parameters can be tuned in such a situation? And how should they change if we move to a 3-node cluster?

Albibek commented 1 year ago

It's hard to recommend anything without experimenting. Multimessage is OS-side buffering (the recvmmsg syscall). If your metric rate is low, you risk metrics being taken out of this buffer too late and landing in the wrong time period. If the rate is high, you want metrics processed in batches (which is faster), but you don't want the batches to be too big either: with an oversized batch you start spending memory on buffering and, again, delaying metrics. For our high-rate loads at Avito it was enough to use mm-packets = 1000. In a 3-node cluster configuration, if you distribute metrics evenly between nodes, each node simply gets a 3x lower rate, so you may want to reduce mm-packets if you see the rate is not high enough for most of the metrics to be delivered in time.
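To put rough numbers on that trade-off (mine, not from the thread): a full batch reserves about bufsize * mm-packets bytes, i.e. 1500 * 1000 ≈ 1.5 MB with the repository example above, and at a low rate a batch that large may take too long to fill. A hypothetical per-node adjustment for the 3-node case, scaled to the ~3x lower rate (the exact value is an assumption and needs load testing):

[network]
bufsize = 1500        # per-packet buffer, as in the repository example
multimessage = true
# assumed value: ~1/3 of the single-node mm-packets = 1000, matching
# the ~3x lower per-node rate, so a batch still fills quickly
mm-packets = 333
mm-async = false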

syatihoko commented 1 year ago

Thanks for the detailed answer. Am I right that correspondence here should be in English? Although I assume we share the same mother tongue =)

Tell me, if it's not a secret: is some kind of balancer used in front of the bioyino cluster (and which one would you recommend), or do different groups of services have different destination nodes?

Albibek commented 1 year ago

> Am I right that correspondence here should be in English? Although I assume we share the same mother tongue =)

Yes, I speak Russian, but I'd prefer to leave conversations here in English for the sake of other readers.

> Tell me, if it's not a secret: is some kind of balancer used in front of the bioyino cluster (and which one would you recommend), or do different groups of services have different destination nodes?

For some time we balanced at the UDP level: a hardware balancer distributed packets without any knowledge of the metrics inside them. In the end this consumed the balancer's ingress bandwidth, so we switched to an agent-based mode. For example, in k8s each node runs its own agent; all metrics from that node are received by the agent and then delivered centrally to the cluster.
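For illustration, a minimal sketch of what such a per-host agent's network section could look like, assuming the agent accepts local StatsD traffic and replicates snapshots to the central cluster via the nodes list (the hostnames and the overall layout are my assumptions, not the actual Avito setup):

[network]
listen = "127.0.0.1:8126"       # apps on this host send StatsD metrics locally
peer-listen = "0.0.0.0:8136"
mgmt-listen = "0.0.0.0:8137"
multimessage = true
mm-packets = 100                # assumed: per-host rate is low, small batches suffice
# hypothetical central cluster peers receiving this agent's snapshots
nodes = ["bioyino-1.example:8136", "bioyino-2.example:8136", "bioyino-3.example:8136"]
snapshot-interval = 1000        # send accumulated snapshots every second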