Zelldon commented 4 years ago

Description

The MAX_BATCH_SIZE is used to limit how many data is sent as a batch to a follower. Currently it is static and set to 32 kb. It is likely that this configure can help to improve the throughput. We should evaluate and benchmark different values to get an better understand how it impacts the performance and what would be a good value.

Ideally we add new metrics which show how many data is sent per append, to see whether the max batch size is really used or we have other limits already.

Blocked by https://github.com/zeebe-io/zeebe/issues/5472

Zelldon commented 1 year ago

maybe also something we want to take a look at the hackweek

megglos commented 1 year ago

relates to https://github.com/camunda/product-hub/issues/642 https://github.com/camunda/product-hub/issues/643

megglos commented 1 year ago

It makes sense but we would only optimize for a certain setup e.g. our benchmark, this doesn't necessarily benefit users. User may still need to optimize for their setup.

megglos commented 1 year ago

@falko do you have any input on this based on your experiences?

entangled90 commented 2 weeks ago

Default benchmark

One round of benchmark done with the default 32 kb payload @ 150 PI/S, grafana link for all scenarios: Dashboard

Configurations timeline

max_appends=6, max_batch_size=64kb

helm upgrade --reuse-values --set zeebe.config.zeebe.broker.experimental.maxAppendsPerFollower=6 --set zeebe.config.zeebe.broker.experimental.maxAppendBatchSize=64kb --namespace $BENCHMARK_NAME $BENCHMARK_NAME zeebe-benchmark/zeebe-benchmark

[X] started at 16:13
[X] even partition distribution from 16:13
[x] stopped at 16:36

max_appends=6, max_batch_size=128kb

helm upgrade --reuse-values --set zeebe.config.zeebe.broker.experimental.maxAppendsPerFollower=6 --set zeebe.config.zeebe.broker.experimental.maxAppendBatchSize=128KB --namespace $BENCHMARK_NAME $BENCHMARK_NAME zeebe-benchmark/zeebe-benchmark

[x] started at 16:38
[X] even partition distribution from 16:45
[x] stopped at 17:00

max_appends=6, max_batch_size=16kb

helm upgrade --reuse-values --set zeebe.config.zeebe.broker.experimental.maxAppendsPerFollower=6 --set zeebe.config.zeebe.broker.experimental.maxAppendBatchSize=16KB --namespace $BENCHMARK_NAME $BENCHMARK_NAME zeebe-benchmark/zeebe-benchmark

[x] started at 17:01
[X] even partition distribution from 17:16
[x] stopped at 17:27

DEFAULT max_appends=6, max_batch_size=32kb :

helm upgrade --reuse-values --set zeebe.config.zeebe.broker.experimental.maxAppendsPerFollower=6 --set zeebe.config.zeebe.broker.experimental.maxAppendBatchSize=32KB --namespace $BENCHMARK_NAME $BENCHMARK_NAME zeebe-benchmark/zeebe-benchmark

[X] started at 17:30
[X] even partition distribution from 17:30
[X] stopped at 17:45

Results

Backpressure and processing queue size improves with bigger batch sizes. All other metrics look stable (memory & GC as well). Messaging latency degrades a bit, but that's expected considering we are sending payloads 2x-4x bigger.

Smaller batch size (16KB) (even though the payload in this scenario 32KB) is the worst of all on many metric, even though I expected to behave similarly to 32KB because of the payload size.

It may be fruitful to run the realistic benchmark as well with the configuration (32KB, 64KB, 128KB) as well.

Based on this benchmark configuration I would suggest to bump the default to 64KB or 128KB.

entangled90 commented 2 weeks ago

Realistic benchmark scenario

Grafana

Configuration timeline

Started on 28/10/2025 at 09:50:

helm install --create-namespace --namespace $BENCHMARK_NAME $BENCHMARK_NAME zeebe-benchmark/zeebe-benchmark -f values-realistic-benchmark.yaml

DEFAULT max_appends=6, max_batch_size=32kb :

helm upgrade --reuse-values --set zeebe.config.zeebe.broker.experimental.maxAppendsPerFollower=6 --set zeebe.config.zeebe.broker.experimental.maxAppendBatchSize=32KB --namespace $BENCHMARK_NAME $BENCHMARK_NAME zeebe-benchmark/zeebe-benchmark -f values-realistic-benchmark.yaml

[X] started at 09:55
[X] even partition distribution from 10:01
[X] stopped at 10:12

max_appends=6, max_batch_size=64kb :

helm upgrade --reuse-values --set zeebe.config.zeebe.broker.experimental.maxAppendsPerFollower=6 --set zeebe.config.zeebe.broker.experimental.maxAppendBatchSize=64KB --namespace $BENCHMARK_NAME $BENCHMARK_NAME zeebe-benchmark/zeebe-benchmark -f values-realistic-benchmark.yaml

[X] started at 10:17
[X] even partition distribution from 10:45 (took a lot!)
[X] stopped at 11:00

max_appends=6, max_batch_size=128kb :

helm upgrade --reuse-values --set zeebe.config.zeebe.broker.experimental.maxAppendsPerFollower=6 --set zeebe.config.zeebe.broker.experimental.maxAppendBatchSize=128KB --namespace $BENCHMARK_NAME $BENCHMARK_NAME zeebe-benchmark/zeebe-benchmark -f values-realistic-benchmark.yaml

[X] started at 11:00
[X] even partition distribution from 11:05
[X] stopped at 11:31

Results

In this scenario as well bigger batch sizes seem to improve "Backpressure" & "Processing queue size". It also seems that more processes are completed (even if it's quite noisy, so not sure if it's real):

Based on this benchmark configuration I would suggest to bump the default to 64KB or 128KB.

entangled90 commented 2 weeks ago

The above benchmark were run before this fix was added to the helm-charts: https://github.com/zeebe-io/benchmark-helm/pull/202, so it's probably better to rerun them.

entangled90 commented 2 weeks ago

REALISTIC SETUP run nr. 2

Started on 30/10/2025 at 16:25: Dashboard

helm install --create-namespace --namespace $BENCHMARK_NAME $BENCHMARK_NAME zeebe-benchmark/zeebe-benchmark -f values-realistic-benchmark.yaml

NOTE: maxAppendsPerFollower is not needed anymore, as it's now the default value.

DEFAULT max_appends=6, max_batch_size=32kb :

helm upgrade --reuse-values --set zeebe.config.zeebe.broker.experimental.maxAppendBatchSize=32KB --namespace $BENCHMARK_NAME $BENCHMARK_NAME zeebe-benchmark/zeebe-benchmark -f values-realistic-benchmark.yaml

[X] started at 16:25
[X] even partition distribution from 16:25 (but a broker restarted at 16:30)
[X] stopped at 16:46

max_appends=6, max_batch_size=64kb :

helm upgrade --reuse-values --set zeebe.config.zeebe.broker.experimental.maxAppendBatchSize=64KB --namespace $BENCHMARK_NAME $BENCHMARK_NAME zeebe-benchmark/zeebe-benchmark -f values-realistic-benchmark.yaml

[X] started at 16:52
[X] even partition distribution from 17:00
[X] stopped at 17:15

max_appends=6, max_batch_size=128kb :

helm upgrade --reuse-values --set zeebe.config.zeebe.broker.experimental.maxAppendBatchSize=128KB --namespace $BENCHMARK_NAME $BENCHMARK_NAME zeebe-benchmark/zeebe-benchmark -f values-realistic-benchmark.yaml

[X] started at 17:17
[X] even partition distribution from 17:30
[X] stopped at 18:06

Results

Increasing the MAX_BATCH_SIZE increases the throughput by around 20%. This comes at the expense of increased latency (as expected) and more cpu usage, as it looks to be able to process more data.

Unfortunately, with the additional cpu usage the container starts throttling more frequently, exacerbating the latency degradation, so it would be better to see the results with more cpu dedicated how latency will behave.

For example, the Process instance execution time increases a lot, even though the rate of process instance completed is higher than the baseline.

Another interesting metric is the amount of data writte to disk, which looks much higher than before

:question: Is the load on the system constant or does it depend on how much throughput zeebe is able to process?

If the latter is true, then we're not really comparing the same setup, so it's expected to see such different numbers.

:question: However, it still poses some questions on wether zeebe is handling the additional load gracefully or with a different behaviour (such as timing out instances?)

:thought_balloon: If zeebe is behave correctly, probably increasing the buffer size to 64KB is sensible, while keeping in mind that in the future we may bump it to 128KB

lenaschoenburg commented 2 weeks ago

None of these runs were able to complete process instances in time so I'd expect that there's an underlying issue that invalidates the benchmark results unfortunately.

entangled90 commented 2 weeks ago

That's what I feared. The baseline as well was having the same issue (even more actually).

Given these results I think it's even more important to repeat them.

Zelldon commented 2 weeks ago

@entangled90 please make sure to run helm repo update to get the latest changes (recent version with fixes)

entangled90 commented 2 weeks ago

Thanks @Zelldon, I was using an outdated version of the benchmark without your fix.

This time I've run them in parallel in different namesapces cs-max-batch-${batch-size}

Here are the updated results:

I see that now the two rates are matched (50 PI/s per partition = 150) are handled in all three scenarios. When the buffer size has been increased however I see that the number of Job pushed & timedout is much higher.

:question: That may be related to the increased cpu usage & generally worse performance. Not sure if this is the cause or an effect of the increased cpu usage, but I wonder if we expected this metric to increase?

:question: could be a situation when a timeout is now breached more frequently, causing an "avalanche" effect, adding more load, which then increases the number of timeouts?

entangled90 commented 1 week ago

I see that the number of requests handled by Gateway is completely off in the benchmark for 64KB.

For the other two benchmarks the frequency of "Published Requests" is between:

70-80 for 32KB
70-110 for 128KB

entangled90 commented 1 week ago

After a review with @npepinpe we noticed that the Number of job push failed is too high, even in the base case. It's probably better to reconfigure the workers' resources in order to make them able to handle the load (in the base case)

entangled90 commented 1 week ago

Benchmarks will be rerun after https://github.com/zeebe-io/benchmark-helm/pull/204 is merged

Zelldon commented 1 week ago

Benchmarks will be rerun after zeebe-io/benchmark-helm#204 is merged

Make sure to create a new release first :) https://github.com/zeebe-io/benchmark-helm/blob/main/RELEASE.md

entangled90 commented 1 week ago

Rerun the benchmarks:

Results

Benchmark are now much more stable. Backpressure is almost always 0% for all configurations. Observations:

cpu usage increases slightly when increasing the buffer size.
- 32KB: 1.08
- 64KB: 1.12
- 128KB: 1.20
We may look into this as potentially bigger batches can be more "efficient" thus reducing cpu usage.
Processing queue size greatly improves when using 128KB buffers, while there's not a noticeable difference between 32KB and 64KB
Processing > number of records not exported: is almost zero with 128KB, while always greater than 0 in 32KB & 64KB configurations. ~~On 64KB configuration looks worse. The ElasticSearch Exporter flush duration is worse on the 64KB case compared to the other two. The cpu usage of the elastic search pod is worse.
:wrench: My expectation would be that it should not be that different from the other two values (considering 64KB sits in between).~~

Benchmark repeated for 64KB 64KB confirms to be a bit better than 32KB on processing queue size as well.
Memory GC > Bigger batch sizes slitghtly increase the frequency and duration of GC. This is consistent with the increased cpu usage.

Conclusion

In general I think bigger batches can improve performances with slight increases in cpu usage. I will repeat the test with 64KB as it looks strange.

[!NOTE] Test was repeated and confirms above conclusions

:wrench: We may look into optimizing the cpu usage/memory allocation with bigger batches, to see if we can make it more efficient.

npepinpe commented 1 week ago

Looking at overall processing throughput and processing latency, I did not really notice an improvement in any of the benchmarks compared to the baseline. That said, it's possible we aren't pushing enough load to see if there are any improvements on those metrics.

While we see an improvement in the processing queue size, this means it did not really translate to actual improvements for the user. I also couldn't confirm the "number of records not exported: is almost zero" part. It looks like it's still some? That would be a marked improvement for user, as it means less delay when it comes to seeing new data in the web apps.

If we look at section 1, we see this is true - close to 0. But then section 2 shows the normal behavior. Are these separate runs?

entangled90 commented 1 week ago

No, they are the same run. I have tried to understand what happens at 09:10, but I could find an explanation.

Almost every metric has a dip at around 09:10:

entangled90 commented 3 days ago

Given that improvements in the metrics are small and increase the cpu usage it's probably better to keep the current default value. It could possible that with a different workload a different value of this setting may be beneficial (if more cpu usage is ok for the customer for the customer example).

camunda / camunda

Evaluate a good value for max append batch size #5473

Default benchmark

Configurations timeline

max_appends=6, max_batch_size=64kb

max_appends=6, max_batch_size=128kb

max_appends=6, max_batch_size=16kb

DEFAULT max_appends=6, max_batch_size=32kb :

Results

Realistic benchmark scenario

Configuration timeline

DEFAULT max_appends=6, max_batch_size=32kb :

max_appends=6, max_batch_size=64kb :

max_appends=6, max_batch_size=128kb :

Results

REALISTIC SETUP run nr. 2

DEFAULT max_appends=6, max_batch_size=32kb :

max_appends=6, max_batch_size=64kb :

max_appends=6, max_batch_size=128kb :

Results

Results

Conclusion