This PR addresses #2965 with a different approach.
This PR addresses 2 issues with the current implementation:

- The number of in-use zstd encoders can exceed GOMAXPROCS if a large number of goroutines are used
- The number of cached encoders is too low for highly parallel sarama use, leading to repeated encoder creation and thus low throughput
The PR preserves the following property:

- Encoders are lazily created
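To make the intent concrete, here is a rough sketch of one way these constraints can fit together. It is illustrative only, not the PR's actual code: the package and type names, the channel semaphore, and the per-level free list are all assumptions; the encoder itself comes from github.com/klauspost/compress/zstd, which sarama already uses for zstd.

```go
package zstdpool // illustrative package name, not from the PR

import (
	"runtime"
	"sync"

	"github.com/klauspost/compress/zstd"
)

// inUseLimit caps the number of encoders in use at once across all
// compression levels, so heavy goroutine fan-out cannot exceed GOMAXPROCS
// concurrent encoders.
var inUseLimit = make(chan struct{}, runtime.GOMAXPROCS(0))

// levelPool caches idle encoders for one compression level; encoders are
// created lazily and kept for reuse, so the cache grows up to the peak
// parallelism seen for that level.
type levelPool struct {
	mu    sync.Mutex
	idle  []*zstd.Encoder
	level zstd.EncoderLevel
}

// acquire blocks until an encoder slot is free, then returns a cached
// encoder or lazily creates a new one for this level.
func (p *levelPool) acquire() (*zstd.Encoder, error) {
	inUseLimit <- struct{}{} // take a concurrency slot

	p.mu.Lock()
	if n := len(p.idle); n > 0 {
		enc := p.idle[n-1]
		p.idle = p.idle[:n-1]
		p.mu.Unlock()
		return enc, nil
	}
	p.mu.Unlock()

	enc, err := zstd.NewWriter(nil, zstd.WithEncoderLevel(p.level))
	if err != nil {
		<-inUseLimit // give the slot back on failure
		return nil, err
	}
	return enc, nil
}

// release returns the encoder to this level's cache and frees its slot.
func (p *levelPool) release(enc *zstd.Encoder) {
	p.mu.Lock()
	p.idle = append(p.idle, enc)
	p.mu.Unlock()
	<-inUseLimit
}
```

With one such pool per compression level, encoders are still created lazily, the number of encoders in use at any moment is capped at GOMAXPROCS, and idle encoders stay cached for reuse instead of being torn down and recreated under load.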
The memory behavior of applications can change slightly.
Before applying the patch:

- The maximum memory usage from encoders was tied to the concurrency (number of goroutines) but would shrink back to 1 when idle

After applying the patch:

- The maximum memory usage from encoders is tied to the peak parallelism per compression level
This should not change the worst case for the great majority of users, but it may matter for applications that alternate between heavy sarama use and other work.
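For illustration (these numbers are assumed, not taken from the PR): with GOMAXPROCS=16 and two compression levels in active use, up to 2 × 16 = 32 encoders could remain cached after a burst of traffic, whereas before the patch the cache would have shrunk back to a single encoder once the application went idle.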
There are 2 new benchmarks and a testing flag (`zstdTestingDisableConcurrencyLimit`) to verify the concurrency limiting.
I've also added some more information to the tests (like setting the bytes so throughput can be measured).
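("Setting the bytes" refers to Go's `(*testing.B).SetBytes`, which makes the benchmark output include throughput.) Below is a minimal sketch of that pattern; the `zstdCompress` helper and package name are stand-ins, not the PR's actual benchmark code, which runs against sarama's encoder path.

```go
package zstdbench // illustrative; not the PR's test file

import (
	"bytes"
	"testing"

	"github.com/klauspost/compress/zstd"
)

// zstdCompress is a stand-in for the compression path the real benchmarks
// exercise inside sarama; it simply compresses src with a fresh encoder.
func zstdCompress(dst, src []byte) ([]byte, error) {
	enc, err := zstd.NewWriter(nil)
	if err != nil {
		return nil, err
	}
	defer enc.Close()
	return enc.EncodeAll(src, dst), nil
}

func BenchmarkZstdCompressParallel(b *testing.B) {
	// Arbitrary compressible payload for illustration.
	data := bytes.Repeat([]byte("sample kafka payload "), 512)
	b.SetBytes(int64(len(data))) // makes `go test -bench .` report MB/s
	b.ReportAllocs()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			if _, err := zstdCompress(nil, data); err != nil {
				b.Error(err)
			}
		}
	})
}
```

This only shows the SetBytes/RunParallel shape; the PR's benchmarks exercise sarama's encoder pool together with the `zstdTestingDisableConcurrencyLimit` flag.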
Here is a sample output from my machine (AMD Framework 13):