This PR addresses #2965 with a different approach.
This PR addresses 2 issues with the current implementation:

- The number of in-use zstd encoders can exceed GOMAXPROCS if a large number of goroutines are used
- The number of cached encoders is too low for highly parallel sarama use, leading to repeated encoder creation and thus low throughput
The PR preserves the following property:

- Encoders are lazily created
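To make the intent concrete, here is a rough sketch of one way these constraints can fit together. It is illustrative only, not the PR's actual code: the package and type names, the channel semaphore, and the per-level free list are all assumptions; the encoder itself comes from github.com/klauspost/compress/zstd, which sarama already uses for zstd.

```go
package zstdpool // illustrative package name, not from the PR

import (
	"runtime"
	"sync"

	"github.com/klauspost/compress/zstd"
)

// inUseLimit caps the number of encoders in use at once across all
// compression levels, so heavy goroutine fan-out cannot exceed GOMAXPROCS
// concurrent encoders.
var inUseLimit = make(chan struct{}, runtime.GOMAXPROCS(0))

// levelPool caches idle encoders for one compression level; encoders are
// created lazily and kept for reuse, so the cache grows up to the peak
// parallelism seen for that level.
type levelPool struct {
	mu    sync.Mutex
	idle  []*zstd.Encoder
	level zstd.EncoderLevel
}

// acquire blocks until an encoder slot is free, then returns a cached
// encoder or lazily creates a new one for this level.
func (p *levelPool) acquire() (*zstd.Encoder, error) {
	inUseLimit <- struct{}{} // take a concurrency slot

	p.mu.Lock()
	if n := len(p.idle); n > 0 {
		enc := p.idle[n-1]
		p.idle = p.idle[:n-1]
		p.mu.Unlock()
		return enc, nil
	}
	p.mu.Unlock()

	enc, err := zstd.NewWriter(nil, zstd.WithEncoderLevel(p.level))
	if err != nil {
		<-inUseLimit // give the slot back on failure
		return nil, err
	}
	return enc, nil
}

// release returns the encoder to this level's cache and frees its slot.
func (p *levelPool) release(enc *zstd.Encoder) {
	p.mu.Lock()
	p.idle = append(p.idle, enc)
	p.mu.Unlock()
	<-inUseLimit
}
```

With one such pool per compression level, encoders are still created lazily, the number of encoders in use at any moment is capped at GOMAXPROCS, and idle encoders stay cached for reuse instead of being torn down and recreated under load.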
The memory behavior of applications can change slightly.
Before applying the patch:

- The maximum memory usage from encoders was tied to the concurrency (number of goroutines) but would shrink back to 1 when idle

After applying the patch:

- The maximum memory usage from encoders is tied to the peak parallelism per compression level
This should not change the worst case for the great majority of users, but it may matter for applications that alternate between heavy sarama use and other work.
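For illustration (these numbers are assumed, not taken from the PR): with GOMAXPROCS=16 and two compression levels in active use, up to 2 × 16 = 32 encoders could remain cached after a burst of traffic, whereas before the patch the cache would have shrunk back to a single encoder once the application went idle.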
There are 2 new benchmarks and a testing flag (`zstdTestingDisableConcurrencyLimit`) to verify the concurrency limiting.
I've also added some more information to the tests (like setting the bytes so throughput can be measured).
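("Setting the bytes" refers to Go's `(*testing.B).SetBytes`, which makes the benchmark output include throughput.) Below is a minimal sketch of that pattern; the `zstdCompress` helper and package name are stand-ins, not the PR's actual benchmark code, which runs against sarama's encoder path.

```go
package zstdbench // illustrative; not the PR's test file

import (
	"bytes"
	"testing"

	"github.com/klauspost/compress/zstd"
)

// zstdCompress is a stand-in for the compression path the real benchmarks
// exercise inside sarama; it simply compresses src with a fresh encoder.
func zstdCompress(dst, src []byte) ([]byte, error) {
	enc, err := zstd.NewWriter(nil)
	if err != nil {
		return nil, err
	}
	defer enc.Close()
	return enc.EncodeAll(src, dst), nil
}

func BenchmarkZstdCompressParallel(b *testing.B) {
	// Arbitrary compressible payload for illustration.
	data := bytes.Repeat([]byte("sample kafka payload "), 512)
	b.SetBytes(int64(len(data))) // makes `go test -bench .` report MB/s
	b.ReportAllocs()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			if _, err := zstdCompress(nil, data); err != nil {
				b.Error(err)
			}
		}
	})
}
```

This only shows the SetBytes/RunParallel shape; the PR's benchmarks exercise sarama's encoder pool together with the `zstdTestingDisableConcurrencyLimit` flag.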
Here is a sample output from my machine (AMD Framework 13):