elastic / elastic-transport-go

Transport struct and utilities shared among Go Elastic client libraries
Apache License 2.0

[Performance Improvement] Pool `gzip.Writer` and `bytes.Buffer` #19

Closed: pakio closed this 7 months ago

pakio commented 8 months ago

Summary

Added `sync.Pool` instances to pool `gzip.Writer` and `bytes.Buffer` for better performance.

Background

While investigating a profile report of my application, I noticed the transport client is not pooling the gzip writer and buffer. As discussed in this Reddit thread and in many other places, pooling them reuses already-allocated memory instead of allocating and freeing it on every request.
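For context, the general pattern looks roughly like this; it is a minimal sketch of the idea rather than the exact code in this PR, and the `compressBody` helper name is illustrative:

```go
package example

import (
	"bytes"
	"compress/gzip"
	"io"
	"sync"
)

// Pools that let requests reuse buffers and gzip writers instead of
// allocating and freeing them on every call.
var (
	bufPool = sync.Pool{
		New: func() any { return new(bytes.Buffer) },
	}
	gzipPool = sync.Pool{
		New: func() any { return gzip.NewWriter(io.Discard) },
	}
)

// compressBody gzips the request body into a pooled buffer. The caller is
// responsible for returning buf to bufPool once the request has been sent.
func compressBody(body io.Reader) (*bytes.Buffer, error) {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()

	zw := gzipPool.Get().(*gzip.Writer)
	zw.Reset(buf) // point the reused writer at the freshly reset buffer
	defer gzipPool.Put(zw)

	if _, err := io.Copy(zw, body); err != nil {
		bufPool.Put(buf)
		return nil, err
	}
	if err := zw.Close(); err != nil {
		bufPool.Put(buf)
		return nil, err
	}
	return buf, nil
}
```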

Performance Comparison

These are the benchmark results on my machine (Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz).

Compress body

| | ns/op | B/op | allocs/op |
| --- | --- | --- | --- |
| Before | 129744 | 815064 | 32 |
| After (without pooling) | 117462 | 815014 | 31 |
| After (with pooling) | 20462 | 1152 | 12 |

bench result

```
Running tool: /usr/local/bin/go test -benchmem -run=^$ -bench ^BenchmarkTransport$ github.com/elastic/elastic-transport-go/v8/elastictransport

goos: darwin
goarch: amd64
pkg: github.com/elastic/elastic-transport-go/v8/elastictransport
cpu: Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz

=== RUN   BenchmarkTransport/Compress_body_(pool:_false)
BenchmarkTransport/Compress_body_(pool:_false)
BenchmarkTransport/Compress_body_(pool:_false)-8     9450    117462 ns/op    815014 B/op    31 allocs/op
=== RUN   BenchmarkTransport/Compress_body_(pool:_true)
BenchmarkTransport/Compress_body_(pool:_true)
BenchmarkTransport/Compress_body_(pool:_true)-8     57690     20462 ns/op      1152 B/op    12 allocs/op

====================
=== before the change ===
====================

=== RUN   BenchmarkTransport/Compress_body
BenchmarkTransport/Compress_body
BenchmarkTransport/Compress_body-8     9697    129744 ns/op    815064 B/op    32 allocs/op
```
pakio commented 8 months ago

Hi @Anaethelion, would you mind reviewing this PR whenever you are available? I've been using this library via go-elasticsearch in production, and this change would really help our application run more stably in a high-traffic environment.

Anaethelion commented 8 months ago

Hi @pakio

I've taken some time to review and evaluate this PR. While this seems like a worthy improvement for some use cases, it turns out that it could hurt a lot of others and create contention around the pool, which could lead to OOM.

I'm not going to merge this in its current state; however, I'm interested to know if you have more examples of real-world use cases beyond the benchmark. How does it help your application be more stable? What kind of usage do you have where this helps? Search or bulk indexing?

We still have the option of maintaining the current behavior and adding the pool as opt-in behavior.

pakio commented 8 months ago

Hi @Anaethelion, thanks for your feedback!

> While this seems like a worthy improvement for some use cases, it turns out that it could hurt a lot of others and create contention around the pool, which could lead to OOM.

Understood. Just to clarify, does this mean you've actually experienced OOM with sync.Pool or with this implementation, or is this a general concern?

> however, I'm interested to know if you have more examples of real-world use cases beyond the benchmark

Sure, here are some examples of `sync.Pool` being used together with `gzip.NewWriter`:

- grpc-go: link
- prometheus golang client: link

Our team has also been using this pattern on our platform for more than a year, and it has been operating without any problems.

> How does it help your application be more stable? What kind of usage do you have where this helps? Search or bulk indexing?

Our team uses go-elasticsearch for both a data indexing pipeline and a search application. While I was exploring the CPU time profile of our search application, interestingly, more than 5% of the time was spent in elastictransport.Perform, specifically allocating memory to copy bytes into buffers. The requests we send are generally around 3~5 KB. Unfortunately we don't have a profile for the indexing pipeline, but it uses esutil.BulkIndexer to batch-send requests with a 10 MB threshold, so I suspect the impact there would be even larger.

> We still have the option of maintaining the current behavior and adding the pool as opt-in behavior.

I'm happy to make this change, as well as add a more practical test case using an actual JSON file, but first I would like to understand your concern about OOM from using sync.Pool so that I can properly address it.

Anaethelion commented 8 months ago

> Understood. Just to clarify, does this mean you've actually experienced OOM with sync.Pool or with this implementation, or is this a general concern?

This is just a general concern around edge cases when pushing the BulkIndexer to its limits. I have a feeling this could lead to memory exhaustion if the machine that runs the indexing tries to do other tasks and the pool has grown too big. My point is, it could break existing usages and I'd rather not do that. :)
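One common mitigation for this kind of concern, independent of this PR, is to cap what goes back into the pool so that occasional oversized bulk payloads are not kept alive; sync.Pool contents are also released by the garbage collector over time, which bounds how long reclaimed objects can linger. A rough sketch, where the 1 MiB cap and the `putBuffer` helper are purely illustrative:

```go
package example

import (
	"bytes"
	"sync"
)

// maxPooledBufferSize is an arbitrary illustrative cap: buffers that grew
// beyond it (for example after a large bulk request) are dropped instead of
// being pooled, so the pool cannot pin large allocations indefinitely.
const maxPooledBufferSize = 1 << 20 // 1 MiB

var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// putBuffer returns a buffer to the pool unless it has grown too large,
// in which case it is simply left for the garbage collector.
func putBuffer(buf *bytes.Buffer) {
	if buf.Cap() > maxPooledBufferSize {
		return
	}
	buf.Reset()
	bufPool.Put(buf)
}
```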

> Sure, here are some examples of `sync.Pool` being used together with `gzip.NewWriter`.

Thank you!

> Our team uses go-elasticsearch for both a data indexing pipeline and a search application. While I was exploring the CPU time profile of our search application, interestingly, more than 5% of the time was spent in elastictransport.Perform, specifically allocating memory to copy bytes into buffers. The requests we send are generally around 3~5 KB. Unfortunately we don't have a profile for the indexing pipeline, but it uses esutil.BulkIndexer to batch-send requests with a 10 MB threshold, so I suspect the impact there would be even larger.

Got it. That matches what I've benchmarked locally as well. The greater the payload, the greater the gain, which makes sense. At the same time, for search-heavy use cases I've found this could slow down the client depending on the compression that has been set up.

> I'm happy to make this change, as well as add a more practical test case using an actual JSON file, but first I would like to understand your concern about OOM from using sync.Pool so that I can properly address it.

That concern led me to consider making this more pluggable, at least at first; then, in time or for a new major version, we can flip the switch and make it the default. But as much as we can, I'd like not to break existing behavior.

pakio commented 8 months ago

> I have a feeling this could lead to memory exhaustion if the machine that runs the indexing tries to do other tasks and the pool has grown too big.

That sounds like a fair concern. I'll update the PR soon to make it optional, and will ping you when it's ready!

pakio commented 7 months ago

I've made a few changes to my implementation.

I also found two bugs in the benchmark:

1. the reader was always at the end of its cursor except on the first iteration
2. `tp` was recreated on every run of the benchmark, which means the pool was never actually reused

I fixed both and updated the benchmark results in the description.
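For reference, the corrected benchmark roughly takes the following shape. This is a sketch rather than the PR's actual benchmark code: the stub round tripper and payload are made up, and the `Config` fields shown reflect my understanding of the library.

```go
package elastictransport_test

import (
	"bytes"
	"io"
	"net/http"
	"net/url"
	"strings"
	"testing"

	"github.com/elastic/elastic-transport-go/v8/elastictransport"
)

// mockTransport is an illustrative stub so the benchmark never hits the network.
type mockTransport struct{}

func (mockTransport) RoundTrip(*http.Request) (*http.Response, error) {
	return &http.Response{
		StatusCode: http.StatusOK,
		Body:       io.NopCloser(strings.NewReader(`{}`)),
	}, nil
}

func BenchmarkTransportCompressBody(b *testing.B) {
	payload := bytes.Repeat([]byte(`{"query":{"match_all":{}}}`), 1000)
	body := bytes.NewReader(payload)

	// Create the transport once, outside the loop, so the pooled compressor
	// (when enabled) is actually reused across iterations.
	tp, err := elastictransport.New(elastictransport.Config{
		URLs:                []*url.URL{{Scheme: "http", Host: "localhost:9200"}},
		Transport:           mockTransport{},
		CompressRequestBody: true,
	})
	if err != nil {
		b.Fatal(err)
	}

	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// Rewind the body; without this it sits at EOF after the first iteration
		// and every subsequent request compresses an empty payload.
		if _, err := body.Seek(0, io.SeekStart); err != nil {
			b.Fatal(err)
		}
		req, err := http.NewRequest(http.MethodPost, "/_search", body)
		if err != nil {
			b.Fatal(err)
		}
		res, err := tp.Perform(req)
		if err != nil {
			b.Fatal(err)
		}
		res.Body.Close()
	}
}
```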

ptal when you get a chance! @Anaethelion