go-mysql-org / go-mysql

a powerful mysql toolset with Go
MIT License
4.57k stars 971 forks source link

utils.BytesBufferGet performance #681

Open howmoxuan opened 2 years ago

howmoxuan commented 2 years ago

in utils.BytesBufferGet, the code use bytesBufferChan. May I know what is the usage/improvement of that channel based on? In my application, it consume much CPU resource. (the pprof of the application is attached below). Can we just remove it? I do a benchmark, without channel is much faster. Any reason we add that channel?

image

lance6716 commented 2 years ago

by git blame you can find this PR https://github.com/go-mysql-org/go-mysql/pull/466.

Can you upload the pprof file, or provide a picture of whole flame graph?

howmoxuan commented 2 years ago

image

howmoxuan commented 2 years ago

the moment we do pprof, we just catching the binlog position to latest and no handler for it.

howmoxuan commented 2 years ago

image

howmoxuan commented 2 years ago

However, i could not understand why using select-case-default in this case will result in high runtime.Lock. still looking at the code (and learning golang as well)

moredure commented 2 years ago

Is there any special reason for collecting buffers into channels with magical length of 10? Why not to do just plain utilisation of sync.Pool, since it's already handling data more than one GC cycle pooled. https://github.com/go-mysql-org/go-mysql/pull/682/files.

Also every 11th+ concurrent call to https://gist.github.com/moredure/4e6b7a1dcd7e0f2d857ca605caf53833 ByteSliceGet will make an allocation, so in highly concurrent environments with high number of requests (more than 10 concurrent requests) ByteSliceGet will allocate, due to sync.Pool interface{} allocates in case of []byte, so some bytebuffer implementation be better used.

lance6716 commented 2 years ago

by git blame you can find this PR #466.

@atercattus ptal, now two persons are planning to revert your pr 😄

atercattus commented 2 years ago

Hello for all! Extra pool via chan was made for reduce memory allocations 😅 At the time (2 years ago) it gave very good results.

with magical length of 10

In our use case (~1kk fast requests per CPU core) it was enough: this allowed us to save buffers between GCs. To make it better, it was necessary to have more use cases.

Nevertheless. 38.95% just looks terrible. I will be glad if you remove this chan and leave only the sync.Pool :)

moredure commented 2 years ago

@howmoxuan , we merged a few changes, can you check performance from master branch?