vkazmirchuk opened this issue 3 weeks ago
@vkazmirchuk thanks for reporting this.
Do I understand correctly that memory consumption grows over time?
> After some number of iterations we had high memory consumption by the clickhouse client.
Do you have any runtime statistics on GC attempting to free memory? Is memory going to the OS/container limit?
@jkaflik
> Do I understand correctly that memory consumption grows over time?
Yes, it happens after a lot of batch insertions.
> Do you have any runtime statistics on GC attempting to free memory? Is memory going to the OS/container limit?
We tried to run GC manually after each insertion, but it doesn't release memory completely; over time the client accumulates memory that is never cleared.
@vkazmirchuk Could you also check the number of goroutines (runtime.NumGoroutine())?
Do you run any of the INSERTs in a goroutine?
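For reference, a minimal sketch (not from the thread) of how both checks could be wired in: a background ticker that logs runtime.NumGoroutine() together with heap and GC counters from runtime.ReadMemStats. All names and intervals here are illustrative.

```go
package main

import (
	"log"
	"runtime"
	"time"
)

// logRuntimeStats periodically reports the goroutine count plus heap/GC
// statistics, enough to see whether goroutines leak or the heap keeps
// growing between batch insertions.
func logRuntimeStats(interval time.Duration) {
	for range time.Tick(interval) {
		var m runtime.MemStats
		runtime.ReadMemStats(&m)
		log.Printf("goroutines=%d heapAlloc=%dMiB heapSys=%dMiB heapReleased=%dMiB numGC=%d",
			runtime.NumGoroutine(),
			m.HeapAlloc>>20, m.HeapSys>>20, m.HeapReleased>>20, m.NumGC)
	}
}

func main() {
	go logRuntimeStats(10 * time.Second)
	select {} // stand-in for the application's insert loop
}
```

Running with GODEBUG=gctrace=1 prints similar per-cycle statistics straight from the runtime.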
@vkazmirchuk, just to clarify, because GC in Go and memory management in general are tricky, let me ask again:
Is memory going to the OS/container limit?
Do you encounter out-of-memory? I want to double-check whether it's the GC not releasing memory, or whether we have a problem with something holding on to memory.
See: https://pkg.go.dev/runtime/debug#FreeOSMemory (https://stackoverflow.com/questions/37382600/cannot-free-memory-once-occupied-by-bytes-buffer/37383604#37383604)
If you are not hitting out-of-memory, you could play with GOMEMLIMIT to override the container/host memory limit: https://tip.golang.org/doc/gc-guide#Memory_limit
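For illustration, a hedged sketch of both levers: debug.FreeOSMemory forces a GC and asks the runtime to return freed memory to the OS, and debug.SetMemoryLimit is the in-process equivalent of the GOMEMLIMIT environment variable. The 40 GiB figure only mirrors the limit mentioned later in this thread.

```go
package main

import "runtime/debug"

func main() {
	// Equivalent to GOMEMLIMIT=40GiB: a soft limit the GC tries to keep
	// total Go-managed memory under.
	debug.SetMemoryLimit(40 << 30) // bytes

	// ... run the batch insertions here ...

	// Forces a GC and returns as much memory to the OS as possible.
	// Useful for diagnosis; rarely something to call in production.
	debug.FreeOSMemory()
}
```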
> Could you also check the number of goroutines (runtime.NumGoroutine())? Do you run any of the INSERTs in a goroutine?
We have one goroutine that performs insertions of one batch after another. The total number of goroutines in the application is 23. But I don't think that plays any role.
> Do you encounter out-of-memory? I want to double-check whether it's the GC not releasing memory, or whether we have a problem with something holding on to memory.
Yes, we do end up hitting OOM.
> If you are not hitting out-of-memory, you could play with GOMEMLIMIT to override the container/host memory limit
This is what helped us avoid OOM: we limit memory to 40 GiB and that saves us, but after a while our application consumes the full 40 GiB and stays at that level.
It would be ideal to reuse memory within the ClickHouse client. We have used sync.Pool for many things in our application, and it has helped us a lot with optimisation.
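For context, a minimal sketch of the sync.Pool pattern mentioned above; pooling bytes.Buffer is an illustrative choice, not the reporter's actual code.

```go
package main

import (
	"bytes"
	"sync"
)

// bufPool hands out reusable buffers so each call does not allocate a
// fresh one; New runs only when the pool is empty.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

func encodeRow(row []byte) []byte {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // keep the capacity, drop old contents
	defer bufPool.Put(buf) // hand the buffer back for the next caller

	buf.Write(row)                             // stand-in for real per-row encoding
	return append([]byte(nil), buf.Bytes()...) // copy out before buf is reused
}

func main() {
	_ = encodeRow([]byte("example"))
}
```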
> we limit memory to 40 GiB and that saves us, but after a while our application consumes the full 40 GiB and stays at that level.
When you set GOMEMLIMIT to 40 GiB, it stays at this level and does not go over the limit, right? It sounds like the GC is not freeing memory back to the OS. This is something that can happen.
What if you lower GOMEMLIMIT? This will cost CPU cycles, as the GC will run more often.
Besides that, of course, we should invest some effort into figuring out how to avoid unnecessary memory allocations.
The current driver architecture buffers everything per block, so high memory consumption can happen. For now, what I can recommend is to lower the batch size (a sketch follows below).
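To make that concrete, a hedged sketch of chunked inserts against the ClickHouse API used in this issue (clickhouse.Open, PrepareBatch, AppendStruct, and Send are clickhouse-go v2 calls); the Item struct, table name, and chunk size are assumptions.

```go
package main

import (
	"context"

	"github.com/ClickHouse/clickhouse-go/v2"
	"github.com/ClickHouse/clickhouse-go/v2/lib/driver"
)

// Item is a hypothetical row type; the reporter's struct was not shared.
type Item struct {
	ID   uint64 `ch:"id"`
	Name string `ch:"name"`
}

// insertChunked sends rows in chunks instead of one million-row batch, so
// the per-block buffers the driver keeps alive stay proportionally smaller.
func insertChunked(ctx context.Context, conn driver.Conn, items []Item, chunkSize int) error {
	for start := 0; start < len(items); start += chunkSize {
		end := start + chunkSize
		if end > len(items) {
			end = len(items)
		}
		batch, err := conn.PrepareBatch(ctx, "INSERT INTO items")
		if err != nil {
			return err
		}
		for i := start; i < end; i++ {
			if err := batch.AppendStruct(&items[i]); err != nil {
				return err
			}
		}
		if err := batch.Send(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	conn, err := clickhouse.Open(&clickhouse.Options{Addr: []string{"127.0.0.1:9000"}})
	if err != nil {
		panic(err)
	}
	items := make([]Item, 1_000_000)
	if err := insertChunked(context.Background(), conn, items, 100_000); err != nil {
		panic(err)
	}
}
```

With this shape, peak memory should track the chunk size rather than the whole dataset, since each Send completes a batch before the next one is prepared.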
Observed
We insert 1 million records at a time using the function batch.AppendStruct(item).
After some number of iterations we had high memory consumption by the clickhouse client.

pprof memory report: pprof.alloc_objects.alloc_space.inuse_objects.inuse_space.028.pb.gz
Our golang structure that we put into the database:
Expected behaviour
The client should reuse memory whenever possible, rather than allocating new memory at each iteration of batch insertion.
Code example
Details
Environment

* clickhouse-go version: v2.25.0
* Interface (database/sql compatible driver or ClickHouse API): ClickHouse API
* Go version: 1.22.1
* OS: Linux
* ClickHouse server version: 23.8.8.20
* Is it a ClickHouse Cloud? No
* CREATE TABLE statements for tables involved: