apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0

[Enhancement] HLL registers use chunk allocator for better memory usage #9194

Open zbtzbtzbt opened 2 years ago

zbtzbtzbt commented 2 years ago


Description

PR: https://github.com/apache/incubator-doris/pull/9188

Test 1:
select 
    dt, HLL_UNION_AGG(union_id)
from
    xxx
where
    dt between a and b
group by dt;
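For context on what this query does per group: HLL_UNION_AGG merges the HLL sketches of all rows that share the same dt. The sketch below shows a dense HLL merge, assuming 2^14 = 16384 one-byte registers (an assumption for illustration; kRegisters and hll_merge are hypothetical names, not Doris code). The union keeps the per-register maximum, and each dense aggregation state holds its own register buffer. Going by the issue title, that buffer is the allocation the PR moves onto the chunk allocator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: dense HLL representation with 2^14 one-byte registers.
constexpr std::size_t kRegisters = std::size_t{1} << 14;

// Hypothetical helper: the union of two HLL sketches keeps, for every
// register, the larger of the two values.
void hll_merge(std::vector<std::uint8_t>& dst, const std::vector<std::uint8_t>& src) {
    assert(dst.size() == kRegisters && src.size() == kRegisters);
    for (std::size_t i = 0; i < kRegisters; ++i) {
        dst[i] = std::max(dst[i], src[i]);
    }
}
```

Because every group needs its own 16 KB-sized buffer under this assumption, a GROUP BY over many dt values performs many same-sized allocations, which is the pattern a chunk cache is suited to.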

Results:

| Metric | hll_chunkAlloc | apache:master |
|---|---|---|
| speed (query time) | 547.794 seconds | 712.229 seconds |
| mem.memused (avg per backend node) | 38G | 46G |
| mem.memused.percent | 15% | 18% |


hll_chunkAlloc: ChunkAllocator::allocate is still a heavy function, but as we know it is better than using new, and it does give some speedup. (Note that the new -> SystemAllocator -> system call path is not recorded in the perf report.)

[Figures: perf report screenshots for hll_chunkAlloc and apache:master]
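Why a chunk allocator can beat plain new: freed chunks are kept in a free list, so a later request in the same size class is served without going back to the system allocator. Below is a simplified, single-threaded sketch under that assumption. SimpleChunkAllocator and Chunk are hypothetical names, and Doris's actual ChunkAllocator differs in its details.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical chunk handle: pointer plus the rounded-up size.
struct Chunk {
    void* data = nullptr;
    std::size_t size = 0;
};

// Simplified caching allocator (illustration only, not thread-safe):
// sizes are rounded up to a power of two, and freed chunks go into a
// per-size-class free list instead of back to the system.
class SimpleChunkAllocator {
public:
    ~SimpleChunkAllocator() {
        for (auto& list : free_lists_)
            for (void* p : list) std::free(p);
    }

    Chunk allocate(std::size_t size) {
        std::size_t idx = size_class(size);
        std::size_t rounded = std::size_t{1} << idx;
        auto& list = free_lists_[idx];
        if (!list.empty()) {  // cache hit: reuse a previously freed chunk
            void* p = list.back();
            list.pop_back();
            ++cache_hits_;
            return {p, rounded};
        }
        ++system_allocs_;  // cache miss: fall back to the system allocator
        return {std::malloc(rounded), rounded};
    }

    // Keeps the chunk for reuse rather than releasing it.
    void deallocate(const Chunk& chunk) {
        free_lists_[size_class(chunk.size)].push_back(chunk.data);
    }

    std::size_t cache_hits() const { return cache_hits_; }
    std::size_t system_allocs() const { return system_allocs_; }

private:
    // Smallest idx such that 2^idx >= size.
    static std::size_t size_class(std::size_t size) {
        std::size_t idx = 0;
        while ((std::size_t{1} << idx) < size) ++idx;
        return idx;
    }

    std::vector<void*> free_lists_[64];
    std::size_t cache_hits_ = 0;
    std::size_t system_allocs_ = 0;
};
```

With this design, allocating 16384 bytes, freeing it, and then requesting 16000 bytes returns the same chunk, because both sizes round up to the 2^14 size class. The second request never reaches malloc.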

Solution

https://github.com/apache/incubator-doris/pull/9188


xinyiZzz commented 2 years ago


With ChunkAllocator, executing the same SQL a second time will be faster, and it avoids tcmalloc lock contention under high concurrency, so the improvement will be more noticeable in high-concurrency workloads.

Because ChunkAllocator has a cold start (chunks freed by the first execution are kept in its cache), memory does not need to be allocated again when the same SQL is executed a second time, provided the cache's upper limit is not exceeded.
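The cold-start and cache-limit behavior can be illustrated with a minimal sketch, assuming a simple per-size cache bounded by a byte limit. CappedChunkCache, run_query, and the limits used are hypothetical names and values, not Doris's ChunkAllocator API. The first run misses and fills the cache, an identical second run is served from it, and a limit smaller than the working set sends some allocations back to the system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <vector>

// Hypothetical chunk cache with an upper limit on cached bytes.
class CappedChunkCache {
public:
    explicit CappedChunkCache(std::size_t limit_bytes) : limit_(limit_bytes) {}
    ~CappedChunkCache() {
        for (auto& entry : cache_)
            for (void* p : entry.second) std::free(p);
    }

    void* allocate(std::size_t size) {
        auto it = cache_.find(size);
        if (it != cache_.end() && !it->second.empty()) {
            void* p = it->second.back();  // warm: reuse a cached chunk
            it->second.pop_back();
            cached_bytes_ -= size;
            ++hits_;
            return p;
        }
        ++misses_;  // cold start (or chunk was not cached): ask the system
        return std::malloc(size);
    }

    void release(void* p, std::size_t size) {
        if (cached_bytes_ + size <= limit_) {
            cache_[size].push_back(p);  // keep for the next execution
            cached_bytes_ += size;
        } else {
            std::free(p);  // over the limit: return memory to the system
        }
    }

    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    std::size_t limit_;
    std::size_t cached_bytes_ = 0;
    std::map<std::size_t, std::vector<void*>> cache_;
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};

// Simulates one query that holds `n` buffers of `size` bytes at the same time.
void run_query(CappedChunkCache& cache, int n, std::size_t size) {
    std::vector<void*> bufs;
    for (int i = 0; i < n; ++i) bufs.push_back(cache.allocate(size));
    for (void* p : bufs) cache.release(p, size);
}
```

Running the same four-buffer "query" twice against a 1 MiB limit gives 4 misses and then 4 hits. With a limit of only two buffers, the second run still misses twice, because two of the first run's chunks were returned to the system.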