Oversized shards objects during data updates cause FE JVM full GC

suood commented 2 months ago

Steps to reproduce the behavior (Required)

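No more than 20 update model tables. Two of them are fairly large: a master table and a detail table, holding tens of millions and hundreds of millions of rows respectively.
All 20 tables load data through Flink.
The materialized views built on these tables are refreshed every 10 to 15 minutes.
As business workloads migrate to the SR cluster, new tables, data, and Flink load jobs keep being added.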

Expected behavior (Required)

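The FE node JVM should run full GC infrequently, with few or no objects in Humongous regions.

Real behavior (Required)

The FE node JVM keeps allocating oversized objects in Humongous regions until a full GC occurs. The times at which new Humongous objects appear match the Flink load start times and the materialized view refresh times: every Flink load start or materialized view refresh produces large objects.
As tables, data, and Flink load jobs are added, the interval between full GCs keeps shrinking.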
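
For context, a minimal sketch of why large objects end up in Humongous regions. It assumes the rule stated in the G1 tuning documentation, that an object of at least half a region is treated as humongous and occupies whole contiguous regions. The function names are illustrative, and the actual region size on this FE (set by `-XX:G1HeapRegionSize` or derived from the heap size) is not given in this report.

```python
import math

MB = 1024 * 1024

def is_humongous(object_bytes, region_bytes):
    """Assumed G1 rule: an object of at least half a region is humongous."""
    return object_bytes >= region_bytes // 2

def regions_spanned(object_bytes, region_bytes):
    """A humongous object occupies whole contiguous regions."""
    return math.ceil(object_bytes / region_bytes)

if __name__ == "__main__":
    # Hypothetical sizes for illustration only, not measurements from this cluster.
    region = 8 * MB
    for size in (1 * MB, 4 * MB, 20 * MB):
        print(size // MB, "MB ->", is_humongous(size, region),
              regions_spanned(size, region), "region(s)")
```

Under this assumption, a larger region size raises the threshold, so fewer arrays would count as humongous; whether that helps here depends on how large the shards objects really are.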

StarRocks version (Required)

3.1.9

MemoryAnalyzer Info

kevincai commented 2 months ago

something related to materialized view usage.

suood commented 2 months ago

something related to materialized view usage.

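After stopping most of the materialized view refreshes, I watched JVM GC for two hours and saw no noticeable improvement. From this, the large objects do not appear to be caused by materialized view refreshes.

Are there any other suggestions I could try?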

kevincai commented 2 months ago

@suood Do you have any comparison of large objects before and after a cycle? You could take a heap dump, wait for one cycle, take another dump, and compare the large objects between the two. That might give an accurate list of the objects we are looking for.
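
A lighter variant of the before/after comparison, as a rough sketch: instead of two full heap dumps, it diffs two class histograms captured with `jmap -histo <pid>` one cycle apart. It assumes the usual histogram row layout (rank, instance count, bytes, class name), and the helper names `parse_histo` and `diff_histos` are made up for this example. A histogram only reports shallow sizes per class, so a full heap dump opened in MemoryAnalyzer is still needed to see who holds the objects.

```python
import re
import sys

# Matches data rows of `jmap -histo` output: rank, instances, bytes, class name.
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")

def parse_histo(text):
    """Map class name -> (instances, shallow bytes) from a class histogram."""
    stats = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            stats[m.group(3)] = (int(m.group(1)), int(m.group(2)))
    return stats

def diff_histos(before, after, top=10):
    """Return (class, bytes grown, instances grown), largest byte growth first."""
    rows = []
    for name, (inst, size) in after.items():
        old_inst, old_size = before.get(name, (0, 0))
        if size > old_size:
            rows.append((name, size - old_size, inst - old_inst))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:top]

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python histo_diff.py before.txt after.txt
    with open(sys.argv[1]) as f:
        before = parse_histo(f.read())
    with open(sys.argv[2]) as f:
        after = parse_histo(f.read())
    for name, grown_bytes, grown_inst in diff_histos(before, after):
        print(f"{grown_bytes:>15} bytes  {grown_inst:>10} instances  {name}")
```

Capturing one histogram just before a Flink load starts and another shortly after should show which classes, most likely array types, account for the growth.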
