StarRocks / starrocks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
https://starrocks.io
Apache License 2.0
8.66k stars 1.75k forks

Oversized shards objects during data updates cause FE JVM full GC #47161

Open suood opened 2 months ago

suood commented 2 months ago

Steps to reproduce the behavior (Required)

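  1. No more than 20 update-model tables. Two of them are relatively large, a main table and a detail table, with data volumes in the tens of millions and hundreds of millions of rows respectively.
  2. All 20 tables are loaded through Flink.
  3. The materialized views built on these tables are refreshed every 10 to 15 minutes.
  4. As business workloads migrate to the SR cluster, new tables, more data, and additional Flink load jobs keep being added.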

Expected behavior (Required)

  1. The FE node JVM runs full GC only infrequently, with few or no objects in Humongous regions.

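Real behavior (Required)

  1. Oversized objects are continually allocated into the Humongous regions of the FE node JVM until a full GC occurs. New Humongous-region objects appear at the same times that Flink load jobs start and materialized views refresh; every Flink load start or materialized view refresh produces large objects.
  2. As tables, data, and Flink load jobs are added, the interval between full GCs keeps shrinking.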

StarRocks version (Required)

3.1.9

MemoryAnalyzer Info

[image: MemoryAnalyzer screenshot]

kevincai commented 2 months ago

something related to materialized view usage.

suood commented 2 months ago

> something related to materialized view usage.

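After stopping most of the materialized view refreshes, I watched JVM GC for two hours and saw no noticeable improvement. From this, the large objects appear to be unrelated to materialized view refreshes.

Are there any other suggestions I could try?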

kevincai commented 2 months ago

@suood Do you have any comparison of the large objects before and after a cycle? You could take a heap dump, wait for one cycle, take another dump, and compare the large objects between the two. That might give an accurate list of the objects we are looking for.
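
One lightweight way to approximate the before/after comparison suggested above is to diff two class histograms rather than two full heap dumps. The sketch below is not from the thread: it assumes two text files captured with `jmap -histo <pid>` on the FE process, one before and one after a Flink load or materialized view refresh cycle, and the function names `parse_histogram` and `diff_histograms` are made up for illustration. A histogram only shows per-class totals, so a full heap dump in MemoryAnalyzer is still needed to see which objects hold the references.

```python
import re
from typing import Dict, List, Tuple

# Matches one data row of `jmap -histo` output, for example:
#    1:        123456       98765432  [B (java.base@11.0.2)
# Header, separator, and "Total" lines do not match and are skipped.
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histogram(text: str) -> Dict[str, Tuple[int, int]]:
    """Map class name -> (instance count, total bytes) for each histogram row."""
    result: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            instances, size, name = match.groups()
            result[name] = (int(instances), int(size))
    return result


def diff_histograms(before: str, after: str, top: int = 20) -> List[Tuple[str, int, int]]:
    """Return (class, instance delta, byte delta) rows, largest byte growth first.

    Classes that appear only in the "before" histogram are ignored, since the
    goal here is to find what grew during the cycle.
    """
    old, new = parse_histogram(before), parse_histogram(after)
    rows = []
    for name, (instances, size) in new.items():
        old_instances, old_size = old.get(name, (0, 0))
        rows.append((name, instances - old_instances, size - old_size))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:top]


if __name__ == "__main__":
    import sys

    # Usage (file names are placeholders): python histo_diff.py before.txt after.txt
    with open(sys.argv[1]) as f_before, open(sys.argv[2]) as f_after:
        for name, d_instances, d_bytes in diff_histograms(f_before.read(), f_after.read()):
            print(f"{d_bytes:>15,d} {d_instances:>12,d}  {name}")
```

Classes at the top of the output are the ones whose footprint grew most during the cycle. Large array types such as `[B` or `[Ljava.lang.Object;` near the top would be consistent with Humongous allocations, but that is only a hint to follow up on in the heap dump.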