
Metadata cache of Iceberg tables may cause frequent FE OOM if there are a lot of tables. #47630

Open ucasfl opened 3 months ago

ucasfl commented 3 months ago

Enhancement

In our environment, there is an Iceberg database with more than 20,000 tables under it. Data is written into these Iceberg tables continuously (streaming ingestion), and we run a lot of queries against them.

We hit an FE OOM problem: the FE memory usage (JVM old generation) grows rapidly and leads to OOM, after which the FE has to be restarted.

After analysis, we found the memory is occupied by the metadata (snapshots) of the Iceberg tables:

(screenshot: heap analysis showing memory dominated by Iceberg snapshot metadata)

2,734 Iceberg tables hold more than 170 million snapshots in total, 6,000+ snapshots per table on average. (We keep the snapshots of the last 48 hours for each Iceberg table.)
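For reference, that kind of retention window is usually enforced with Iceberg's snapshot-expiration API. The sketch below is illustrative only and is not part of the StarRocks code under discussion; the 48-hour cutoff comes from the description above, and the `Table` handle is assumed to be loaded from whatever catalog is in use:

```java
import java.util.concurrent.TimeUnit;
import org.apache.iceberg.Table;

public class SnapshotRetention {
    // Expire snapshots older than 48 hours on a single Iceberg table.
    // "table" is assumed to be loaded from the catalog elsewhere.
    public static void keepLast48Hours(Table table) {
        long cutoffMillis = System.currentTimeMillis() - TimeUnit.HOURS.toMillis(48);
        table.expireSnapshots()
             .expireOlderThan(cutoffMillis)
             .commit();
    }
}
```

This is maintenance on the Iceberg table itself and does not change how the FE caches metadata, but fewer retained snapshots mean less snapshot metadata per cached table.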

https://github.com/StarRocks/starrocks/blob/b672de4b07d239e70b57854cb4ea499ad1655668/fe/fe-core/src/main/java/com/starrocks/connector/iceberg/IcebergMetadata.java#L164

This cache has no limit on the number of tables or on memory usage.

When we tried removing the metadata cache, the memory usage returned to normal:

(screenshot: FE memory usage after removing the metadata cache)

Possible solution

There should be a limit on the memory usage of the metadata cache.
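As a rough illustration only (not the actual StarRocks implementation; the key/value types, the limits, and the use of Caffeine are all assumptions), a bounded cache with size and idle-time eviction might look like this:

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.time.Duration;
import java.util.function.Function;
import org.apache.iceberg.Table;

public class BoundedIcebergTableCache {
    // Cap the number of cached tables and evict entries that have not been
    // accessed recently, so snapshot metadata of idle tables cannot pile up
    // in the JVM old generation indefinitely.
    private final Cache<String, Table> tables = Caffeine.newBuilder()
            .maximumSize(1_000)                         // illustrative limit
            .expireAfterAccess(Duration.ofMinutes(30))  // illustrative idle timeout
            .build();

    public Table getOrLoad(String dbDotTable, Function<String, Table> loader) {
        return tables.get(dbDotTable, loader);
    }
}
```

A weigher based on snapshot count (`maximumWeight` plus a `weigher`) would bound memory more directly than a plain entry count, since tables with thousands of snapshots are far heavier than others.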

lianneli commented 2 months ago

I have run into the same situation these days...

chenminghua8 commented 2 months ago

To manage so much metadata information, the only option is to continue increasing the FE JVM heap memory.
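For anyone taking this route, the FE heap is normally raised via `JAVA_OPTS` in `fe.conf`; the value below is only an example, not a recommendation:

```
# fe.conf (illustrative): raise -Xmx in the existing JAVA_OPTS line,
# keeping the other JVM flags in place, then restart the FE.
JAVA_OPTS="... -Xmx32768m ..."
```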

gupeng1208 commented 2 months ago

This is an interesting question. I think this problem will occur whenever Iceberg is queried through StarRocks, and the cache will keep growing until StarRocks itself runs out of memory.