Open · Dkbei opened this issue 1 year ago
A large amount of data written from the Hive table into the Paimon table generates multiple small files.
No compaction? The Hive writer has no compaction. You need to launch a Flink job to do compaction.
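For reference, Paimon supports submitting a dedicated compaction job through its Flink action jar. The sketch below shows the general shape of such a submission; the jar path, version, warehouse path, and database/table names are placeholders, not values from this thread:

```shell
# Submit a dedicated Paimon compaction job via the Flink action jar.
# All paths, the jar version, and the database/table names below are
# illustrative placeholders -- substitute your own deployment's values.
<FLINK_HOME>/bin/flink run \
    /path/to/paimon-flink-action-<version>.jar \
    compact \
    --warehouse hdfs:///path/to/warehouse \
    --database my_db \
    --table my_paimon_table
```

Running compaction this way merges the small files that the Hive writer leaves behind, since only the Flink and Spark writers compact on write.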
If the Hive table contains a large number of small files, OOM does not occur for the same statement.
Yes, they are not only small files, they are unmerged files. The Hive writer has no compaction; only the Flink and Spark writers have compaction.
Search before asking
Paimon version
master branch
Compute Engine
Hive: CDH 6.3.2 (Hive 2.1.1)
Scenario description:
Abnormal information:
When the limit command is used to query data, fetch cannot be used; the data is read directly through MapReduce.
Minimal reproduce step
Write a large amount of data from the Hive table into the Paimon table, generating multiple small files.
What doesn't meet your expectations?
The limit operation should not cause OOM, and the limit operation should be able to fetch results directly.
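For context, whether a simple `SELECT ... LIMIT` query runs as a direct fetch task or as a MapReduce job is controlled in Hive by the `hive.fetch.task.conversion` setting. A minimal sketch, assuming a table named `my_paimon_table`; whether Paimon's Hive storage handler honors this setting is not confirmed in this thread:

```shell
# Run the limit query with Hive's fetch-task conversion enabled, so that
# simple SELECT ... LIMIT queries may bypass MapReduce.
# The table name is a placeholder, and applicability to Paimon tables
# is an assumption, not something confirmed in this issue.
hive -e "SET hive.fetch.task.conversion=more;
         SELECT * FROM my_paimon_table LIMIT 10;"
```

Valid values for the setting are `none`, `minimal`, and `more`; with `none`, even trivial queries are always executed as MapReduce jobs.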
Anything else?
No response
Are you willing to submit a PR?