apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[trivial] code refactoring for paimon table action. #4194

Closed: zhuyaogai closed this 1 month ago

zhuyaogai commented 2 months ago

Purpose

I am new to Paimon and would like to start with some trivial work. After this refactoring, every action can specify the table conf more simply. Linked issue: close #xxx

Tests

Existing tests

API and Format

No

Documentation

No

zhuyaogai commented 2 months ago

> Hi, @zhuyaogai, thanks for the contribution.
>
> But it is hard to understand this refactoring. It seems we don't need to have a dynamic options method. SortCompactAction already overrides the constructor.

@JingsongLi Hi, master, thanks for your suggestion. My thinking was that if any Paimon action needs to modify the table conf, overriding the dynamic options method would be enough. If that doesn't make sense to you, just ignore it.
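For readers following the thread, the idea can be sketched roughly as below. This is only an illustration of the hook-method pattern being proposed, not the code from this PR or from Paimon: `TableActionSketch`, `SortCompactSketch`, `dynamicOptions()`, `effectiveConf()` and the key `example.option` are all made-up names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical base class; NOT Paimon's actual action hierarchy.
abstract class TableActionSketch {
    private final Map<String, String> baseConf;

    TableActionSketch(Map<String, String> baseConf) {
        // Defensive copy so later changes by the caller do not leak in.
        this.baseConf = new HashMap<>(baseConf);
    }

    /** Hook: a subclass returns the options it wants to force on the table. */
    protected Map<String, String> dynamicOptions() {
        return Map.of();
    }

    /** Base configuration with the subclass overrides applied on top. */
    final Map<String, String> effectiveConf() {
        Map<String, String> merged = new HashMap<>(baseConf);
        merged.putAll(dynamicOptions());
        return merged;
    }
}

// Hypothetical subclass that only needs to override the hook.
class SortCompactSketch extends TableActionSketch {
    SortCompactSketch(Map<String, String> baseConf) {
        super(baseConf);
    }

    @Override
    protected Map<String, String> dynamicOptions() {
        // "example.option" is an illustrative key, not a real table option.
        return Map.of("example.option", "forced-value");
    }
}
```

The hook is read lazily in `effectiveConf()` rather than in the base constructor, so the subclass is fully constructed before its override runs. The reviewer's alternative, adjusting the conf in the subclass constructor, achieves the same result without an extra method.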

By the way, could you tell me why compaction uses a single-thread executor? Could it be a multi-threaded pool instead, with a corePoolSize that the user can configure? Looking forward to your answer.

https://github.com/apache/paimon/blob/51f24880e07107d164c2aee8c959de2ea01c1730/paimon-core/src/main/java/org/apache/paimon/operation/AbstractFileStoreWrite.java#L464

zhuyaogai commented 2 months ago

@JingsongLi Hi, master, could you answer my question? Thanks! I know that for a single partition-bucket writer the compaction tasks must run serially (one by one), but I think the compactions of different partition-bucket writers could run in parallel, which would improve compaction efficiency.
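To make the question concrete, here is a minimal sketch of what "serial within one partition-bucket, parallel across partition-buckets" could look like using plain `java.util.concurrent`. It is an assumption-laden illustration, not how `AbstractFileStoreWrite` works: the class `PerBucketCompactionSketch`, the string key and the `poolSize` parameter are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper; NOT part of Paimon.
class PerBucketCompactionSketch {
    private final ExecutorService pool;
    // Last queued task per partition-bucket key. Entries are never removed
    // in this sketch, so a real implementation would need cleanup.
    private final Map<String, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();

    PerBucketCompactionSketch(int poolSize) {
        // poolSize stands in for the user-configurable pool size from the question.
        this.pool = Executors.newFixedThreadPool(poolSize);
    }

    /** Tasks with the same key run one after another; different keys may overlap. */
    CompletableFuture<Void> submit(String partitionBucket, Runnable compaction) {
        return tails.compute(partitionBucket, (key, tail) -> {
            // Swallow a previous failure so that later tasks for this key still run;
            // the failed task's own future still reports the exception to its caller.
            CompletableFuture<Void> previous =
                    tail == null ? CompletableFuture.completedFuture(null)
                                 : tail.exceptionally(error -> null);
            return previous.thenRunAsync(compaction, pool);
        });
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

Each key keeps a chain of futures, so a new task starts only after the previous task for the same key has finished, while tasks for other keys can use the remaining pool threads.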

JingsongLi commented 1 month ago

At present, distributed compute engines assume that one unit of parallelism corresponds to a single thread. If there are too many threads, the engine's node heartbeat may time out.