apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Feature] Improve catalog lock for paimon #2824

Open FangYongs opened 5 months ago

FangYongs commented 5 months ago

Motivation

Currently, Paimon only supports the Hive lock for the Hive catalog. It should also support a unified lock for the filesystem catalog.

chenxi0599 commented 5 months ago

Great! I want this feature, too.

legendtkl commented 5 months ago

+1 for this.

JingsongLi commented 4 months ago

+1, thanks @FangYongs for driving this!

JingsongLi commented 3 months ago

Hi, after reviewing https://github.com/apache/paimon/pull/3076

I feel more description should be provided here, otherwise it may lead to incorrect design by contributors.

I suggest having a clear design before creating subtasks. What do you think? @FangYongs

FangYongs commented 3 months ago

Thanks @JingsongLi, I completely agree! I will create a PIP for this feature and start a discussion thread later.

sunxiaojian commented 3 months ago

@FangYongs @JingsongLi

  1. I think the only parameter that LockContext exposes publicly should be the options. That way the Filesystem Catalog does not need to reference the JDBC or Hive connection implementations (the future ClientPool).

  2. I think the parameters required for the lock should be distinguished by a "lock." prefix. This should not inconvenience users. However, metastores such as JDBC and Hive, which come with their own lock implementations, need to stay compatible and should not require lock-specific configuration parameters to be redeclared.

  3. The core issue is connections. If we create a new connection every time and close it after use, we will inevitably end up with too many connections, and it may not be possible to obtain one during busy periods. I therefore recommend caching connections and setting an eviction time.

If possible, I can refactor according to the above description first.
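
To make points 2 and 3 concrete, here is a rough sketch in plain Java. It is not Paimon code: `LockSketch`, `lockOptions`, `ExpiringClientCache`, and the `lock.type` key used in the example are all hypothetical names chosen for illustration, and the eviction policy (idle time since last access) is just one possible reading of "set the eviction time".

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical sketch; none of these names are existing Paimon APIs. */
final class LockSketch {

    static final String LOCK_PREFIX = "lock.";

    /** Returns only the options whose key starts with "lock.", with the prefix stripped. */
    static Map<String, String> lockOptions(Map<String, String> catalogOptions) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> e : catalogOptions.entrySet()) {
            if (e.getKey().startsWith(LOCK_PREFIX)) {
                result.put(e.getKey().substring(LOCK_PREFIX.length()), e.getValue());
            }
        }
        return result;
    }

    /** Caches one client per key and closes clients that have been idle too long. */
    static final class ExpiringClientCache<C extends AutoCloseable> {

        private static final class Entry<V> {
            final V client;
            long lastAccessMillis;

            Entry(V client, long lastAccessMillis) {
                this.client = client;
                this.lastAccessMillis = lastAccessMillis;
            }
        }

        private final Map<String, Entry<C>> entries = new HashMap<>();
        private final long evictAfterMillis;
        private final LongSupplier clock;          // injectable so eviction can be tested
        private final Function<String, C> factory; // creates a client for a given key

        ExpiringClientCache(long evictAfterMillis, LongSupplier clock, Function<String, C> factory) {
            this.evictAfterMillis = evictAfterMillis;
            this.clock = clock;
            this.factory = factory;
        }

        /** Returns the cached client for the key, creating it if absent or already evicted. */
        synchronized C get(String key) {
            long now = clock.getAsLong();
            evictExpired(now);
            Entry<C> entry = entries.get(key);
            if (entry == null) {
                entry = new Entry<>(factory.apply(key), now);
                entries.put(key, entry);
            }
            entry.lastAccessMillis = now;
            return entry.client;
        }

        synchronized int size() {
            return entries.size();
        }

        /** Removes and closes every client idle for at least evictAfterMillis. */
        private void evictExpired(long now) {
            Iterator<Entry<C>> it = entries.values().iterator();
            while (it.hasNext()) {
                Entry<C> e = it.next();
                if (now - e.lastAccessMillis >= evictAfterMillis) {
                    it.remove();
                    try {
                        e.client.close();
                    } catch (Exception ignored) {
                        // Best-effort close; a real implementation would log this.
                    }
                }
            }
        }
    }
}
```

With this shape, a lock factory would receive only the result of `lockOptions(...)`, and the catalog would call `cache.get(uri)` instead of opening a new connection per lock attempt. The sketch does not handle a client being evicted while another thread is still using it; a real implementation would need reference counting or a pool for that.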

rmotapar commented 2 weeks ago

+1 for this