apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[BUG] Resource usage is quite high when using Hive to write to a Paimon table #1191

Closed. Dkbei closed this issue 1 year ago.

Dkbei commented 1 year ago

Search before asking

Paimon version

The Paimon Hive connector version is paimon-hive-connector-2.1-cdh-6.3-0.5-20230520.001837-16.jar

Compute Engine

hive-2.1.1-cdh-6.3.2

Minimal reproduce step

  1. Read data from a Hive table and write it into a Paimon table.
  2. The Hive table contains 3.8 million rows in total.
  3. The Paimon table is partitioned by create_time and holds data spanning three years. For the write to the Paimon table to succeed, the memory size has to be set to 8 GB. In the same scenario, writing to a Hive table needs only 1.5 GB.
  4. Increasing the number of buckets in the Paimon table requires even more memory.
  5. Hive acquires an EXCLUSIVE lock when writing to the Paimon table, so any query on that table is blocked.
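To make step 4 concrete, here is a minimal Python sketch that counts how many distinct (partition, bucket) pairs a single batch touches. It is an illustration under stated assumptions, not a description of the connector's internals: it assumes one open writer (with its own buffer) per (partition, bucket) pair, daily partitions on create_time, and a hypothetical CRC32-based `bucket_of` function standing in for Paimon's real bucket hashing. All function names here are hypothetical.

```python
import zlib
from datetime import date, timedelta


def bucket_of(key: int, num_buckets: int) -> int:
    # Hypothetical stand-in for Paimon's bucket hashing: a deterministic
    # CRC32 of the key, taken modulo the bucket count.
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def make_rows(days: int, rows_per_day: int, start: date = date(2020, 1, 1)):
    # Synthetic (id, create_time) rows spread evenly over `days` daily partitions.
    rows = []
    for d in range(days):
        day = start + timedelta(days=d)
        for i in range(rows_per_day):
            rows.append((d * rows_per_day + i, day))
    return rows


def count_writers(rows, num_buckets: int) -> int:
    # Assumption: the writer keeps one open writer per distinct
    # (partition, bucket) pair that appears in the batch.
    pairs = {(create_time, bucket_of(key, num_buckets)) for key, create_time in rows}
    return len(pairs)


if __name__ == "__main__":
    rows = make_rows(days=3 * 365, rows_per_day=100)
    for buckets in (1, 4, 16):
        print(f"buckets={buckets}: {count_writers(rows, buckets)} concurrent writers")
```

Under that assumption, a batch covering three years of daily partitions opens roughly (days × buckets) writers, so the count grows close to linearly with the bucket count. That is consistent with the observation in step 4, but the issue itself does not establish that this is the actual cause of the memory usage.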

What doesn't meet your expectations?

  1. Reduce resource usage
  2. Check whether Hive can write data to the Paimon table without blocking queries on that table.

Anything else?

No response

Are you willing to submit a PR?

Alibaba-HZY commented 1 year ago

@JingsongLi Please assign this issue to me.