apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0
2.35k stars 928 forks

[Bug] Enabling deletion-vectors affects the accuracy of sequence.field!!! #4072

Closed adu-shzz closed 1 month ago

adu-shzz commented 1 month ago

Search before asking

Paimon version

v0.8.1

Compute Engine

Flink-1.16.2

Minimal reproduce step

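1. Create a primary-key table in Paimon with the deduplicate merge engine.
2. Set the table properties 'sequence.field' = 'version', 'deletion-vectors.enabled' = 'true', 'changelog-producer' = 'lookup'.
3. Generate 15.2 million rows of test data and write them to the table with Flink-1.16.2.
4. After the write finishes, fetch the row for a given primary key value and check its version value: it does not equal the maximum version for that key in the data source.
5. Change deletion-vectors.enabled to false, recreate the table, and rerun the Flink job: the result is as expected and the version value is correct.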
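The table setup from step 2 could be written roughly as follows. This is only a sketch: the report does not include its DDL, so the table name `dv_seq_test`, the column layout, and the helper `paimon_ddl` are hypothetical. Only the four WITH options are taken from the report (with `merge-engine` spelled out for the deduplicate engine named in step 1).

```python
def paimon_ddl(table: str, deletion_vectors: bool) -> str:
    """Render a Flink SQL CREATE TABLE statement for the reproduction.

    The table name and columns are hypothetical; only the WITH options
    come from the bug report.
    """
    options = {
        "merge-engine": "deduplicate",
        "sequence.field": "version",
        "deletion-vectors.enabled": str(deletion_vectors).lower(),
        "changelog-producer": "lookup",
    }
    with_clause = ",\n".join(f"    '{k}' = '{v}'" for k, v in options.items())
    return (
        f"CREATE TABLE {table} (\n"
        "    id BIGINT,\n"
        "    version BIGINT,\n"
        "    payload STRING,\n"
        "    PRIMARY KEY (id) NOT ENFORCED\n"
        f") WITH (\n{with_clause}\n)"
    )


if __name__ == "__main__":
    # Failing configuration from the report; pass False for the working one.
    print(paimon_ddl("dv_seq_test", deletion_vectors=True))
```

Rendering the statement twice, once with `deletion_vectors=True` and once with `False`, gives the two table definitions compared in steps 4 and 5.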

Note: with the same data and the same Flink job, writing to Apache Doris gives the correct result. The Doris table is a UNIQUE_KEY table with "function_column.sequence_col" = "version".

What doesn't meet your expectations?

Expected: enabling deletion-vectors.enabled should not affect the accuracy of sequence.field!
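The expectation can be stated as a small reference model: for each primary key, the surviving row is the one with the largest version, regardless of arrival order. The sketch below is not Paimon code. The row layout and the function names `expected_latest` and `mismatches` are hypothetical, and sending ties to the later arrival is an assumption of this sketch.

```python
from typing import Dict, Iterable, Optional, Tuple

Row = Tuple[int, int, str]  # (primary key, version, payload); hypothetical layout


def expected_latest(rows: Iterable[Row]) -> Dict[int, Row]:
    """Keep, per primary key, the row with the largest version.

    Ties go to the row that arrives later (an assumption of this sketch).
    """
    latest: Dict[int, Row] = {}
    for row in rows:
        key, version, _ = row
        current = latest.get(key)
        if current is None or version >= current[1]:
            latest[key] = row
    return latest


def mismatches(
    expected: Dict[int, Row], actual: Dict[int, Row]
) -> Dict[int, Tuple[int, Optional[int]]]:
    """Return {key: (expected version, actual version)} where they differ.

    A key missing from `actual` is reported with None as its version.
    """
    diff: Dict[int, Tuple[int, Optional[int]]] = {}
    for key, row in expected.items():
        got = actual.get(key)
        got_version = got[1] if got is not None else None
        if got_version != row[1]:
            diff[key] = (row[1], got_version)
    return diff
```

Running `expected_latest` over the source data and comparing it against the rows read back from the table with `mismatches` would list the keys whose version is stale, which is the check described in step 4.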

Anything else?

No response

Are you willing to submit a PR?

JingsongLi commented 1 month ago

Fixed in https://github.com/apache/paimon/pull/4075