Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
[Bug] Enabling deletion-vectors affects the accuracy of sequence.field!!! #4072
Closed
adu-shzz closed 1 month ago
Search before asking
Paimon version
v0.8.1
Compute Engine
Flink-1.16.2
Minimal reproduce step
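1. Create a primary-key table in Paimon with the deduplicate merge engine.
2. Set the table properties 'sequence.field' = 'version', 'deletion-vectors.enabled' = 'true', and 'changelog-producer' = 'lookup'.
3. Generate 15.2 million rows of test data and write them to the table with Flink-1.16.2.
4. After the write finishes, pick the row for a given primary key and check its version value: it does not equal the maximum version in the data source.
5. Change deletion-vectors.enabled to false, recreate the table, and rerun the Flink job: the result is as expected and the version value is correct.

Note: with the same data and the same Flink job, writing to Apache Doris gives the correct result. The Doris table is a UNIQUE_KEY table with "function_column.sequence_col" = "version".
What doesn't meet your expectations?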
Expected: enabling deletion-vectors.enabled should not affect the accuracy of sequence.field!
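For anyone trying to reproduce this, the table described in the reproduce steps could be declared roughly as below. This is only a sketch: the table name `dv_sequence_test`, the columns `id` and `payload`, and all column types are assumptions, since the issue does not show the actual schema. Only the four table options come from the report.

```sql
-- Hypothetical schema: only the WITH options are taken from the issue.
CREATE TABLE dv_sequence_test (
    id      BIGINT,   -- assumed primary key column
    version BIGINT,   -- column used as sequence.field (type assumed)
    payload STRING,   -- assumed non-key column
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'merge-engine' = 'deduplicate',         -- merge engine named in step 1
    'sequence.field' = 'version',           -- keep the row with the largest version
    'deletion-vectors.enabled' = 'true',    -- set to 'false' for the comparison run in step 5
    'changelog-producer' = 'lookup'
);
```

With deletion-vectors.enabled set to 'false' and everything else unchanged, this matches the configuration that step 5 reports as producing the correct version value.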
Anything else?
No response
Are you willing to submit a PR?