apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Feature] When records have the same sequence number, the latter one is used as the final result. #1050

Closed: lppsuixn closed this issue 1 year ago

lppsuixn commented 1 year ago

Motivation

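When consuming CDC data, we use sequence.field to keep out-of-order records from overwriting newer ones. However, because MySQL's execution time only has second-level precision, several changes to the same ID can carry the same execution time, and their order can then no longer be determined.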

Solution

Add an auto-incrementing "inner-sequence-number" field to the SystemColumns. When two records have the same sequence number, their inner-sequence-numbers can be compared to determine the order.
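
A minimal sketch of this tie-breaking idea, in Python for brevity. The `Record` type, the `inner_sequence` field and `merge_latest` are hypothetical names used only for illustration; this is not Paimon's merge implementation, just the comparison rule described above.

```python
from typing import Dict, Iterable, NamedTuple


class Record(NamedTuple):
    key: int             # primary key of the row
    sequence: int        # user-provided sequence value, e.g. a second-level timestamp
    inner_sequence: int  # hypothetical auto-incrementing tie-breaker
    value: str


def merge_latest(records: Iterable[Record]) -> Dict[int, Record]:
    """Keep, per key, the record with the largest (sequence, inner_sequence) pair."""
    latest: Dict[int, Record] = {}
    for rec in records:
        current = latest.get(rec.key)
        # Tuples compare element by element, so inner_sequence only
        # matters when the user-provided sequence values are equal.
        if current is None or (rec.sequence, rec.inner_sequence) >= (
            current.sequence,
            current.inner_sequence,
        ):
            latest[rec.key] = rec
    return latest


records = [
    Record(key=1, sequence=100, inner_sequence=0, value="a"),
    Record(key=1, sequence=100, inner_sequence=1, value="b"),
    Record(key=1, sequence=99, inner_sequence=2, value="stale"),
]
result = merge_latest(records)
```

Here the two records with sequence 100 are ordered by the inner number, while the record with sequence 99 still loses even though its inner number is higher.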

jameswangchen commented 1 year ago

I have the same problem. It causes the data to be incorrect.

JingsongLi commented 1 year ago

Thanks @lppsuixn and @jameswangchen for reporting!

Consider adding an option to the Paimon sink: if the provided sequence_number does not have enough precision, the sink pads it by appending a nanosecond component.
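
One way to picture this padding, as a rough sketch in Python. `SequencePadder`, `pad` and the injectable `clock_ns` are hypothetical names and do not correspond to any existing Paimon option or API; the sketch also assumes the user's sequence value is in whole seconds.

```python
import time
from typing import Callable

NANOS_PER_SECOND = 1_000_000_000


class SequencePadder:
    """Widens a second-level sequence value with a sub-second nanos part."""

    def __init__(self, clock_ns: Callable[[], int] = time.time_ns):
        # The clock is injectable so the behaviour can be checked deterministically.
        self._clock_ns = clock_ns

    def pad(self, sequence_seconds: int) -> int:
        # High-order part: the user's sequence value, scaled to nanoseconds.
        # Low-order part: the sub-second nanos observed when the sink sees the record.
        nanos = self._clock_ns() % NANOS_PER_SECOND
        return sequence_seconds * NANOS_PER_SECOND + nanos


# Deterministic fake clock for illustration.
ticks = iter([5, 7, 1])
padder = SequencePadder(clock_ns=lambda: next(ticks))

first = padder.pad(1_700_000_000)
second = padder.pad(1_700_000_000)   # same second, arrives later
third = padder.pad(1_700_000_001)    # next second
```

Because the user's value occupies the high-order part, records from different seconds keep their original order and the nanos part only separates records within the same second. The sub-second part wraps around every second, so arrival order within one user-level second is only preserved if the records reach the sink within the same wall-clock second; a per-writer counter would avoid that, at the cost of differing from the suggestion above.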

schnappi17 commented 1 year ago

@JingsongLi I'm willing to take this. Could you assign it to me, please?

JingsongLi commented 1 year ago

> @JingsongLi I'm willing to take this. Could you assign it to me, please?

Okay, can you think about how to design this API first? How should the user's sequence_field and nano_time be spliced together?

schnappi17 commented 1 year ago

> > @JingsongLi I'm willing to take this. Could you assign it to me, please?
>
> Okay, can you think about how to design this API first? How should the user's sequence_field and nano_time be spliced together?

Yes, thanks for the reminder. I'll think about it.

jameswangchen commented 1 year ago

@schnappi17 Hello, what is the status of this issue?

schnappi17 commented 1 year ago

> @schnappi17 Hello, what is the status of this issue?

Hi @jameswangchen, I'm working on this and will submit a PR this week.