HashDataInc / bireme

Bireme is an incremental synchronization tool for the Greenplum / HashData data warehouse
https://hashdatainc.github.io/bireme/
Apache License 2.0

Kafka duplicate consumption issue #134

Open cobolbaby opened 3 years ago

cobolbaby commented 3 years ago

If the Transformer/Merger/Loader threads run slowly, the Kafka offset can go uncommitted for a long time.

In that case, I would expect the same data to end up being consumed more than once, right? See the following code:

https://github.com/HashDataInc/bireme/blob/9cfc128230e7a718394132d9a51ec7d1019d08be/src/main/java/cn/hashdata/bireme/pipeline/KafkaPipeLine.java#L44-L51
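The concern is the usual at-least-once behavior of a commit-after-process consumer. A slow pipeline does not duplicate anything by itself. However, every record handed downstream after the last committed offset is delivered again if the consumer restarts or the consumer group rebalances before the commit happens. The linked KafkaPipeLine.java lines are not reproduced here. What follows is a minimal, Kafka-free Java sketch of that timing, assuming a single partition. `CommitAfterProcessSim` and its methods are hypothetical names and are not bireme's or kafka-clients' API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Kafka-free simulation of commit-after-process semantics.
// The "partition" is a plain list and the "committed offset" is a plain field;
// no real KafkaConsumer calls are made.
class CommitAfterProcessSim {
    private final List<String> log;                    // records in one partition, index = offset
    private long committedOffset = 0;                  // where consumption resumes after a restart
    final List<String> processed = new ArrayList<>();  // everything handed downstream

    CommitAfterProcessSim(List<String> log) {
        this.log = log;
    }

    // Read up to batchSize records starting at the committed offset, process
    // them, and commit only if the consumer does not "crash" before the commit.
    void pollProcessCommit(int batchSize, boolean crashBeforeCommit) {
        long start = committedOffset;
        long end = Math.min(start + batchSize, log.size());
        for (long off = start; off < end; off++) {
            processed.add(log.get((int) off)); // stands in for Transformer/Merger/Loader work
        }
        if (crashBeforeCommit) {
            return; // offset not advanced: this batch is read again after restart
        }
        committedOffset = end;
    }

    long committedOffset() {
        return committedOffset;
    }

    public static void main(String[] args) {
        CommitAfterProcessSim sim = new CommitAfterProcessSim(List.of("r0", "r1", "r2", "r3"));
        sim.pollProcessCommit(2, false); // r0, r1 processed and committed
        sim.pollProcessCommit(2, true);  // r2, r3 processed, then crash before commit
        sim.pollProcessCommit(2, false); // after restart: r2, r3 delivered again
        System.out.println(sim.processed); // [r0, r1, r2, r3, r2, r3]
    }
}
```

Committing before processing would avoid the redelivery, but it trades it for possible data loss (at-most-once). For that reason, the usual remedy is to make the apply step idempotent rather than to move the commit.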

P.S. It looks like bireme can only rely on primary keys as a safety net.
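A primary key works as a safety net because an upsert or delete keyed on it is idempotent. When redelivered records are replayed in their original order (Kafka preserves order within a partition), the target table converges to the same state. The minimal sketch below models the target table as an in-memory map. `PrimaryKeyUpsertSim` and `Change` are hypothetical names, not bireme classes. A table without a primary key gets no such protection, because replayed inserts simply append duplicate rows.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the target table is a map keyed by primary key.
// Uses a record, so it needs Java 16+.
class PrimaryKeyUpsertSim {
    // Minimal change event: primary key plus new row value, or null for a delete.
    record Change(int pk, String value) {}

    final Map<Integer, String> table = new LinkedHashMap<>();

    // Apply one change as an upsert or delete keyed on the primary key,
    // so re-applying an already-seen sequence of changes is harmless.
    void apply(Change c) {
        if (c.value() == null) {
            table.remove(c.pk());
        } else {
            table.put(c.pk(), c.value());
        }
    }

    public static void main(String[] args) {
        List<Change> batch = List.of(
                new Change(1, "a"), new Change(2, "b"),
                new Change(1, "a2"), new Change(2, null));

        PrimaryKeyUpsertSim once = new PrimaryKeyUpsertSim();
        batch.forEach(once::apply);

        PrimaryKeyUpsertSim twice = new PrimaryKeyUpsertSim();
        batch.forEach(twice::apply);
        batch.forEach(twice::apply); // simulated redelivery of the same batch

        System.out.println(once.table);                     // {1=a2}
        System.out.println(once.table.equals(twice.table)); // true
    }
}
```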