apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Using BULK_INSERT mode multiple times writing causing a bug: Duplicate fileId 00000000-8651-4ae5-8f9e-4424fed2d181 from bucket 0 of partition found during the BucketStreamWriteFunction index bootstrap. #10894

Closed: Toroidals closed this issue 6 months ago

Toroidals commented 6 months ago


Describe the problem you faced

Writing to the same table multiple times in BULK_INSERT mode triggers a bug: Duplicate fileId 00000000-8651-4ae5-8f9e-4424fed2d181 from bucket 0 of partition found during the BucketStreamWriteFunction index bootstrap.

Configuration:

- write.operation = BULK_INSERT
- index.type = BUCKET
- hoodie.index.bucket.engine = SIMPLE
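For reference, a minimal Flink SQL sketch of a table definition using the options above; the table name, columns, path, table type, and bucket count are placeholders or assumptions, not taken from the issue:

```sql
CREATE TABLE hudi_tbl (
  id   STRING PRIMARY KEY NOT ENFORCED,  -- record key (placeholder column)
  name STRING,
  ts   TIMESTAMP(3),
  dt   STRING                            -- partition field (placeholder)
) PARTITIONED BY (dt) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs:///tmp/hudi_tbl',          -- placeholder path
  'table.type' = 'MERGE_ON_READ',           -- assumed; not stated in the issue
  'write.operation' = 'bulk_insert',        -- from the issue
  'index.type' = 'BUCKET',                  -- from the issue
  'hoodie.index.bucket.engine' = 'SIMPLE',  -- from the issue
  'hoodie.bucket.index.num.buckets' = '4'   -- assumed bucket count
);
```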

To Reproduce

Steps to reproduce the behavior:

1. A program writes to table a in BULK_INSERT mode.
2. Another program writes to the same table using BULK_INSERT again; the data written in the two runs do not overlap.
3. Incremental data is then written in UPSERT mode, and the following error occurs (see the sketch after this list): Duplicate fileId 00000000-8651-4ae5-8f9e-4424fed2d181 from bucket 0 of partition found during the BucketStreamWriteFunction index bootstrap.
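The sequence above might look roughly like this in Flink SQL; the table and source names are placeholders, and the dynamic OPTIONS hints are just one way to switch write.operation between jobs:

```sql
-- Job 1: initial load with bulk_insert
INSERT INTO hudi_tbl /*+ OPTIONS('write.operation' = 'bulk_insert') */
SELECT * FROM source_batch_1;

-- Job 2: a second program also writes with bulk_insert; its rows do not
-- overlap with job 1, but it presumably creates its own file group for bucket 0
INSERT INTO hudi_tbl /*+ OPTIONS('write.operation' = 'bulk_insert') */
SELECT * FROM source_batch_2;

-- Job 3: incremental upsert; the BucketStreamWriteFunction index bootstrap
-- now finds two file groups mapped to bucket 0 of the same partition and
-- fails with "Duplicate fileId ... from bucket 0"
INSERT INTO hudi_tbl /*+ OPTIONS('write.operation' = 'upsert') */
SELECT * FROM incremental_source;
```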

Expected behavior

It should be possible to write to the same table multiple times with BULK_INSERT and then continue with UPSERT writes. How can BULK_INSERT be used multiple times against the same table?

Environment Description

Additional context


Stacktrace


danny0405 commented 6 months ago

Did you configure concurrency control for the multiple writers?
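For context, running several writers against one Hudi table generally requires optimistic concurrency control plus an external lock provider. Below is a minimal sketch of the relevant Flink table options, assuming a ZooKeeper-based lock provider; the table definition, ZooKeeper host, port, and lock paths are placeholders, not from the issue:

```sql
CREATE TABLE hudi_tbl (
  id STRING PRIMARY KEY NOT ENFORCED,  -- placeholder record key
  dt STRING                            -- placeholder partition field
) PARTITIONED BY (dt) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs:///tmp/hudi_tbl',     -- placeholder path
  'index.type' = 'BUCKET',
  -- multi-writer settings: optimistic concurrency control with a lock provider
  'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control',
  'hoodie.cleaner.policy.failed.writes' = 'LAZY',
  'hoodie.write.lock.provider' = 'org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider',
  'hoodie.write.lock.zookeeper.url' = 'zk-host',           -- placeholder
  'hoodie.write.lock.zookeeper.port' = '2181',             -- placeholder
  'hoodie.write.lock.zookeeper.lock_key' = 'hudi_tbl',     -- placeholder
  'hoodie.write.lock.zookeeper.base_path' = '/hudi/locks'  -- placeholder
);
```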