apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

metadata json conflict when streaming #9171

Open thongdq1 opened 10 months ago

thongdq1 commented 10 months ago

Apache Iceberg version

1.2.1

Query engine

Spark

Please describe the bug 🐞

I'm using Spark micro-batch streaming to read Parquet files and write them to an Iceberg table. When new records are written, the job can write the Parquet data files, manifest files, and snapshot files. However, the new metadata JSON file sometimes conflicts with a metadata file that was already written: the content of the new metadata file is the same as that of the previous metadata file. I also tried the newest Iceberg version but still get the same problem.
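One way to confirm that two metadata files really carry the same content is to diff them key by key. The snippet below is only a minimal sketch, not part of the original report: the helper name `differing_keys` and the file paths passed on the command line are hypothetical, and it assumes both metadata JSON files have been copied to the local filesystem.

```python
import json
import sys
from pathlib import Path

_MISSING = object()  # marks a key that is absent from one of the two files


def differing_keys(prev_path, new_path):
    """Return the sorted top-level keys whose values differ between two JSON files.

    An empty list means the two files have identical content.
    """
    prev = json.loads(Path(prev_path).read_text())
    new = json.loads(Path(new_path).read_text())
    all_keys = prev.keys() | new.keys()
    return sorted(
        key
        for key in all_keys
        if prev.get(key, _MISSING) != new.get(key, _MISSING)
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python diff_metadata.py <previous>.metadata.json <new>.metadata.json
    diff = differing_keys(sys.argv[1], sys.argv[2])
    print("identical content" if not diff else f"differing keys: {diff}")
```

If the result is an empty list for two consecutive metadata files, the later commit did not change the table metadata at all, which matches the symptom described above.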

amitmittal5 commented 10 months ago

Hello, I am also running a Spark streaming job with the latest versions of Spark and Iceberg, but I am seeing the data file get overwritten in subsequent stream executions. I have raised my issue here: https://github.com/apache/iceberg/issues/9172. I am wondering whether our issues share the same root cause.