Open thongdq1 opened 10 months ago
Hello, I am also running a Spark streaming job with the latest versions of Spark and Iceberg, but I am seeing data files get overwritten in subsequent stream executions. I have raised my issue at https://github.com/apache/iceberg/issues/9172, so I am wondering whether our issues share the same root cause.
Apache Iceberg version
1.2.1
Query engine
Spark
Please describe the bug 🐞
I'm using Spark micro-batch streaming to read Parquet files and write them to an Iceberg table. When new records are written, the job writes Parquet data files, manifest files, and snapshot files. However, the metadata JSON file sometimes conflicts with a previously written one: the content of the new metadata file is the same as the content of the previous metadata file. I also tried the newest Iceberg version but still get the same problem.
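One way to confirm the symptom is to compare the metadata files byte for byte. The following is only a minimal sketch, not part of Iceberg: it assumes the table's `metadata/` directory is reachable as a local path and that metadata files end in `.metadata.json`. The function name `find_duplicate_metadata` is hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_metadata(metadata_dir):
    """Group *.metadata.json files in metadata_dir by content hash.

    Returns a list of lists. Each inner list holds the names of files
    whose bytes are identical, in sorted name order.
    Assumption: metadata_dir is a local (or locally mounted) directory.
    """
    groups = defaultdict(list)
    for path in sorted(Path(metadata_dir).glob("*.metadata.json")):
        # Hash the raw bytes so that any difference at all counts as different.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        groups[digest].append(path.name)
    # Keep only hashes shared by more than one file.
    return [names for names in groups.values() if len(names) > 1]
```

Calling `find_duplicate_metadata` on the table's metadata directory after a few micro-batches returns an empty list if every metadata file is distinct, and groups of file names if some are identical copies.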