Open · abhishekkothari opened 2 years ago
Sorry for the delay. Spark Structured Streaming doesn't support overwrite mode explicitly. To be clear: you want to retrieve/save only the data from the latest committed file on a per-partition basis, correct?
For overwriting on a per-partition basis, we are releasing support for dynamic partition overwrite in Delta 2.0 which allows you to selectively overwrite only the partitions with data being written into them. However, we do recommend using this cautiously and validating which partitions your data touches to avoid unintentional data loss. To do this more safely, you can use replaceWhere.
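To illustrate the safer `replaceWhere` approach described above, here is a minimal sketch. The function name, the `date` predicate, and the table path are hypothetical placeholders; the `replaceWhere` write option itself is a real Delta Lake feature that restricts an overwrite to rows matching the predicate, leaving other partitions untouched.

```python
def overwrite_with_replace_where(df, predicate, path):
    """Overwrite only the rows matching `predicate`; data outside it is kept.

    Example predicate (assuming a `date` partition column, hypothetical):
        "date >= '2024-01-01' AND date < '2024-01-02'"
    """
    (df.write
        .format("delta")
        .mode("overwrite")
        .option("replaceWhere", predicate)
        .save(path))
```

Because the predicate makes the affected partitions explicit, a typo in it fails loudly instead of silently overwriting partitions you didn't intend to touch.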
You can then use foreachBatch to perform the overwrite.
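A sketch of the foreachBatch pattern, assuming Delta Lake >= 2.0 (for dynamic partition overwrite) and a hypothetical table path `/delta/events`; `partitionOverwriteMode` set to `"dynamic"` overwrites only the partitions the micro-batch actually writes into:

```python
def overwrite_partitions(batch_df, batch_id):
    """foreachBatch sink: replace only the partitions present in this batch."""
    (batch_df.write
        .format("delta")
        .mode("overwrite")
        # "dynamic" limits the overwrite to partitions touched by batch_df;
        # validate which partitions your data hits to avoid unintended loss.
        .option("partitionOverwriteMode", "dynamic")
        .save("/delta/events"))  # hypothetical table path

# Wiring it into the stream (paths are placeholders):
# (spark.readStream.format("delta").load("/delta/source")
#     .writeStream
#     .foreachBatch(overwrite_partitions)
#     .option("checkpointLocation", "/delta/events/_checkpoints")
#     .start())
```

Each micro-batch then replaces the partitions it touches with its own (latest) data instead of appending to them.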
I'm using Spark Structured Streaming to append data to a Delta table, but I need only the latest data (the data from the latest file received in each partition). Since streaming doesn't support overwrite mode, is there any workaround? Can we keep only the latest files in each partition and vacuum the rest? Is there any way to use overwrite in a Spark stream? Urgent help needed.