Open DavidCampanero opened 1 year ago
The Spark action for this computes the set of files reachable before the snapshots are expired, subtracts the set of files still reachable afterwards, and deletes only that difference. So the Spark action would preserve any data files that a retained snapshot still references. The pure Java version has a much more complicated implementation; the intent is the same, although it may not behave correctly in this use case.
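To make the "difference of reachable files" idea concrete, here is a minimal sketch in plain Python. It is not Iceberg code: the `snapshots` dictionary, the file names, and both helper functions are hypothetical and only model the set arithmetic described above.

```python
def reachable_files(snapshots):
    """Return the union of all data files referenced by the given snapshots."""
    files = set()
    for referenced in snapshots.values():
        files |= referenced
    return files


def files_to_delete(snapshots, expired_ids):
    """Files reachable before expiration minus files reachable afterwards."""
    before = reachable_files(snapshots)
    retained = {sid: f for sid, f in snapshots.items() if sid not in expired_ids}
    after = reachable_files(retained)
    return before - after


# Hypothetical table history: snapshot id -> data files that snapshot references.
snapshots = {
    1: {"a.parquet"},
    2: {"a.parquet", "b.parquet"},
    3: {"a.parquet", "c.parquet"},  # b.parquet is no longer referenced here
}

# Expiring the middle snapshot removes only the file that nothing else references.
print(files_to_delete(snapshots, {2}))
```

In this model, expiring snapshot 2 leaves `a.parquet` in place because snapshots 1 and 3 still reference it, so time travel to snapshot 1 would still work.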
Query engine
Spark
Question
I've written a script that keeps the first snapshot of each month and the snapshots from the last few days. But once I delete the snapshots older than 7 days that are not the first of their month with `expireSnapshotId`, it seems like I no longer have access to the previous data, even though the metadata files (JSON and Avro) are still there. The data that the Avro file references is no longer there. So I don't know whether "breaking the chain" means that I will lose the data and will not be able to time travel to check what the data looked like 1 year ago if I have deleted data in the middle.
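For readers following along, the retention rule described in the question can be sketched roughly as below. This is an assumption about what such a script might look like, not the author's actual code: the function name, the `(snapshot_id, committed_at)` tuples, and the sample ids and dates are all hypothetical, and the sketch only selects ids without calling any Iceberg API.

```python
from datetime import datetime, timedelta


def snapshots_to_expire(snapshots, now, keep_days=7):
    """Pick snapshot ids to expire.

    Keeps the earliest snapshot of each calendar month and every snapshot
    committed within the last `keep_days` days; everything else is returned.
    `snapshots` is a list of (snapshot_id, committed_at) tuples.
    """
    ordered = sorted(snapshots, key=lambda s: s[1])
    cutoff = now - timedelta(days=keep_days)

    # Earliest snapshot per (year, month); setdefault keeps the first one seen.
    first_of_month = {}
    for snap_id, committed_at in ordered:
        first_of_month.setdefault((committed_at.year, committed_at.month), snap_id)

    keep = set(first_of_month.values())
    keep |= {snap_id for snap_id, committed_at in ordered if committed_at >= cutoff}
    return [snap_id for snap_id, _ in ordered if snap_id not in keep]


# Hypothetical snapshot history.
history = [
    (101, datetime(2023, 1, 1)),
    (102, datetime(2023, 1, 15)),
    (103, datetime(2023, 2, 3)),
    (104, datetime(2023, 2, 20)),
    (105, datetime(2023, 3, 10)),
    (106, datetime(2023, 3, 14)),
]

print(snapshots_to_expire(history, now=datetime(2023, 3, 15)))
```

Each id returned by such a selection would then be passed to the expire call mentioned in the question.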