Fixes a failure that could occur during checkpoint cleanup when a table exists in one epoch but not in an earlier one. To clean up a checkpoint, we use the following procedure:
1. Get the metadata for the "new min" epoch (the oldest one that won't be cleaned) and look at all of the files that it references
2. For each epoch that we are cleaning, get their metadata and look at all of the files that they reference
3. For every file in (2) that's not in (1), delete it
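As a rough illustration of the procedure above, the deletion set is a set difference. This is a minimal sketch, not the actual implementation: `Files` and `files_to_delete` are hypothetical names, and file paths are modeled as plain strings.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the set of files that one epoch's metadata references.
type Files = HashSet<String>;

/// Collects every file referenced by an epoch being cleaned that the
/// "new min" epoch does not reference. These are the files safe to delete.
fn files_to_delete(new_min_files: &Files, cleaned_epochs: &[Files]) -> Files {
    cleaned_epochs
        .iter()
        .flatten() // walk the files of each epoch being cleaned
        .filter(|f| !new_min_files.contains(*f)) // keep anything still in use
        .cloned()
        .collect()
}
```

A file that is still referenced by the "new min" epoch is never returned, even if older epochs reference it too.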
To determine which files are referenced, we have to look at the table metadata to figure out the table type and config, which involves looping through all of the tables referenced in a particular operator checkpoint. However, there was a subtle bug: we were using the table metadata from (1) for each iteration of (2). That meant that if a table existed in the new_min epoch but not in an earlier checkpoint, we would fail to find it in the older one and panic.
The fix is to ensure we are always iterating over the tables of the epoch that we're cleaning.
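The shape of the bug and the fix can be sketched as follows. This is a simplified model under stated assumptions, not the code in this PR: `OperatorMetadata`, `referenced_files_buggy`, and `referenced_files_fixed` are hypothetical names, table metadata is reduced to a map from table name to file list, and the buggy variant returns an `Err` where the real code panicked so that the failure is easy to observe.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical per-operator checkpoint metadata: table name -> referenced files.
type OperatorMetadata = HashMap<String, Vec<String>>;

/// Buggy shape: iterates over the tables of the new-min epoch while reading
/// files from the epoch being cleaned. A table that exists only in the
/// new-min epoch cannot be found in the older one.
fn referenced_files_buggy(
    new_min: &OperatorMetadata,
    cleaned: &OperatorMetadata,
) -> Result<HashSet<String>, String> {
    let mut files = HashSet::new();
    for table in new_min.keys() {
        let table_files = cleaned
            .get(table)
            .ok_or_else(|| format!("table {} missing from cleaned epoch", table))?;
        files.extend(table_files.iter().cloned());
    }
    Ok(files)
}

/// Fixed shape: iterates over the tables of the epoch being cleaned, so
/// every lookup is against metadata that is known to contain the table.
fn referenced_files_fixed(cleaned: &OperatorMetadata) -> HashSet<String> {
    cleaned.values().flatten().cloned().collect()
}
```

With a table `b` present only in the new-min epoch, the first function fails on the lookup while the second never asks about `b` at all.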
Closes #688