Open andreaschat-db opened 8 months ago
The reason for the assert is that TahoeLogFileIndex is unstable, that is, not yet pinned to a particular version of the table snapshot. According to @ryan-johnson-databricks, if it has isTimeTravel=true, then it is stable. There is a special case in PrepareDeltaScan: when data skipping is disabled, it generates a TahoeLogFileIndex with isTimeTravel=false.
More on this assert: We have a rule, PreprocessTableWithDVs, that alters the query scan with an additional (filePath to DV) broadcast map for reading DVs. This rule runs immediately after the PrepareDeltaScan rule. In PreprocessTableWithDVs, we enforce a requirement that the scan must not have a TahoeLogFileIndex, because that index is not stable (it is not yet pinned). However, when data skipping is disabled, we keep the TahoeLogFileIndex with isTimeTravel=false, which fails the assert in PreprocessTableWithDVs.
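To make the interaction between the two rules concrete, here is a toy model in Python. It is not Delta code: `PinnedFileIndex`, `prepare_delta_scan`, and `preprocess_table_with_dvs` are hypothetical stand-ins invented for illustration, and the only behavior modeled is what the comment above describes (the index is left unpinned when skipping is disabled, and the DV rule rejects any TahoeLogFileIndex).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TahoeLogFileIndex:
    """Toy stand-in for an index that follows the log and is not pinned."""
    is_time_travel: bool = False


@dataclass(frozen=True)
class PinnedFileIndex:
    """Hypothetical stand-in for an index bound to one snapshot version."""
    version: int


def prepare_delta_scan(index, stats_skipping_enabled, snapshot_version):
    """Models the special case: with skipping disabled, the index is kept as-is."""
    if not stats_skipping_enabled:
        return index
    return PinnedFileIndex(snapshot_version)


def preprocess_table_with_dvs(index):
    """Models the requirement: a TahoeLogFileIndex is rejected by the DV rule."""
    if isinstance(index, TahoeLogFileIndex):
        raise ValueError(
            "Cannot work with a non-pinned table snapshot of the TahoeFileIndex"
        )
    return index


index = TahoeLogFileIndex(is_time_travel=False)

# Skipping enabled: the first rule pins the index, so the second rule accepts it.
ok = preprocess_table_with_dvs(prepare_delta_scan(index, True, 3))

# Skipping disabled: the unpinned index reaches the second rule and is rejected.
try:
    preprocess_table_with_dvs(prepare_delta_scan(index, False, 3))
    failed = False
except ValueError:
    failed = True
```

In this sketch the failure comes purely from rule ordering: nothing between the two steps replaces the unpinned index when skipping is off.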
Bug
Which Delta project/connector is this regarding?
Describe the problem
Disabling DELTA_STATS_SKIPPING causes operations with DVs to fail with:
Cannot work with a non-pinned table snapshot of the TahoeFileIndex when reading a table with DVs
The failure occurs because, when DELTA_STATS_SKIPPING is disabled, the plan contains a TahoeLogFileIndex, which causes the requirement at PreprocessTableWithDVs.dvEnabledScanFor to fail.
Steps to reproduce
The issue can be reproduced with:
Willingness to contribute
The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?