Which Delta project/connector is this regarding?
Description
Currently, generateCandidateFileMap dumps every file in nameToAddFileMap when it finds duplicates. This makes debugging tedious: the list can be very long, and it is hard to identify which entries are the duplicates.
This PR changes the output to include only the duplicates, along with their corresponding keys.
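To illustrate the idea, here is a minimal Python sketch of reporting only the colliding keys. It is not the actual Delta code: the function name `generate_candidate_file_map`, the use of plain path strings instead of AddFile entries, and the message format are all assumptions made for illustration.

```python
from collections import defaultdict
from posixpath import join


def generate_candidate_file_map(base_path, candidate_paths):
    """Map each resolved file name to its candidate path, failing on collisions.

    Hypothetical stand-in for the real method: candidates are plain path
    strings here rather than AddFile entries.
    """
    # Group candidates by the key (resolved path) they map to.
    by_name = defaultdict(list)
    for path in candidate_paths:
        by_name[join(base_path, path)].append(path)

    # Keep only the keys that more than one candidate resolves to.
    duplicates = {name: paths for name, paths in by_name.items() if len(paths) > 1}
    if duplicates:
        # Report just the colliding keys and their candidates, not the full list.
        details = "\n".join(
            f"{name} -> {paths}" for name, paths in sorted(duplicates.items())
        )
        raise ValueError(f"File name collisions found:\n{details}")

    return {name: paths[0] for name, paths in by_name.items()}
```

With this shape, a collision between a relative and an absolute path that resolve to the same file produces a message naming only that key and its two candidates, however many other files are in the candidate list.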
How was this patch tested?
Existing tests.
Does this PR introduce any user-facing changes?
No