delta-io / delta

[Spark] Better error messages when file name collides #3713

Open sunchao opened 1 month ago

Which Delta project/connector is this regarding?

Description

Currently, when generateCandidateFileMap finds duplicate file names, it dumps every file in nameToAddFileMap into the error message. This is tedious to debug: the list can be very long, and it is hard to identify which entries are the duplicates.

This PR changes the error message to output only the duplicate entries, together with their corresponding keys.
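
To illustrate the idea, here is a minimal sketch of reporting only the colliding keys. It is not the actual Delta code: the AddFile case class is a hypothetical stand-in that carries only a path, absolutePath is a simplified assumption about how paths are resolved, and findCollisions is an illustrative helper name that does not come from the patch.

```scala
// Hypothetical stand-in for Delta's AddFile action; only the path matters here.
final case class AddFile(path: String)

object CandidateFileMapSketch {
  // Simplified path resolution (assumption): relative paths are joined onto
  // basePath, while absolute paths and URIs are kept unchanged.
  private def absolutePath(basePath: String, path: String): String =
    if (path.startsWith("/") || path.contains("://")) path
    else s"${basePath.stripSuffix("/")}/$path"

  // Illustrative helper: keeps only the keys shared by more than one file.
  def findCollisions(
      basePath: String,
      candidateFiles: Seq[AddFile]): Map[String, Seq[AddFile]] =
    candidateFiles
      .groupBy(add => absolutePath(basePath, add.path))
      .filter { case (_, adds) => adds.size > 1 }

  def generateCandidateFileMap(
      basePath: String,
      candidateFiles: Seq[AddFile]): Map[String, AddFile] = {
    val collisions = findCollisions(basePath, candidateFiles)
    // The message lists only the duplicated keys and the files behind them,
    // rather than every candidate file.
    require(
      collisions.isEmpty,
      "File name collisions found:\n" + collisions.toSeq
        .sortBy(_._1)
        .map { case (name, adds) => s"$name -> ${adds.map(_.path).mkString(", ")}" }
        .mkString("\n"))
    candidateFiles.map(add => absolutePath(basePath, add.path) -> add).toMap
  }
}
```

With a base path of /table and the candidates a.parquet, b.parquet, and /table/a.parquet, this sketch would report only the key /table/a.parquet and its two source paths, leaving b.parquet out of the message.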

How was this patch tested?

Existing tests.

Does this PR introduce any user-facing changes?

No