alan-turing-institute / CleverCSV

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the built-in csv module with improved dialect detection, and comes with a handy command-line application for working with CSV files.
https://clevercsv.readthedocs.io
MIT License

Add warning for duplicate fieldnames in DictReader #23

Closed. GjjvdBurg closed this pull request 4 years ago.

GjjvdBurg commented 4 years ago

When field names (headers) in a CSV file are not unique, the csv module's DictReader silently overwrites part of the data. Each row is built as a dict, so the value from a later column replaces the value from an earlier column with the same name, exactly as with a plain dict literal: {'a': 1, 'b': 2, 'a': 3} == {'a': 3, 'b': 2}. This is a known issue in Python, but there is no consensus on how it should be handled.
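The overwriting is easy to reproduce with the standard library alone. This snippet feeds a small, made-up CSV with a repeated "a" header to csv.DictReader via io.StringIO; the dict shown in the comment assumes Python 3.8+, where DictReader returns plain dicts rather than OrderedDicts.

```python
import csv
import io

# A tiny CSV whose header repeats the column name "a".
data = "a,b,a\n1,2,3\n"

reader = csv.DictReader(io.StringIO(data))
row = next(reader)

# The header is read as-is, duplicates included.
print(reader.fieldnames)  # ['a', 'b', 'a']

# The row is built with dict(zip(fieldnames, values)), so the third
# column's value replaces the first one: '1' is silently lost.
print(row)  # {'a': '3', 'b': '2'}
```

Only two keys survive even though the row had three values, which is the data loss described above.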

In practice this can lead to unexpected, silent data loss. With this PR we issue a warning when this occurs, so the user is at least aware of the issue. Warnings can easily be suppressed (see the warnings module), so I don't expect this to be a significant burden on the end user. Hopefully this makes the DictReader less surprising.
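One way such a warning could work is sketched here on top of the standard csv.DictReader. This is a minimal illustration under stated assumptions, not the code merged in this PR: the class name WarningDictReader, the helper duplicate_names, the choice of UserWarning and the message text are all hypothetical. It checks the header once, before the first row is returned, and shows how the standard warnings module can silence the warning.

```python
import csv
import io
import warnings
from collections import Counter


def duplicate_names(fieldnames):
    """Hypothetical helper: names occurring more than once, in first-seen order."""
    counts = Counter(fieldnames or [])
    return [name for name in counts if counts[name] > 1]


class WarningDictReader(csv.DictReader):
    """Hypothetical csv.DictReader subclass that warns about duplicate headers."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._fieldnames_checked = False

    def __next__(self):
        # Inspect the header once, just before the first data row is returned.
        # Accessing self.fieldnames reads the header row if it was not given.
        if not self._fieldnames_checked:
            self._fieldnames_checked = True
            dups = duplicate_names(self.fieldnames)
            if dups:
                warnings.warn(
                    f"Duplicate field names {dups!r}: values in later columns "
                    "overwrite earlier columns with the same name.",
                    UserWarning,
                    stacklevel=2,
                )
        return super().__next__()


data = "a,b,a\n1,2,3\n"

# By default the warning is shown on stderr.
rows = list(WarningDictReader(io.StringIO(data)))

# A user who knows about the duplicates can silence it with the warnings module.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    quiet_rows = list(WarningDictReader(io.StringIO(data)))
```

The reader still returns exactly what csv.DictReader would; the sketch only makes the data loss visible, which matches the goal of warning rather than changing behaviour.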