alan-turing-institute / CleverCSV

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.
https://clevercsv.readthedocs.io
MIT License
1.24k stars, 70 forks

Confidence score #115

Open alexandreczg opened 9 months ago

Is there a way to use the sniffer with a confidence score threshold? While the library works well for many types of CSV files, I have a couple of control cases that aren't CSV at all (they are actually fixed-width files) where the sniffer still returns a dialect. I'd like access to the sniffer's confidence score so I can decide whether to use the returned delimiter.

In fact, I have run quite a few files through the sniffer and haven't gotten a None response yet, which makes me believe the logic is a little too eager to produce a dialect, even at low confidence.
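
To make the request concrete, here is a rough sketch of the kind of threshold gate being asked for. It does not use any CleverCSV score, since it is not clear from this issue whether one is exposed. `delimiter_consistency` and `accept_delimiter` are hypothetical helpers built only on the standard-library `csv` module, and the 0.9 threshold is an arbitrary assumption. The "score" is simply the fraction of rows that share the most common field count.

```python
import csv
from collections import Counter


def delimiter_consistency(sample: str, delimiter: str) -> float:
    """Crude stand-in for a confidence score (hypothetical, not CleverCSV API).

    Returns the fraction of non-empty rows whose field count equals the most
    common field count, or 0.0 if that most common count is below two fields.
    """
    reader = csv.reader(sample.splitlines(), delimiter=delimiter)
    rows = [row for row in reader if row]  # skip blank lines
    if not rows:
        return 0.0
    counts = Counter(len(row) for row in rows)
    modal_count, modal_freq = counts.most_common(1)[0]
    if modal_count < 2:
        # The delimiter does not split the typical row at all.
        return 0.0
    return modal_freq / len(rows)


def accept_delimiter(sample: str, delimiter: str, threshold: float = 0.9) -> bool:
    """Accept a detected delimiter only if the score clears the threshold."""
    return delimiter_consistency(sample, delimiter) >= threshold
```

With something like this, a delimiter returned by the sniffer could be discarded (treated as None) whenever `accept_delimiter` returns False. A real confidence score from the detector itself would be preferable to this heuristic.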

Below, each file is shown on the left with the detected delimiter on the right.

image