alan-turing-institute / CleverCSV

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.
https://clevercsv.readthedocs.io
MIT License

Detection breaks on good file #99

Open hyperknot opened 1 year ago

hyperknot commented 1 year ago

The following file is from Google Sheets. One column contains Markdown-formatted multiline text. The problem is that CleverCSV detects the wrong dialect for this file, which breaks parsing.

Essentially, it selects a very odd star (*) delimiter, which then breaks the whole file.

Running normal form detection ...
Not normal, has potential escapechar.
Running data consistency measure ...
SimpleDialect(',', '', ''):    P =       14.309419 T =        0.672613 Q =        9.624698
SimpleDialect(',', '', '/'):   P =       14.268794 T =        0.615974 Q =        8.789203
SimpleDialect(',', '"', ''):   P =       37.647059 T =        0.942647 Q =       35.487889
SimpleDialect(',', '"', '/'):  P =       18.751838 skip.
SimpleDialect('', '', ''):     P =        0.313000 skip.
SimpleDialect('', '"', ''):    P =        0.040000 skip.
SimpleDialect(' ', '', ''):    P =       45.500250 T =        0.332927 Q =       15.148254
SimpleDialect(' ', '"', ''):   P =       13.000500 skip.
SimpleDialect('#', '', ''):    P =       26.065333 skip.
SimpleDialect('#', '"', ''):   P =        0.040000 skip.
SimpleDialect('*', '', ''):    P =       93.639500 T =        0.843074 Q =       78.945071
SimpleDialect('*', '"', ''):   P =        0.040000 skip.
SimpleDialect('-', '', ''):    P =       39.078500 skip.
SimpleDialect('-', '"', ''):   P =        0.040000 skip.
SimpleDialect(':', '', ''):    P =       21.732000 skip.
SimpleDialect(':', '"', ''):   P =        9.750500 skip.
SimpleDialect('_', '', ''):    P =        0.406000 skip.
SimpleDialect('_', '"', ''):   P =        0.269500 skip.
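For anyone reading the log: the reported values look consistent with Q = P * T, and the dialect with the largest Q is the one that gets picked. The sketch below recomputes Q from the (P, T) pairs copied out of the log. It is only a check of the numbers shown here, not CleverCSV's actual code; the `scores` dictionary and the variable names are my own. Lines marked "skip." have no T in the log, so they are left out.

```python
# (P, T) pairs copied from the log above, keyed by (delimiter, quotechar, escapechar).
# Dialects marked "skip." in the log have no T value and are omitted.
scores = {
    (",", "", ""): (14.309419, 0.672613),
    (",", "", "/"): (14.268794, 0.615974),
    (",", '"', ""): (37.647059, 0.942647),
    (" ", "", ""): (45.500250, 0.332927),
    ("*", "", ""): (93.639500, 0.843074),
}

# Assumption: Q is the product of P and T, which matches the logged Q values.
q = {dialect: p * t for dialect, (p, t) in scores.items()}

# The dialect with the highest Q is the one reported as detected.
best = max(q, key=q.get)

for dialect, value in sorted(q.items(), key=lambda item: -item[1]):
    print(dialect, round(value, 6))
print("selected:", best)
```

Under that assumption, the star delimiter wins because its P is far higher than that of the comma dialect with a double-quote quotechar (93.6 versus 37.6), even though its T is lower (0.84 versus 0.94).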

CSV file attached: csv_good_dialect_star.csv

Link to Google Sheets: https://docs.google.com/spreadsheets/d/1pbU8Fe0h-NvCc5Cxxbg_nonJgYZB4mHdHNsrmva57CE/edit?usp=sharing
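To illustrate what I mean by a "good" file, here is a minimal sketch using only the standard library `csv` module with the dialect set explicitly. The `sample` string is a made-up stand-in with the same shape as the attached file (a quoted Markdown cell with line breaks and bullet stars), not the real data. With a comma delimiter and a double-quote quotechar every row has the same number of fields, while naively splitting the raw lines on the star does not.

```python
import csv
import io

# Hypothetical sample: one quoted cell holds multiline Markdown with "*" bullets.
sample = (
    'id,title,notes\r\n'
    '1,First,"**Summary**\n* item one\n* item two"\r\n'
    '2,Second,"Plain text, with a comma"\r\n'
)

# Parse with the dialect given explicitly instead of relying on detection.
reader = csv.reader(io.StringIO(sample, newline=""), delimiter=",", quotechar='"')
rows = list(reader)

# For contrast: split each physical line on "*", ignoring quoting entirely.
star_rows = [line.split("*") for line in sample.splitlines()]

print([len(row) for row in rows])
print([len(row) for row in star_rows])
```

Passing the dialect explicitly like this is a workaround on the reading side only; it does not address why detection prefers the star delimiter.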