CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.
The following file is from Google Sheets. One column contains multiline, Markdown-formatted text. The problem is that CleverCSV detects the wrong dialect for this file, which breaks parsing.
Essentially, it selects a strange star (*) delimiter, which then breaks the whole file.
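As a minimal sketch of the expected behavior, the snippet below parses a small, made-up stand-in for the file with the comma delimiter and double-quote quotechar given explicitly. The sample data is hypothetical (it is not the attached file), and the standard library `csv` module is used here rather than CleverCSV itself, purely to illustrate what a correct parse of a quoted multiline Markdown cell looks like.

```python
import csv
import io

# Hypothetical stand-in for the attached file: one quoted cell holds
# multiline Markdown that contains many '*' characters.
sample = (
    'id,notes\n'
    '1,"* first item\n* second item\n* **bold** item"\n'
    '2,"plain text"\n'
)

# Parse with the dialect stated explicitly instead of relying on detection.
rows = list(csv.reader(io.StringIO(sample), delimiter=",", quotechar='"'))

for row in rows:
    print(row)
```

With the dialect given explicitly, every row should have two fields and the Markdown cell should keep its embedded newlines and stars.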
Running normal form detection ...
Not normal, has potential escapechar.
Running data consistency measure ...
SimpleDialect(',', '', ''): P = 14.309419 T = 0.672613 Q = 9.624698
SimpleDialect(',', '', '/'): P = 14.268794 T = 0.615974 Q = 8.789203
SimpleDialect(',', '"', ''): P = 37.647059 T = 0.942647 Q = 35.487889
SimpleDialect(',', '"', '/'): P = 18.751838 skip.
SimpleDialect('', '', ''): P = 0.313000 skip.
SimpleDialect('', '"', ''): P = 0.040000 skip.
SimpleDialect(' ', '', ''): P = 45.500250 T = 0.332927 Q = 15.148254
SimpleDialect(' ', '"', ''): P = 13.000500 skip.
SimpleDialect('#', '', ''): P = 26.065333 skip.
SimpleDialect('#', '"', ''): P = 0.040000 skip.
SimpleDialect('*', '', ''): P = 93.639500 T = 0.843074 Q = 78.945071
SimpleDialect('*', '"', ''): P = 0.040000 skip.
SimpleDialect('-', '', ''): P = 39.078500 skip.
SimpleDialect('-', '"', ''): P = 0.040000 skip.
SimpleDialect(':', '', ''): P = 21.732000 skip.
SimpleDialect(':', '"', ''): P = 9.750500 skip.
SimpleDialect('_', '', ''): P = 0.406000 skip.
SimpleDialect('_', '"', ''): P = 0.269500 skip.
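The numbers in the log above are consistent with Q being the product of P and T, and with the dialect that has the largest Q being selected. The following sketch recomputes Q from the logged P and T values under that assumption. The dictionary layout and variable names are illustrative only and are not part of CleverCSV.

```python
# (delimiter, quotechar, escapechar) -> (P, T), copied from the log above.
# Dialects marked "skip" have no T value in the log and are left out.
scores = {
    (",", "", ""): (14.309419, 0.672613),
    (",", "", "/"): (14.268794, 0.615974),
    (",", '"', ""): (37.647059, 0.942647),
    (" ", "", ""): (45.500250, 0.332927),
    ("*", "", ""): (93.639500, 0.843074),
}

# Assumption: Q = P * T, inferred from the logged numbers.
q = {dialect: p * t for dialect, (p, t) in scores.items()}

# The dialect with the highest Q is the one that gets picked.
best = max(q, key=q.get)

for dialect, value in sorted(q.items(), key=lambda kv: kv[1], reverse=True):
    print(dialect, round(value, 4))
print("selected:", best)
```

Under this reading, the star dialect wins because its P value is far higher than that of the comma/double-quote dialect, even though the latter has the higher T.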
CSV file attached: csv_good_dialect_star.csv
Link to Google Sheets: https://docs.google.com/spreadsheets/d/1pbU8Fe0h-NvCc5Cxxbg_nonJgYZB4mHdHNsrmva57CE/edit?usp=sharing