alan-turing-institute / CleverCSV

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.
https://clevercsv.readthedocs.io
MIT License
1.24k stars, 70 forks

Performance improvements #92

Closed. GjjvdBurg closed this 1 year ago

GjjvdBurg commented 1 year ago

This PR adds performance improvements in two ways:

Especially for large files, this will likely make a significant difference to the performance of CleverCSV. Some statistics(*) on our integration tests:

Also, this PR fixes the documentation error reported in #91.


*: one file (13a6c86a18f053c593feda3d98755010) was left out of the comparison because, before these improvements, dialect detection timed out on it, so there is no earlier measurement to compare against.
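
For anyone who wants to check detection time on their own files, here is a minimal timing sketch. It is not the benchmark used for the statistics above. The helper name `time_detection` and the synthetic sample are made up for illustration. The demo times the standard library's `csv.Sniffer` so that it runs without extra dependencies.

```python
import csv
import time


def time_detection(sniff, sample, repeats=5):
    """Run sniff(sample) several times; return (dialect, best seconds).

    `sniff` is any callable that takes a text sample and returns a
    dialect object (hypothetical helper, for illustration only).
    """
    best = float("inf")
    dialect = None
    for _ in range(repeats):
        start = time.perf_counter()
        dialect = sniff(sample)
        best = min(best, time.perf_counter() - start)
    return dialect, best


# Synthetic semicolon-separated sample; not one of the integration test files.
sample = "".join(f"{i};{i + 1};{i + 2}\n" for i in range(200))

dialect, seconds = time_detection(csv.Sniffer().sniff, sample)
print(f"delimiter={dialect.delimiter!r} best={seconds:.6f}s")
```

Since CleverCSV is meant as a drop-in replacement, `clevercsv.Sniffer().sniff` should be usable in place of `csv.Sniffer().sniff` here. Taking the best of several runs reduces noise from other processes.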