CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.
This PR adds performance improvements in two ways:
- Caching the `is_potential_escapechar` function result
- Implementing `merge_with_quotechar` in C
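To illustrate the first point, here is a minimal sketch of how memoizing a per-character check can avoid repeated work. It is not the code in this PR: the function name `is_potential_escapechar_sketch`, its single-argument signature, the blocked-character set, and the helper `count_candidates` are all assumptions made for the example. The idea is only that the same character is looked up many times while scanning a large file, so caching the result with `functools.lru_cache` pays off quickly.

```python
import unicodedata
from functools import lru_cache

# Hypothetical set of punctuation that should never count as an escape
# character; the real list in CleverCSV may differ.
_BLOCKED = frozenset("!?\"'.,;:%*&#")


@lru_cache(maxsize=None)
def is_potential_escapechar_sketch(char: str) -> bool:
    """Illustrative stand-in: treat 'other punctuation' as a candidate."""
    if char in _BLOCKED:
        return False
    # "Po" is the Unicode general category "Punctuation, other".
    return unicodedata.category(char) == "Po"


def count_candidates(text: str) -> int:
    """Hypothetical caller that checks every character of the input."""
    return sum(is_potential_escapechar_sketch(c) for c in text)
```

After the first lookup of a given character, later calls are answered from the cache, which `is_potential_escapechar_sketch.cache_info()` makes visible through its hit counter.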
Especially for large files, this will likely make a significant difference to the performance of CleverCSV. Some statistics(*) on our integration tests:
- mean runtime: 0.629 seconds to 0.445 seconds (-29.3%)
- median runtime: 18.19 ms to 16.06 ms (-11.7%)
- p90 runtime: 0.951 seconds to 0.732 seconds (-23.1%)
Also, this PR fixes the documentation error reported in #91.
*: One file (13a6c86a18f053c593feda3d98755010) was excluded from the comparison because, before these improvements, dialect detection would time out on it, so it wasn't included in the earlier results.