alan-turing-institute / CleverCSV

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.
https://clevercsv.readthedocs.io
MIT License
1.25k stars 72 forks

Python csv does a better job with escape characters and quotes than CleverCSV #130

Closed: seperman closed this issue 1 week ago

seperman commented 3 months ago

Hello, first of all, thank you for CleverCSV. I use it mainly as a replacement for Python's csv module. However, I noticed that Python's csv does a better job of handling escape characters and quotes.

Please consider the following:

import clevercsv
import csv
from io import StringIO

data = """sku,features,attributes
22221,"[{""key"":""heel_height"",""value"":""Ulttra High (4\\""+)""}]","11,room"
"""

print("Python csv")

stream = StringIO(data)

reader = csv.reader(stream, delimiter=',', quotechar='"', escapechar='\\')

for row in reader:
    print(row)

# ---------------

print("clever csv")

stream = StringIO(data)

for row in clevercsv.reader(stream, delimiter=',', quotechar='"', escapechar='\\'):
    print(row)

This will print:

Python csv
['sku', 'features', 'attributes']
['22221', '[{"key":"heel_height","value":"Ulttra High (4"+)""}]"', '11,room']
clever csv
['sku', 'features', 'attributes']
['22221', '"[{"key":"heel_height","value":"Ulttra High (4""+)""}]","11', 'room"\n']

CleverCSV splits the line in the wrong place. It also converts the \" into "", which is not correct.

GjjvdBurg commented 1 week ago

Hi @seperman, thanks for opening this issue. I've taken a look, and I think that in this particular case the problem is with the dialect, not the parsing. If we don't set the escape character, the text is parsed correctly, and I've added a test in 4b4082a to illustrate this. Please reopen the issue if that doesn't resolve your problem.
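
For readers following along, here is a minimal sketch of the dialect point, using the sample from the report. It is not the test from the commit mentioned above. It uses only Python's standard csv module so that it runs without extra dependencies, and it assumes the same keyword arguments can be passed to clevercsv.reader, as in the original snippet. The sample already doubles its quotes (""), so the backslash appears to belong to the JSON payload rather than to the CSV layer. Without an escapechar, the features field unquotes to a string that json.loads accepts.

```python
import csv
import json
from io import StringIO

# Same sample as in the report: quotes inside fields are doubled (""),
# and the backslash is part of the JSON payload, not a CSV escape.
data = """sku,features,attributes
22221,"[{""key"":""heel_height"",""value"":""Ulttra High (4\\""+)""}]","11,room"
"""

# No escapechar is set: the default doublequote=True turns "" into ".
reader = csv.reader(StringIO(data), delimiter=",", quotechar='"')
rows = list(reader)

header, record = rows

# Assumption for illustration: the features column is meant to hold JSON.
features = json.loads(record[1])

print(record)
print(features[0]["value"])
```

Setting escapechar='\\' on top of doubled quotes gives the parser two competing ways to read \"", which is why both outputs in the report look off.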