JoshClose / CsvHelper

Library to help reading and writing CSV files
http://joshclose.github.io/CsvHelper/

Improperly escaped quotes in CSV not being detected, remaining record fields mismapped #1338

Open jasonchester opened 5 years ago

jasonchester commented 5 years ago

When parsing this misquoted record, BadDataFound is not being invoked.

The field with the issue is: "RUSSELL "HILLS," "

Data

Owner,Store_Name,Store_Num,Contact_Name,Address,Address2,City,State_Cd,Postal_Cd,Country,Store_Phone_Number
"28941","SOMESTORE & HILLS INC                   ","28941","RUSSELL "HILLS,"     ","123 Main St                             ","                                        ","Springfield                                 ","NY","12345-1111","US","(781) 555-1111"

Result

Key                Value
---                -----
Owner              28941
Store_Name         SOMESTORE & HILLS INC
Store_Num          28941
Contact_Name       RUSSELL HILLS
Address
Address2           123 Main St
City
State_Cd           Springfield
Postal_Cd          NY
Country            12345-1111
Store_Phone_Number US

Expected Result

This row should be flagged as bad, and I should be able to handle it using BadDataFound.
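For readers who want to reproduce the mismapping outside of CsvHelper, here is a minimal sketch using Python's standard `csv` module. It is only an analogy, not CsvHelper's code: the module's lenient default happens to treat the text after the stray quote as unquoted data, while its `strict=True` option rejects the record. The `HEADER` and `RECORD` values are abbreviated from the data above (padding and trailing columns trimmed), and `parse` is a hypothetical helper name.

```python
import csv

# Abbreviated from the record in this issue (padding and trailing columns trimmed).
HEADER = ["Owner", "Store_Name", "Store_Num", "Contact_Name",
          "Address", "Address2", "City"]
RECORD = ('"28941","SOMESTORE & HILLS INC","28941",'
          '"RUSSELL "HILLS,"     ","123 Main St","","Springfield"')


def parse(line, strict):
    """Parse a single CSV record with Python's csv module (hypothetical helper)."""
    return next(csv.reader([line], strict=strict))


# Lenient mode: after the stray quote the parser keeps reading as if the field
# were unquoted, so the comma inside the name ends the field early and every
# later value shifts one column to the right.
lenient = parse(RECORD, strict=False)
for name, value in zip(HEADER, lenient):
    print(f"{name:<13}{value}")
print("fields parsed:", len(lenient), "header columns:", len(HEADER))

# Strict mode: the same record is rejected instead of being silently mismapped.
try:
    parse(RECORD, strict=True)
    strict_error = None
except csv.Error as exc:
    strict_error = str(exc)
print("strict mode error:", strict_error)
```

The lenient parse yields one more field than there are header columns, which is the same shift seen in the Result table; the strict parse is the kind of rejection this issue asks BadDataFound to make possible.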

AltruCoder commented 5 years ago

I believe this is the code that controls the observed behavior: https://github.com/JoshClose/CsvHelper/blob/2914f6856febbf7488c53ed65948a00a39f95a22/src/CsvHelper/CsvParser.cs#L695-L700

After the parser reads the second quote, it continues to read the rest of the field as a normal (unquoted) field. I agree with @jasonchester that the desired behavior would be to treat this as bad data when the character after the second quote is neither a delimiter nor an escaping quote.

RFC 4180 seems to indicate that you can't have a partially quoted field:

  1. Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields.
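The check described in this comment can be sketched as a small scanner. This is an illustrative Python sketch under stated assumptions, not the CsvHelper parser or a proposed patch: `find_bad_quote` is a hypothetical name, the input is a single record whose line terminator has already been stripped, and the delimiter and quote character default to `,` and `"`.

```python
def find_bad_quote(record, delimiter=",", quote='"'):
    """Return the index of the first character that breaks the quoting rule.

    Returns None when every field is either fully quoted or contains no
    quote at all. Hypothetical helper; assumes one record with no trailing
    line terminator.
    """
    i, n = 0, len(record)
    while i < n:
        if record[i] == quote:
            # Quoted field: scan to the closing quote, skipping escaped ("") quotes.
            i += 1
            while True:
                if i >= n:
                    return n  # quoted field never closed
                if record[i] == quote:
                    if i + 1 < n and record[i + 1] == quote:
                        i += 2  # escaped quote, still inside the field
                        continue
                    break  # closing quote
                i += 1
            i += 1  # step past the closing quote
            # Only a delimiter or the end of the record may follow.
            if i < n and record[i] != delimiter:
                return i
        else:
            # Unquoted field: a quote may not appear anywhere inside it.
            while i < n and record[i] != delimiter:
                if record[i] == quote:
                    return i
                i += 1
        i += 1  # skip the delimiter
    return None


bad = '"a","RUSSELL "HILLS,"  ","b"'
good = '"a","say ""hi""",b'
print(find_bad_quote(bad))   # index of the "H" that follows the stray quote
print(find_bad_quote(good))  # None
```

Returning the offending position rather than a boolean is a design choice for the sketch: it gives a caller enough context to report or repair the record, which is the role BadDataFound would play here.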