JoshClose / CsvHelper

Library to help reading and writing CSV files
http://joshclose.github.io/CsvHelper/
Other
4.63k stars 1.05k forks source link

Whitespace after delimiter can result in read error #2260

Closed akamor closed 1 month ago

akamor commented 1 month ago

When there is a whitespace following a delimiter we get an exception IF the fields are enclosed in quotes. Here is a sample CSV which repros the issue:

"my name is adam", "i live in america"

Here is a quick snippet where I read the file using v32.0.3

using System.Globalization;
using System.Text;
using CsvHelper;
using CsvHelper.Configuration;

var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    Delimiter = ",",
    Quote = '"',
    Escape = '"',
    HasHeaderRecord = false,
};

var ms = new MemoryStream();
var sw = new StreamWriter(ms, Encoding.UTF8);
sw.WriteLine("\"my name is adam\", \"i live in america\"");
sw.Flush();
ms.Position = 0;

var csvReader = new CsvReader(new StreamReader(ms, Encoding.UTF8), config);

while (await csvReader.ReadAsync().ConfigureAwait(false))
{
    var columnCount = csvReader.Parser.Count;
    for (int i = 0; i < columnCount; i++)
    {
        var val = csvReader.GetField(i);
        Console.WriteLine($"Reading val: {val}");
    }
}

If I remove the whitepsace between the fields the error goes away and can read both values correctly. With the above failure the first column is read correctly and we receive the following error on the second column.

Unhandled exception. CsvHelper.BadDataException: You can ignore bad data by setting BadDataFound to null.
IReader state:
   ColumnCount: 2
   CurrentIndex: 1
   HeaderRecord:

IParser state:
   ByteCount: 0
   CharCount: 40
   Row: 1
   RawRow: 1
   Count: 2
   RawRecord:
JoshClose commented 1 month ago

That's not a valid CSV file.

You can handle this by setting the trim option TrimOptions = TrimOptions.Trim

akamor commented 1 month ago

Got it. I suspected that was the situation but thanks for confirming.

I've noticed that Excel seems to almost always open CSVs (even invalid ones). For whatever reason, it is just very relaxed in the set of inputs it can accept. Is there any documentation on the collection of config options we should employ via your SDK to essentially accept even malformed csvs, or said another way how can we replicate excel's behavior?

akamor commented 1 month ago

And also, in case we dont talk again let me say thanks a lot for your support of csv helper. were huge fans.

JoshClose commented 1 month ago

There is no set of things. Typically, you want to know when things aren't working as expected.

Setting TrimOptions = TrimOptions.Trim | TrimOptions.InsideQuotes should be pretty safe. You can set what is trimmed via the WhiteSpaceChars setting. By default it's just a space character.

You can change the behavior of BadDataFound to just ignore the issue with BadDataFound = null, but depending on the reason, it could mess up the rest of the read. You might want to log when this happens or something.