JoshClose / CsvHelper

Library to help reading and writing CSV files
http://joshclose.github.io/CsvHelper/
Other
4.76k stars 1.07k forks source link

ConfigurationFunctions.ShouldQuote should not check for CsvConfiguration.Delimiter when CsvConfiguration.DetectDelimiter is true #1991

Open Jinjinov opened 2 years ago

Jinjinov commented 2 years ago

Describe the bug ConfigurationFunctions.ShouldQuote should not check for CsvConfiguration.Delimiter when CsvConfiguration.DetectDelimiter is true

To Reproduce Steps to reproduce the behavior:

  1. use CsvConfiguration config = new CsvConfiguration(CultureInfo.InvariantCulture){DetectDelimiter = true};
  2. use using (CsvReader csv = new CsvReader(streamReader, config))
  3. read a TSV file that is not quoted and contains a comma in the second column of the second row
  4. when using CsvWriter to write the content back to a TSV, ConfigurationFunctions.ShouldQuote doesn't check for the detected delimiter \t but the default ,

Expected behavior ConfigurationFunctions.ShouldQuote should check for the detected delimiter \t instead of the default , so that the TSV generated with CsvWriter is identical to the original TSV read with CsvReader.

JoshClose commented 2 years ago

ShouldQuote is only used when writing. DetectDelimiter is only used when reading. There is something else going on for you.

Can you give an example that shows this? Here is what I have based on your reproduction steps.

void Main()
{
    var s = new StringBuilder();
    s.Append("Id\tName\r\n");
    s.Append("1\tone\r\n");
    s.Append("2\t,two\r\n");
    s.Append("3\tthree\r\n");
    var config = new CsvConfiguration(CultureInfo.InvariantCulture)
    {
        Delimiter = "\t",
        DetectDelimiter = true,
    };
    using (var reader = new StringReader(s.ToString()))
    using (var csv = new CsvReader(reader, config))
    {
        csv.GetRecords<Foo>().ToList().Dump();
    }
}

private class Foo
{
    public int Id { get; set; }
    public string Name { get; set; }
}

image

Jinjinov commented 2 years ago

Here is what I have based on your reproduction steps.

You didn't follow my reproduction steps.

new CsvConfiguration(CultureInfo.InvariantCulture)
    {
        Delimiter = "\t",
        DetectDelimiter = true,
    };

is not the same as new CsvConfiguration(CultureInfo.InvariantCulture){DetectDelimiter = true};

If I use Delimiter = "\t" then it also works for me. But I don't know the delimiter in advance.

It has been 3 months since I was using this, so I don't remember the problem exactly.

JoshClose commented 2 years ago

Oops. Just remove the Delimiter = "\t" line. Same result.

The next release that will be coming out probably today will have delimiter detection fixes. If these don't fix your issue, please post here and I can open this again.

Jinjinov commented 2 years ago

I forgot what the original problem was, but I looked into the source code just now.

The problem is in public static bool ShouldQuote(ShouldQuoteArgs args) https://github.com/JoshClose/CsvHelper/blob/master/src/CsvHelper/Configuration/ConfigurationFunctions.cs#L103 in line || (config.Delimiter.Length > 0 && args.Field.Contains(config.Delimiter)) // Contains delimiter

Because I used DetectDelimiter = true, the detected delimiter was \t, but line 103 does not use the detected delimiter, but the default delimiter config.Delimiter - that causes a problem when writing the content back.

I don't remember what I was doing, but I guess I was trying to use a method that calls ShouldQuote.

My guess would be that I was trying to use public CsvWriter(TextWriter writer, CsvConfiguration configuration) that calls ShouldQuote in shouldQuote = configuration.ShouldQuote;

So the actual problem comes when writing this - the output TSV is not identical to the input TSV.

I am sorry that I was not specific enough.

Jinjinov commented 2 years ago

@JoshClose I updated the original comment.

JoshClose commented 2 years ago

Oh... You read the data then use that same config to write and are expecting the delimiter from the detection to be used.

The problem here is that IParserConfiguration is passed into CsvParser where the delimiter detection is called and it's readonly, so we need to think if a different way to handle this. CsvReader.Parser.Delimiter is the updated value. After reading, you could set this value to the configuration before you pass it into the CsvWriter constructor.

Here is an example.

void Main()
{
    var s = new StringBuilder();
    s.Append("Id;Name\r\n");
    s.Append("1;one\r\n");
    List<Foo> records;
    var config = new CsvConfiguration(CultureInfo.InvariantCulture)
    {
        DetectDelimiter = true,
    };
    using (var reader = new StringReader(s.ToString()))
    using (var csv = new CsvReader(reader, config))
    {
        csv.Context.RegisterClassMap<FooMap>();
        records = csv.GetRecords<Foo>().ToList();
        config.Delimiter = csv.Parser.Delimiter;
    }

    using (var writer = new StringWriter())
    using (var csv = new CsvWriter(writer, config))
    {
        csv.WriteRecords(records);
        csv.Flush();
        writer.Dump();
    }
}

private class Foo
{
    public int Id { get; set; }
    public string Name { get; set; }
}

private class FooMap : ClassMap<Foo>
{
    public FooMap()
    {
        Map(m => m.Id);
        Map(m => m.Name);
    }
}

image

Jinjinov commented 2 years ago

I see. Thank you!

I remember now that I was trying to read a TSV and write it back the same way, without quotes and with automatically detected delimiter, but the generated TSV always had quotes for all the cells with a comma in the middle, even though the original didn't have quotes.

Perhaps reading a TSV and writing it back the same way could be explained in the Readme :)

JoshClose commented 2 years ago

It actually has nothing to do with a TSV. It will happen with any delimiter. Delimiter detection is the issue.

I believe this is the first time I've heard of someone needing to write a file in the same way it was ready. It's usually the other way around.

I'll mark this as documentation.