JoshClose / CsvHelper

Library to help reading and writing CSV files
http://joshclose.github.io/CsvHelper/
Other
4.72k stars 1.06k forks source link

CsvWriter WriteRecord<T> is stripping out some inner double quote (\") charcters #114

Closed JordanMarr closed 11 years ago

JordanMarr commented 11 years ago

I have the following column string value written out correctly in a log file: "Filters: "528294""

When GetRecords rebuilds my T objects, that property has a value of: "Filters: 528294""

Note the missing double quote before 528294.

This bug exists with version 1.14.0.39378 and I know it also existed in version 1.12.1. I am using the following CsvConfiguration both when reading and writing:

var config = new CsvHelper.Configuration.CsvConfiguration(); config.Delimiter = "\t"; config.ClassMapping(); config.QuoteAllFields = true;

JoshClose commented 11 years ago

Shouldn't the value be "Filters: ""528924""" in the CSV file? I might be confused on what the field value is here.

GraemeF commented 11 years ago

Yeah, looks like the log file is wrong to me.

JordanMarr commented 11 years ago

So the consensus seems to be that the value should have been written out to the log as "Filters: ""528924""".

The question then becomes, should the onus be on the consumer of the CsvHelper library to add the extra double quotes, or should it be on the CsvHelper to detect that the value contains inner quotes and then ensure that they are formatted correctly?

My preference is that CsvHelper handles this transparently on behalf of the user, since it is a CSV specific implementation detail. If not by default, then maybe there could at least be a configuration option to take care of this automatically for the user.

GraemeF commented 11 years ago

Ah - it wasn't clear that the log file was written with CsvHelper. Yes, I'd expect that to be properly encoded when written...

JoshClose commented 11 years ago

Was the log file written with CsvHelper? If so, this looks like it would be a bug in the writer.

The reader is reading it properly. Try opening it up in Excel and you should see the same results. But if the writer generated that, there is a problem.

JordanMarr commented 11 years ago

Yes, it was. I will see if I can reproduce it in a unit test.

JoshClose commented 11 years ago

Great, thanks. That would be very helpful.

JordanMarr commented 11 years ago

I was able to recreate the problem in this unit test. Note that this is 1.12.1.41166. I can run it again using the latest version as well and then post results.


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using MbUnit.Framework;
using System.IO;
namespace CsvHelperTests
{
    [TestFixture]
    public class CsvHelperTests
    {
        [Test]
        public void ShouldPreserveInnerDoubleQuotes()
        {
            FileInfo csvFile = new FileInfo("csvfile.csv");
            if (csvFile.Exists)
            {
                csvFile.Delete();
            }
            var config = PersonCsvMap.CreateConfig();

            Person p = new Person
            {
                FullName = "John Doe",
                Nickname = "John \"The Ladies Man\" Doe"
            };
            // Write CSV
            using (var streamWriter = File.AppendText(csvFile.FullName))
            using (var csvWriter = new CsvHelper.CsvWriter(streamWriter, config))
            {
                csvWriter.WriteRecord(p);
            }
            // Read CSV
            var people = new List();
            using (var streamReader = new System.IO.StreamReader(csvFile.FullName))
            using (var csvReader = new CsvHelper.CsvReader(streamReader, config))
            {
                foreach (Person person in csvReader.GetRecords())
                {
                    people.Add(person);
                }
            }
            Assert.AreEqual(p.Nickname, people[0].Nickname);
        }
    }
    public class Person
    {
        public string FullName { get; set; }
        public string Nickname { get; set; }
    }
    public class PersonCsvMap : CsvHelper.Configuration.CsvClassMap
    {
        public PersonCsvMap()
        {
            Map(m => m.FullName);
            Map(m => m.Nickname);
        }
        public static CsvHelper.Configuration.CsvConfiguration CreateConfig()
        {
            var config = new CsvHelper.Configuration.CsvConfiguration();
            config.ClassMapping();
            config.QuoteAllFields = true;
            return config;
        }
    }
}
JoshClose commented 11 years ago

Awesome. I will try and get this fixed today and get a new NuGet version out.

JordanMarr commented 11 years ago

I updated the test to use v1.14.0.39378 (added a call to WriteHeader() and modified config file to HasHeaderRecord = true). It still produces the bug. Here is the updated test:


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using MbUnit.Framework;
using System.IO;
namespace CsvHelperTests
{
    [TestFixture]
    public class CsvHelperTests
    {
        [Test]
        public void ShouldPreserveInnerDoubleQuotes()
        {
            FileInfo csvFile = new FileInfo("csvfile.csv");
            if (csvFile.Exists)
            {
                csvFile.Delete();
            }
            var config = PersonCsvMap.CreateConfig();

            Person p = new Person
            {
                FullName = "John Doe",
                Nickname = "John \"The Ladies Man\" Doe"
            };
            // Write CSV
            using (var streamWriter = File.AppendText(csvFile.FullName))
            using (var csvWriter = new CsvHelper.CsvWriter(streamWriter, config))
            {
                csvWriter.WriteHeader();
                csvWriter.WriteRecord(p);
            }
            // Read CSV
            var people = new List();
            using (var streamReader = new System.IO.StreamReader(csvFile.FullName))
            using (var csvReader = new CsvHelper.CsvReader(streamReader, config))
            {
                foreach (Person person in csvReader.GetRecords())
                {
                    people.Add(person);
                }
            }
            Assert.AreEqual(p.Nickname, people[0].Nickname);
        }
    }
    public class Person
    {
        public string FullName { get; set; }
        public string Nickname { get; set; }
    }
    public class PersonCsvMap : CsvHelper.Configuration.CsvClassMap
    {
        public PersonCsvMap()
        {
            Map(m => m.FullName);
            Map(m => m.Nickname);
        }
        public static CsvHelper.Configuration.CsvConfiguration CreateConfig()
        {
            var config = new CsvHelper.Configuration.CsvConfiguration();
            config.HasHeaderRecord = true;
            config.ClassMapping();
            config.QuoteAllFields = true;
            return config;
        }
    }
}
jeffld commented 11 years ago

This issue also interests me.

I've noticed that different CSV parsers handle quotes differently. Some parsers use quotes as text delimiters, to allow for commas in the data. Such as the parser used by Microsoft Excel for example.

I have not tested this program yet,i've just recently began watching.

I'm curious to see how the program handles commas in the data. How does the program determine the difference between a data comma, and a delimiting comma.

On Sat, Mar 2, 2013 at 12:31 PM, Jordan Marr notifications@github.comwrote:

I updated the test to use v1.14.0.39378 (added a call to WriteHeader() and modified config file to HasHeaderRecord = true). It still produces the bug. Here is the updated test:

using System; using System.Collections.Generic; using System.Linq; using System.Text; using MbUnit.Framework; using System.IO;

namespace CsvHelperTests { [TestFixture] public class CsvHelperTests { [Test] public void ShouldPreserveInnerDoubleQuotes() { FileInfo csvFile = new FileInfo("csvfile.csv"); if (csvFile.Exists) { csvFile.Delete(); }

        var config = PersonCsvMap.CreateConfig();

        Person p = new Person
        {
            FullName = "John Doe",
            Nickname = "John \"The Ladies Man\" Doe"
        };

        // Write CSV
        using (var streamWriter = File.AppendText(csvFile.FullName))
        using (var csvWriter = new CsvHelper.CsvWriter(

streamWriter, config)) { csvWriter.WriteHeader(); csvWriter.WriteRecord(p); }

        // Read CSV
        var people = new List();
        using (var streamReader = new System.IO.StreamReader(csvFile.FullName))
        using (var csvReader = new CsvHelper.CsvReader(streamReader, config))
        {
            foreach (Person person in csvReader.GetRecords())
            {
                people.Add(person);
            }
        }

        Assert.AreEqual(p.Nickname, people[0].Nickname);
    }
}

public class Person
{
    public string FullName { get; set; }
    public string Nickname { get; set; }
}

public class PersonCsvMap : CsvHelper.Configuration.CsvClassMap
{
    public PersonCsvMap()
    {
        Map(m => m.FullName);
        Map(m => m.Nickname);
    }

    public static CsvHelper.Configuration.CsvConfiguration CreateConfig()
    {
        var config = new CsvHelper.Configuration.CsvConfiguration();
        config.HasHeaderRecord = true;
        config.ClassMapping();
        config.QuoteAllFields = true;
        return config;
    }
}

}

Reply to this email directly or view it on GitHubhttps://github.com/JoshClose/CsvHelper/issues/114#issuecomment-14332794 .

JoshClose commented 11 years ago

It follows RFC 4180 http://tools.ietf.org/html/rfc4180. Excel also does some extra things to parse malformed CSV files. I went through and made CsvHelper parse malformed CSVs like Excel does, so they should work the same.

I have some unit tests around this specifically. https://github.com/JoshClose/CsvHelper/blob/master/src/CsvHelper.Tests/CsvParserExcelCompatibleTests.cs

GraemeF commented 11 years ago

To be clear, this issue is now about the writer not the reader? Could do with having the title changed :smile:

JordanMarr commented 11 years ago

Agreed! The title has been updated from reader to writer.

JoshClose commented 11 years ago

Fix committed to source. https://github.com/JoshClose/CsvHelper/commit/66fbc9c329cb93ab3e7942c9da82900dfb7ca1a7

JoshClose commented 11 years ago

Version 1.15.0 is on NuGet.

GraemeF commented 11 years ago

Excellent - thanks Josh, we just ran into this after you pushed the fix! Will update now. Perfect timing :wink:

JoshClose commented 11 years ago

Good! Sorry it took so long. I (my wife) just had a baby, then I got the flu right after, so I've been out of commission for while. :)

JordanMarr commented 11 years ago

Congrats on the baby! (and the fix looks good on my end)

JoshClose commented 11 years ago

Thanks! And I'm glad this solved your issue.