JoshClose / CsvHelper

Library to help reading and writing CSV files
http://joshclose.github.io/CsvHelper/

Encoding with BOM(Byte Order Mark) #283

Closed Austin-Liang closed 10 years ago

Austin-Liang commented 10 years ago

Hi All,

I tried something like this:

var utf8_Bom = new System.Text.UTF8Encoding(true); // true to use bom.

and set it to:

using (var memoryStream = new MemoryStream())
{
    using (var streamWriter = new StreamWriter(memoryStream, utf8_Bom))
    using (var csvWriter = new CsvWriter(streamWriter))
    {
        // write csv bytes
        csvWriter.Configuration.Encoding = utf8_Bom;
        csvWriter.WriteRecords(records.ToList());
    }

    return memoryStream.ToArray();
}

But when I download the CSV file and open it in Excel, the characters are unreadable. (My data contains Chinese text.)

When I open the CSV file in EmEditor to check its encoding, it reports UTF-8 without BOM.

If I save it again with the UTF-8 (BOM) option and open it in Excel, everything displays correctly.

It looks like [ new System.Text.UTF8Encoding(true); ] doesn't work properly with CsvHelper.

Is there a workaround for this problem?

Thanks in advance.

Austin Liang.

JoshClose commented 10 years ago

Are you able to do it if you don't use CsvHelper? Try getting it working without CsvHelper in the middle, then put it in after.

lnu commented 10 years ago

Hello,

I had the same problem, and it works with this:

using (var streamWriter = new StreamWriter(stream, System.Text.Encoding.UTF8))
{
}

The BOM is generated as expected.

robdmoore commented 10 years ago

I suspect this is the problem: http://stackoverflow.com/questions/4414088/how-to-getbytes-in-c-sharp-with-utf8-encoding-with-bom/4414118#4414118
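A minimal sketch of the behavior the linked answer appears to describe: `Encoding.GetBytes` encodes only the characters and never prepends the BOM, even when the encoding was created with `new UTF8Encoding(true)`. The BOM bytes are available separately from `GetPreamble()`. The variable names below are illustrative, and the snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;
using System.Text;

// UTF-8 encoding configured to advertise a BOM (preamble).
var utf8WithBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);

// GetBytes encodes only the characters; it does not add the preamble.
byte[] viaGetBytes = utf8WithBom.GetBytes("a");

// Prepending GetPreamble() by hand yields EF BB BF followed by the text.
byte[] withPreamble = utf8WithBom.GetPreamble().Concat(viaGetBytes).ToArray();

Console.WriteLine(BitConverter.ToString(viaGetBytes));   // 61
Console.WriteLine(BitConverter.ToString(withPreamble));  // EF-BB-BF-61
```

So if the CSV text is turned into bytes with `GetBytes` at any point (rather than taken straight from the stream that a `StreamWriter` wrote to), the BOM has to be added explicitly.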

robdmoore commented 10 years ago

It would be useful if CsvHelper had an option to insert the BOM for you.

Austin-Liang commented 10 years ago

Thanks for this information, it really helps me a lot!

JoshClose commented 10 years ago

I see this a few times in the SO question: "UTF-8 does not require a BOM."

CsvHelper doesn't know what encoding you're using. It takes in a TextWriter and just writes to it. You manage all of the details; the stream, the encoding, where it's written to, etc.

This doesn't seem like it's something that CsvHelper should handle.

If you have a good argument to counter this, please let me know.
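To illustrate the point that the encoding belongs to the `TextWriter` rather than to CsvHelper, here is a small sketch that leaves CsvHelper out entirely and writes one CSV line through a `StreamWriter`. `WriteCsvLine` is a hypothetical helper made up for this example; in real code a `CsvWriter` would be constructed over the same writer. It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: writes one CSV line through a StreamWriter
// and returns the raw bytes that ended up in the stream.
static byte[] WriteCsvLine(Encoding encoding)
{
    using var stream = new MemoryStream();
    using (var writer = new StreamWriter(stream, encoding))
    {
        // A CsvWriter would be constructed over `writer` here.
        writer.Write("id,name\r\n");
    } // Disposing the writer flushes it and closes the stream.

    // ToArray still works on a closed MemoryStream.
    return stream.ToArray();
}

byte[] withBom = WriteCsvLine(new UTF8Encoding(true));
byte[] withoutBom = WriteCsvLine(new UTF8Encoding(false));

Console.WriteLine(BitConverter.ToString(withBom, 0, 3)); // EF-BB-BF
Console.WriteLine(withoutBom.Length);                    // 9
```

The only difference between the two calls is the encoding handed to the `StreamWriter`, which is what decides whether the three BOM bytes appear at the start of the output.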

robdmoore commented 10 years ago

FYI - outputting in Windows-1252 allows special characters, this worked for me:

        private byte[] GenerateByteArray<TCsvMap, TSource>(IEnumerable<TSource> source) where TCsvMap : CsvClassMap
        {
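            // Note: on .NET Core / .NET 5+, call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first.
            var encoding = Encoding.GetEncoding("Windows-1252"); // Only encoding that allows special characters in Mac and Windows Excel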
            using (var stream = new MemoryStream())
            using (var streamWriter = new StreamWriter(stream, encoding))
            using (var csvWriter = new CsvWriter(streamWriter, new CsvConfiguration{Encoding = encoding}))
            {
                csvWriter.Configuration.RegisterClassMap<TCsvMap>();
                csvWriter.WriteRecords(source);
                streamWriter.Flush();
                return stream.ToArray();
            }
        }

chucklu commented 3 years ago

In the end, I went with this solution:

string fileName;
byte[] data;
Encoding encoding;

fileName = Path.GetTempFileName();
data = new byte[0]; // assume you have a populated byte array!
encoding = Encoding.UTF8;

using (FileStream stream = new FileStream(fileName, FileMode.Create))
{
  using (BinaryWriter writer = new BinaryWriter(stream, encoding))
  {
    writer.Write(encoding.GetPreamble());
    writer.Write(data);
  }
}

chucklu commented 3 years ago

StreamWriter also works fine with a newly created file: https://github.com/dotnet/runtime/blob/6ef4b2e7aba70c514d85c2b43eac1616216bea55/src/libraries/System.Private.CoreLib/src/System/IO/StreamWriter.cs#L273

daniol commented 10 months ago

Here is my code to write a CSV file with a BOM:

var utf8_Bom = new UTF8Encoding(true); // UTF-8 encoding with BOM
using var stream = new FileStream("file.csv", FileMode.Create);
using var writer = new StreamWriter(stream, utf8_Bom);
var config = new CsvConfiguration(CultureInfo.CurrentCulture) { Delimiter = ";", Encoding = Encoding.UTF8 };
using var csv = new CsvWriter(writer, config);
csv.WriteRecords(records);