dotnet / Open-XML-SDK

Open XML SDK by Microsoft
https://www.nuget.org/packages/DocumentFormat.OpenXml/
MIT License

Why are Office 365 xlsx sheets encoded as UTF-8 without BOM while Open XML SDK output is UTF-8 with BOM? #1064

Open shps951023 opened 2 years ago

shps951023 commented 2 years ago

Repro

using System;
using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public class Program
{
    public static void Main(string[] args)
    {
        // Write a timestamped workbook into the current directory.
        var fileName = $"PracticePart1-{DateTime.Now:yyyyMMddHHmmss}.xlsx";
        var filepath = Path.Combine(Directory.GetCurrentDirectory(), fileName);
        Console.WriteLine($"FilePath: {filepath}");

        // Disposing the document flushes and closes the package.
        using (var spreadsheetDocument = SpreadsheetDocument.Create(filepath, SpreadsheetDocumentType.Workbook))
        {
            var workbookPart = spreadsheetDocument.AddWorkbookPart();
            workbookPart.Workbook = new Workbook();

            // One empty worksheet part.
            var worksheetPart = workbookPart.AddNewPart<WorksheetPart>();
            worksheetPart.Worksheet = new Worksheet(new SheetData());

            // Register the worksheet in the workbook's sheet list.
            var sheets = workbookPart.Workbook.AppendChild(new Sheets());
            var sheet = new Sheet()
            {
                Id = workbookPart.GetIdOfPart(worksheetPart),
                SheetId = 1,
                Name = "myFirstSheet"
            };
            sheets.Append(sheet);
            workbookPart.Workbook.Save();
        }
    }
}

Observed

Every part in a file saved by Office 365 is encoded as UTF-8 without a BOM, but in the file produced by the Open XML SDK some parts are UTF-8 with a BOM and some are not. [image]
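
One way to reproduce this observation without a hex editor is to open the .xlsx as a zip archive and check whether each part starts with the UTF-8 byte order mark (EF BB BF). The following is a minimal sketch, not something posted in this thread: the class name `BomInspector` and its methods are hypothetical, and it assumes a runtime where `System.IO.Compression.ZipFile` is available (on .NET Framework that means referencing System.IO.Compression.FileSystem).

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration; not part of the Open XML SDK.
public static class BomInspector
{
    // Returns true when the stream starts with the UTF-8 BOM (EF BB BF).
    public static bool StartsWithUtf8Bom(Stream stream)
    {
        var buffer = new byte[3];
        int read = 0;
        // Stream.Read may return fewer bytes than requested, so loop
        // until three bytes are read or the stream ends.
        while (read < buffer.Length)
        {
            int n = stream.Read(buffer, read, buffer.Length - read);
            if (n == 0) break;
            read += n;
        }
        return read == 3 && buffer[0] == 0xEF && buffer[1] == 0xBB && buffer[2] == 0xBF;
    }

    // Prints one line per XML or relationship part in the package.
    public static void Report(string xlsxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(xlsxPath))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                bool isXmlPart =
                    entry.FullName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase) ||
                    entry.FullName.EndsWith(".rels", StringComparison.OrdinalIgnoreCase);
                if (!isXmlPart) continue;

                using (Stream partStream = entry.Open())
                {
                    string label = StartsWithUtf8Bom(partStream) ? "UTF-8 with BOM" : "no BOM";
                    Console.WriteLine($"{entry.FullName}: {label}");
                }
            }
        }
    }
}
```

Calling `BomInspector.Report(filepath)` on the file from the repro and on a file saved by Office 365 should make the difference described above visible part by part.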

Expected

Should the SDK follow the encoding convention that Office 365 uses? (The image below shows an Office 365 xlsx file.)

office365_sample.xlsx image

twsouthwick commented 2 years ago

We changed it to that as it was causing some renderers to have problems (see https://github.com/OfficeDev/Open-XML-SDK/issues/309).

I'm not certain if a specific encoding is required by the spec, but we could potentially enable it to be configurable rather than relying on a specific default.

shps951023 commented 2 years ago

> but we could potentially enable it to be configurable rather than relying on a specific default.

@twsouthwick Thanks! That would be a helpful feature.

twsouthwick commented 2 years ago

Happy to accept PRs. Probably best to add it to the OpenSettings object
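
For anyone considering such a PR, the underlying difference in .NET comes down to which `UTF8Encoding` instance the XML writer is given. The sketch below is not the SDK's implementation and does not use any SDK setting; it only shows the standard `XmlWriterSettings.Encoding` behaviour that a configurable option would presumably have to choose between. The class name `BomDemo` and the `emitBom` parameter are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical demo class; it does not touch the Open XML SDK.
public static class BomDemo
{
    // Serializes a tiny XML document and returns the raw bytes.
    // emitBom = true  -> output starts with EF BB BF
    // emitBom = false -> output starts directly with "<?xml"
    public static byte[] WriteXml(bool emitBom)
    {
        var settings = new XmlWriterSettings
        {
            // The constructor argument controls whether the encoding
            // exposes a preamble (the BOM) for writers to emit.
            Encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: emitBom)
        };

        using (var stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                writer.WriteStartElement("workbook");
                writer.WriteEndElement();
            }
            return stream.ToArray();
        }
    }
}
```

Note that `Encoding.UTF8` is the BOM-emitting variant, so code that wants BOM-free output has to construct `new UTF8Encoding(false)` explicitly.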

tomjebo commented 2 years ago

@shps951023 is there a good reason why the SDK should emit non-BOM UTF-8? From our Office apps team, it appears that we don't have any requirement either way, i.e. Office apps will read UTF-8 BOM parts just fine. Does your code depend on non-BOM UTF-8?
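
As a point of comparison for the question above, the standard .NET XML readers also accept a part with or without the BOM. This is a minimal sketch under that assumption, using only `System.Xml.Linq`; the class name `BomTolerantRead` is hypothetical, and it says nothing about third-party readers that may be stricter.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// Hypothetical demo class for illustration only.
public static class BomTolerantRead
{
    // Parses the bytes of one XML part and returns the root element name.
    public static string RootName(byte[] partBytes)
    {
        using (var stream = new MemoryStream(partBytes))
        {
            return XDocument.Load(stream).Root.Name.LocalName;
        }
    }

    public static void Main()
    {
        const string xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?><worksheet />";

        // Same document twice: once as plain UTF-8, once prefixed with the BOM.
        byte[] withoutBom = new UTF8Encoding(false).GetBytes(xml);
        byte[] withBom = Encoding.UTF8.GetPreamble().Concat(withoutBom).ToArray();

        // Both calls return "worksheet".
        Console.WriteLine(RootName(withoutBom));
        Console.WriteLine(RootName(withBom));
    }
}
```

Code that reads the raw bytes itself, rather than going through an XML parser, is where the BOM is more likely to matter.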

shps951023 commented 2 years ago

@tomjebo Sorry it took me so long to see the notification and reply! Some Chinese users need a custom encoding to read non-UTF-8 content.

@twsouthwick Thanks, I'll try it

Happy New Year! I wish everyone a great new year.