EPPlusSoftware / EPPlus

EPPlus - Excel spreadsheets for .NET
https://epplussoftware.com

File gets corrupted after reading and saving #1691

Status: Open

Schartoym commented 1 day ago

EPPlus usage

Personal use

Environment

macOS

EPPlus version

7.5.0

Spreadsheet application

Excel

Description

Hello! I've encountered a problem when reading a file with EPPlus, saving it, and then trying to read the saved file again.

Steps to reproduce

  1. Read the example file with EPPlus
  2. Save the file
  3. Try to read the saved file

Sample code


using OfficeOpenXml;

ExcelPackage.LicenseContext = LicenseContext.NonCommercial;

Console.WriteLine("Reading example.xlsx");
using var package = new ExcelPackage();
await package.LoadAsync(new FileInfo("example.xlsx"));
Console.WriteLine("Worksheets count: " + package.Workbook.Worksheets.Count);
await package.SaveAsAsync("example_saved.xlsx");
Console.WriteLine("Saved as example_saved.xlsx");

Console.WriteLine("=======");
Console.WriteLine("Try read example_saved.xlsx");
using var anotherPackage = new ExcelPackage();
await anotherPackage.LoadAsync(new FileInfo("example_saved.xlsx"));
Console.WriteLine("Worksheets count: " + anotherPackage.Workbook.Worksheets.Count);

Reading the saved file throws this exception:

Unhandled exception. System.Xml.XmlException: Name cannot begin with the '"' character, hexadecimal value 0x22. Line 1, position 236.
   at System.Xml.XmlTextReaderImpl.Throw(Exception e)
   at System.Xml.XmlTextReaderImpl.Throw(String res, String[] args)
   at System.Xml.XmlTextReaderImpl.ParseAttributes()
   at System.Xml.XmlTextReaderImpl.ParseElement()
   at System.Xml.XmlTextReaderImpl.ParseElementContent()
   at OfficeOpenXml.Style.ExcelRichText.ReadrPr(XmlReader xr)
   at OfficeOpenXml.Style.ExcelRichText..ctor(XmlReader xr, ExcelRichTextCollection collection)
   at OfficeOpenXml.Style.ExcelRichTextCollection..ctor(XmlReader xr, ExcelWorkbook wb)
   at OfficeOpenXml.ExcelWorkbook.GetSharedStrings()
   at OfficeOpenXml.ExcelWorkbook..ctor(ExcelPackage package, XmlNamespaceManager namespaceManager)
   at OfficeOpenXml.ExcelPackage.get_Workbook()
   at Program.<Main>$(String[] args) in /Users/artem/dev/EppBug/EppBug/Program.cs:line 17
   at Program.<Main>(String[] args)

Example files:

Sample project:

I suspect this bug is caused by insufficient escaping of attribute values in rich text. Take a look at xl/sharedStrings.xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?><sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="4" uniqueCount="4"><si><t>Column 1</t></si><si><t>Column 2</t></si><si><r><rPr><rFont val="ALS Hauss""/><color rgb="FF000000"/><sz val="12"/></rPr><t>760805883660</t></r></si><si><r><rPr><rFont val="ALS Hauss""/><color rgb="FF000000"/><sz val="12"/></rPr><t>771005865941</t></r></si></sst>

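This is malformed XML. The root cause is the font name ALS Hauss" (with a trailing double quote): the quote is written unescaped inside the val attribute, so it closes the attribute early and the parser then fails on the second quote. Excel saves the same name as ALS Hauss&quot;.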
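To illustrate the escaping problem, here is a minimal sketch (not EPPlus's code, and not the actual fix) that writes the same rFont element with System.Xml's XmlWriter, which escapes the double quote as &quot; automatically, and then parses the result back. The helper name WriteRFont is hypothetical.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical helper: builds <rFont val="..."/> through XmlWriter so the
// attribute value is escaped by the writer instead of being concatenated raw.
static string WriteRFont(string fontName)
{
    var sb = new StringBuilder();
    var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
    using (var xw = XmlWriter.Create(sb, settings))
    {
        xw.WriteStartElement("rFont");
        xw.WriteAttributeString("val", fontName); // '"' becomes &quot;
        xw.WriteEndElement();
    }
    return sb.ToString();
}

string xml = WriteRFont("ALS Hauss\"");
Console.WriteLine(xml); // the quote in the font name appears as &quot;

// Round trip: parsing the escaped attribute gives back the original name.
var doc = new XmlDocument();
doc.LoadXml(xml);
Console.WriteLine(doc.DocumentElement!.GetAttribute("val")); // prints: ALS Hauss"
```

Generating attribute values through an XML-aware writer, rather than inserting raw strings into markup, avoids this class of bug. The thread does not say how the EPPlus fix is implemented.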

JanKallman commented 12 hours ago

Yes, as you say, adding double quotes to font names in rich text fails because the font name is not correctly encoded. I will provide a fix.