dotnet / Open-XML-SDK

Open XML SDK by Microsoft
https://www.nuget.org/packages/DocumentFormat.OpenXml/
MIT License
4.03k stars 547 forks source link

Strict Excel Files Have Invalid int32 Data Types #1142

Open Asbjoedt opened 2 years ago

Asbjoedt commented 2 years ago

Description

Excel files saved as "Strict Open XML" contain a number of int32 validation errors related to hidden percentage values in the schema.

For context of this error see: https://github.com/mikeebowen/OOXML-Validator/issues/6

For possible explanation: "In ISO 29500-1 20.1.8.36 gs (Gradient stops) the pos attribute is defined: Specifies where this gradient stop should appear in the color band. This position is specified in the range [0%, 100%], which corresponds to the beginning and the end of the color band respectively.

So, the SDK's validation (the CLI uses the SDK to get the validation errors) is wrong, but that means the issue is with the SDK and not the validator CLI."

Information

Repro

Use OOXML Validator 1.1 to reproduce the validation errors on any generic Excel spreadsheet you save as "Strict Open XML".

Link to OOXML Validator 1.1: https://github.com/mikeebowen/OOXML-Validator

See this issue for an Excel test file, that you can use to reproduce the errors with: https://github.com/mikeebowen/OOXML-Validator/issues/6

Observed

The following errors in XML are created by OOXML Validator. The creator of OOXML Validator ahve informed me the validation errors stem from a bug in the SDK and have advised me to post the issue here.

See the XML log below.

The attribute 'pos' has invalid value '100%'. The string '100%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '120%'. The string '120%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '63%'. The string '63%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'pos' has invalid value '50%'. The string '50%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '103%'. The string '103%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '90%'. The string '90%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '130%'. The string '130%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '98%'. The string '98%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'pos' has invalid value '0%'. The string '0%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '102%'. The string '102%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '98%'. The string '98%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '150%'. The string '150%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '93%'. The string '93%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '170%'. The string '170%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '95%'. The string '95%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '63%'. The string '63%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'lim' has invalid value '800%'. The string '800%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'lim' has invalid value '800%'. The string '800%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'lim' has invalid value '800%'. The string '800%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'pos' has invalid value '100%'. The string '100%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '78%'. The string '78%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '120%'. The string '120%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '99%'. The string '99%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'pos' has invalid value '50%'. The string '50%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '100%'. The string '100%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '100%'. The string '100%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '110%'. The string '110%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'pos' has invalid value '0%'. The string '0%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '94%'. The string '94%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '102%'. The string '102%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '103%'. The string '103%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'pos' has invalid value '100%'. The string '100%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '81%'. The string '81%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '109%'. The string '109%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '105%'. The string '105%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'pos' has invalid value '50%'. The string '50%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '73%'. The string '73%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '103%'. The string '103%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '105%'. The string '105%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'pos' has invalid value '0%'. The string '0%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '67%'. The string '67%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '105%'. The string '105%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema The attribute 'val' has invalid value '110%'. The string '110%' is not a valid 'Int32' value. DocumentFormat.OpenXml.XmlPath Sch_AttributeValueDataTypeDetailed Schema

Expected

There should be no int32 validation errors when Excel saves spreadsheets in "Strict Open XML" file format.

mikeebowen commented 2 years ago

Thanks @Asbjoedt ,

We'll take look into this and see what it will take to fix.

Asbjoedt commented 2 years ago

Hello

Any update if you will prioritize to fix this issue?

mikeebowen commented 2 years ago

Hi @Asbjoedt ,

After looking into this further, we found that this issue is a bug, but not related to strict mode. If a file is saved using "100pt" for the "pos" attribute, it is valid for both strict and transitional modes, but the SDK validator only accepts integers, i.e. without 'pt'. Unfortunately, it doesn't look like it will be an easy fix and I don't know have a time frame, but I added the bug label and will bring it to my colleagues' attention.

Asbjoedt commented 2 years ago

ok thanks. A motivator: For an archive which wants to archive with Strict conformance to future-proof their collections, it is possible to write validation code to ignore these int32 errors, but it would be nice and great to get rid of the errors all together.

AdMSilvan commented 1 year ago

Thank you @Asbjoedt and @mikeebowen for your work on this issue. We are also an archive looking into the possibility of archiving strict XLSX and were very happy to find your validator. Has there been progress on this issue and the validation of strict XLSX more broadly?

mikeebowen commented 1 year ago

Hi @AdMSilvan, unfortunately the OOXML SDK does not distinguish between strict and transitional. And adding this functionality would mean rewriting significant portions of the codebase, so it is not something we can undertake currently. I'll add the feature-request flag in case anyone is interested in taking this on.