Closed: devoidfury closed this 3 years ago
Related issue: #876
Sequence for the w:lvl element:
<xsd:complexType name="CT_Lvl">
  <xsd:sequence>
    <xsd:element name="start" type="CT_DecimalNumber" minOccurs="0"/>
    <xsd:element name="numFmt" type="CT_NumFmt" minOccurs="0"/>
    <xsd:element name="lvlRestart" type="CT_DecimalNumber" minOccurs="0"/>
    <xsd:element name="pStyle" type="CT_String" minOccurs="0"/>
    <xsd:element name="isLgl" type="CT_OnOff" minOccurs="0"/>
    <xsd:element name="suff" type="CT_LevelSuffix" minOccurs="0"/>
    <xsd:element name="lvlText" type="CT_LevelText" minOccurs="0"/>
    <xsd:element name="lvlPicBulletId" type="CT_DecimalNumber" minOccurs="0"/>
    <xsd:element name="lvlJc" type="CT_Jc" minOccurs="0"/>
    <xsd:element name="pPr" type="CT_PPrGeneral" minOccurs="0"/>
    <xsd:element name="rPr" type="CT_RPr" minOccurs="0"/>
  </xsd:sequence>
  <xsd:attribute name="ilvl" type="ST_DecimalNumber" use="required"/>
  <xsd:attribute name="tplc" type="ST_LongHexNumber" use="optional"/>
  <xsd:attribute name="tentative" type="s:ST_OnOff" use="optional"/>
</xsd:complexType>
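To make the ordering rule concrete, here is a minimal sketch of how a w:lvl element could be checked against the sequence quoted above. It is not the validator discussed in this thread; it is written in Python only for brevity, and the helper name first_order_violation is made up for illustration. It only looks at child order and ignores attributes and child content.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Child order copied from the CT_Lvl xsd:sequence quoted above.
CT_LVL_SEQUENCE = [
    "start", "numFmt", "lvlRestart", "pStyle", "isLgl", "suff",
    "lvlText", "lvlPicBulletId", "lvlJc", "pPr", "rPr",
]


def first_order_violation(element, sequence):
    """Return the local name of the first child that breaks the sequence, or None.

    Hypothetical helper: every child may appear at most once (the schema uses
    the default maxOccurs of 1), so ranks must be strictly increasing.
    """
    rank = {name: i for i, name in enumerate(sequence)}
    last_rank = -1
    for child in element:
        name = child.tag.rsplit("}", 1)[-1]  # strip the "{namespace}" prefix
        if name not in rank or rank[name] <= last_rank:
            return name
        last_rank = rank[name]
    return None


# numFmt appears before start, which the sequence does not allow.
bad = ET.fromstring(
    f'<w:lvl xmlns:w="{W_NS}" w:ilvl="0">'
    '<w:numFmt w:val="decimal"/><w:start w:val="1"/>'
    "</w:lvl>"
)
good = ET.fromstring(
    f'<w:lvl xmlns:w="{W_NS}" w:ilvl="0">'
    '<w:start w:val="1"/><w:numFmt w:val="decimal"/>'
    "</w:lvl>"
)
print(first_order_violation(bad, CT_LVL_SEQUENCE))   # start
print(first_order_violation(good, CT_LVL_SEQUENCE))  # None
```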
I made quite a bit of progress on this, here: https://github.com/dolanmiu/docx/compare/master...devoidfury:bug/ooxml-conformance-fixes
The main errors I'm getting that I don't know how to handle:
- Invalid mirrorMargins attribute on w:pgMar. EDIT: this has been removed on my branch.
- Invalid element w:shdCs (couldn't find a reference for these anywhere -- should it just be deleted? Looks like w:shd does everything here). EDIT: this has been removed in my branch.
- w:document has an invalid attribute mc:Ignorable="w14 w15 wp14", couldn't find a reference or documentation for this property anywhere. This is written about here, and it's a commonly used attribute among various XML document types: http://www.wordarticles.com/Articles/Formats/OOXML/OOXML.php
Amazing
Happy to make this part of the CI process to validate the schema once this is done
Yes, if w:shdCs is not in the spec, or not anywhere, it can be removed
Here's the validator setup I'm using in the meantime -- I want to make a JS version, but this is the quick and dirty solution that's enabled me to validate the documents against the schema.
Would it be helpful to inline these xsd types as comments? So that we have a reference to the valid attributes/children/sequences right in line?
I think so yes, adding these xsd types is invaluable
@devoidfury I am adding it into GitHub Actions
Thank you for your research into this area
The checks are based on the same OOXML schemas on your docx-validator project:
EDIT: The actionable thing to do here is add a javascript validator against one of the wml.xsd schemas.
==========
Hey there! Great library, I've been using it a while and trying to help out a little where I can.
An issue I've come across is that it's really easy to generate a corrupted document, and tricky to pinpoint exactly where and why this happens. It's not the fault of this library, and to be honest there aren't any good options for validating these XML documents in javascript (I'm working on this -- soon I hope to have a validator with specific error messages in pure js!).
I've set up a hacky tool to validate these documents locally on linux with libxml/xmllint -- I'll share that setup once I write a little wrapper around it -- and I've noticed that it spits out a ton of errors. One of the most common errors is that, in several places in the spec, there's a specific sequence of nodes expected in order to conform. Nodes being out of order mostly works, but I suspect it's caused some bugs!
See, for example, the schema for the base abstract type used under w:pPr -- notice the <xsd:sequence> -- this means they must be in this specific order to conform.
Sadly, this is not documented anywhere on the officeopenxml.com site, and is only found in the ECMA-376 reference schemas (see for example, ECMA-376 fifth edition, part one, page 3839, containing a version of the above element type).
https://www.ecma-international.org/publications-and-standards/standards/ecma-376/
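One way to meet the ordering requirement is to sort an element's children into schema order just before serialising. The sketch below shows the idea in Python for brevity; it is not how this library does it. The PPR_ORDER list is an assumption: an abbreviated, hand-copied subset of the sequence for the w:pPr base type, which should be checked against wml.xsd before relying on it. The helper name sort_children is hypothetical.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# ASSUMPTION: abbreviated subset of the w:pPr child sequence, in schema order.
# The real sequence is much longer; consult wml.xsd for the full list.
PPR_ORDER = ["pStyle", "keepNext", "keepLines", "numPr", "spacing", "ind", "jc"]


def sort_children(element, order):
    """Reorder children in place to follow `order` (hypothetical helper).

    sorted() is stable, so children not listed in `order` keep their relative
    order and are moved to the end rather than dropped.
    """
    rank = {name: i for i, name in enumerate(order)}

    def key(child):
        local = child.tag.rsplit("}", 1)[-1]  # strip the "{namespace}" prefix
        return rank.get(local, len(order))

    element[:] = sorted(element, key=key)
    return element


ppr = ET.fromstring(
    f'<w:pPr xmlns:w="{W_NS}">'
    '<w:jc w:val="center"/><w:spacing w:after="120"/><w:pStyle w:val="Heading1"/>'
    "</w:pPr>"
)
sort_children(ppr, PPR_ORDER)
print([child.tag.rsplit("}", 1)[-1] for child in ppr])  # ['pStyle', 'spacing', 'jc']
```

Sorting only fixes the order; it does not catch duplicated or unknown children, so it complements schema validation rather than replacing it.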