bokuweb / docx-rs

:memo: A .docx file writer with Rust/WebAssembly.
https://bokuweb.github.io/docx-rs/
MIT License

*Empty* document has 4 OOXML Validator errors #724

Open qtfkwk opened 3 months ago

qtfkwk commented 3 months ago

Describe the bug

I've been using this library to generate docx files that work fine in LibreOffice, but was surprised that Microsoft Word reported they needed to be "recovered" which subsequently failed and said they were "corrupt."

These files used many features (paragraph and character styles, images, tables, ...) and produced hundreds of issues in the OOXML Validator VSCode extension (see also: mikeebowen/ooxml-validator-vscode). To check whether I was "doing something wrong", I ran the validator against an empty file: a docx created via this library with nothing added, then built and saved.

I found that this file had the following 4 errors (so it seemed appropriate to start with those).

Reproduced step

Steps to reproduce the behavior:

1.

   cargo new docx-test
   cd docx-test
   cargo add anyhow docx-rs
   cat <<EOF >src/main.rs
   use anyhow::Result;
   use docx_rs::*;

   fn main() -> Result<()> {
       let docx = Docx::new();
       docx.build().pack(std::fs::File::create("test.docx")?)?;
       Ok(())
   }
   EOF
   cargo run
2. Open VS Code.
3. Install the OOXML Validator VSCode extension.
4. Open the docx-test folder.
5. Right-click the generated test.docx, select Validate OOXML, wait for validation to complete, then click the View Errors button.

For reference, I've attached the test.docx.

Expected behavior

Any generated docx file (whether empty or using any/all features) should open without issue in Microsoft Word and pass a validator.

Actual behavior

The generated test.docx opens in Word without issue, but the OOXML Validator VSCode extension produced 4 errors:

[
  {
    "Id": "Sch_UnexpectedElementContentExpectingComplex",
    "Description": "The element has unexpected child element 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:pPr'.",
    "Namespaces": {},
    "XPath": "/w:styles[1]/w:style[1]",
    "PartUri": "/word/styles.xml",
    "ErrorType": "Schema"
  },
  {
    "Id": "Sch_InvalidElementContentExpectingComplex",
    "Description": "The element has invalid child element 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:rPr'. List of possible elements expected: <http://schemas.openxmlformats.org/wordprocessingml/2006/main:keepNext>.",
    "Namespaces": {},
    "XPath": "/w:styles[1]/w:style[1]/w:pPr[1]",
    "PartUri": "/word/styles.xml",
    "ErrorType": "Schema"
  },
  {
    "Id": "Sch_InvalidElementContentExpectingComplex",
    "Description": "The element has invalid child element 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:rPr'. List of possible elements expected: <http://schemas.openxmlformats.org/wordprocessingml/2006/main:keepNext>.",
    "Namespaces": {},
    "XPath": "/w:styles[1]/w:docDefaults[1]/w:pPrDefault[1]/w:pPr[1]",
    "PartUri": "/word/styles.xml",
    "ErrorType": "Schema"
  },
  {
    "Id": "Sch_UnexpectedElementContentExpectingComplex",
    "Description": "The element has unexpected child element 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:zoom'.",
    "Namespaces": {},
    "XPath": "/w:settings[1]",
    "PartUri": "/word/settings.xml",
    "ErrorType": "Schema"
  }
]

qtfkwk commented 3 months ago

Created a repository to track my testing: https://github.com/qtfkwk/docx-test

Please feel free to use it.

bokuweb commented 3 months ago

@qtfkwk Thanks!!!!!

git-noise commented 3 months ago

For what it's worth, I can open docx-rs generated documents in Word without them being seen as corrupted. However, I have seen instances where I had to play with various parts of the library to make sure things were properly declared everywhere. I'm thinking of styles, numbering, or attachments, which may need to be adequately "linked/declared". On this front LibreOffice seems a tad more permissive, even if it may end up not rendering things correctly, possibly because of these missing elements.

bokuweb commented 3 months ago

I'll do some investigation.

ImplFerris commented 3 months ago

I can contribute a fix for this. The child elements need to follow the order required by OOXML.

Update: I have fixed the issue and raised PR https://github.com/bokuweb/docx-rs/pull/735
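
To illustrate the element-order point, here is a minimal sketch of an order check, not the code from the PR. The helper `first_out_of_order` and the list `STYLE_ORDER` are hypothetical names; the list is an abbreviated subset of the child sequence for `w:style` as I understand the schema, where `pPr` must come before `rPr`. The sample input is made up for illustration and is only one plausible way to trigger the first reported error.

```rust
// Abbreviated, assumed subset of the child order the schema requires for w:style.
const STYLE_ORDER: &[&str] = &["name", "basedOn", "next", "pPr", "rPr"];

/// Returns the first child that breaks the sequence: either it is not in
/// `schema_order`, or it appears after an element that must follow it.
/// Simplification: repeated elements are accepted, unlike a real validator.
fn first_out_of_order<'a>(children: &[&'a str], schema_order: &[&str]) -> Option<&'a str> {
    let mut min_pos = 0usize;
    for &child in children {
        match schema_order.iter().position(|&name| name == child) {
            // Still moving forward through the sequence: remember how far we are.
            Some(pos) if pos >= min_pos => min_pos = pos,
            // Unknown element, or one that should have appeared earlier.
            _ => return Some(child),
        }
    }
    None
}

fn main() {
    // Hypothetical serialization order that emits rPr before pPr.
    let bad = ["name", "rPr", "pPr"];
    let good = ["name", "basedOn", "pPr", "rPr"];
    println!("{:?}", first_out_of_order(&bad, STYLE_ORDER)); // Some("pPr")
    println!("{:?}", first_out_of_order(&good, STYLE_ORDER)); // None
}
```

With the assumed order, the check flags `pPr` in the first case, which mirrors the "unexpected child element ... pPr" message reported for `/w:styles[1]/w:style[1]`.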