dotnet / Open-XML-SDK

Open XML SDK by Microsoft
https://www.nuget.org/packages/DocumentFormat.OpenXml/
MIT License

ChangeDocumentType does not fully remove vbaProject reference #618

Open mc2002tii opened 5 years ago

mc2002tii commented 5 years ago

Description

I'm using the sample code at https://docs.microsoft.com/en-us/office/open-xml/how-to-convert-a-word-processing-document-from-the-docm-to-the-docx-file-format to remove macros from a .docm file and convert it to .docx, because our filtering software blocks the transfer of files that contain macros.

Using that sample code I delete the VbaProjectPart, change the document type, and change the file extension. However, our filtering software identifies the resulting file as corrupt (Word 2016 opens the file just fine though, so it is probably within spec).

When I examine the contents of the .docx file, I notice that the [Content_Types].xml file at the root still contains the following line: <Default ContentType="application/vnd.ms-office.vbaProject" Extension="bin"/>

The VbaProjectPart PartName reference is gone and no other content in the .docx file structure contains any macro components. I think that one line in [Content_Types].xml is enough to trip up our scanner.
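To confirm what is left in the package without unzipping it by hand, the content-types part can be read directly. The following is a minimal sketch, not part of the Open XML SDK: `ContentTypesInspector` and its method names are hypothetical, and it only assumes the file is an ordinary ZIP-based package with a `[Content_Types].xml` entry at the root.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not an SDK type): lists content-type entries that mention vbaProject.
public static class ContentTypesInspector
{
    // Returns every <Default>/<Override> element whose ContentType contains "vbaProject".
    public static List<string> FindVbaEntries(XDocument contentTypes)
    {
        return contentTypes.Root
            .Elements()
            .Where(e => ((string)e.Attribute("ContentType") ?? string.Empty)
                .IndexOf("vbaProject", StringComparison.OrdinalIgnoreCase) >= 0)
            .Select(e => e.ToString(SaveOptions.DisableFormatting))
            .ToList();
    }

    // Opens the package read-only and inspects its [Content_Types].xml entry.
    public static List<string> FindVbaEntriesInPackage(string packagePath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(packagePath))
        {
            ZipArchiveEntry entry = archive.GetEntry("[Content_Types].xml");
            if (entry == null)
            {
                return new List<string>();
            }

            using (Stream stream = entry.Open())
            {
                return FindVbaEntries(XDocument.Load(stream));
            }
        }
    }
}
```

An empty list from `FindVbaEntriesInPackage(destinationPath)` would mean the content-types part no longer mentions vbaProject. A non-empty list shows exactly which elements remain.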

Is there some other way to get rid of this line that I'm missing, is this a bug, or is this structure just something that our scanning software should accept?

Information

Repro

        bool fileChanged = false;

        using (WordprocessingDocument document = WordprocessingDocument.Open(sourcePath, true))
        {
            // Access the main document part.
            var docPart = document.MainDocumentPart;

            // Look for the vbaProject part. If it is there, delete it.
            var vbaPart = docPart.VbaProjectPart;
            if (vbaPart != null)
            {
                // Delete the vbaProject part and then save the document.
                docPart.DeletePart(vbaPart);
                docPart.Document.Save();

                // Track that the document has been changed.
                fileChanged = true;
            }

            // Change the document type to not macro-enabled
            document.ChangeDocumentType(WordprocessingDocumentType.Document);
        }

        if (fileChanged)
        {
            // If it already exists, it will be deleted!
            if (File.Exists(destinationPath))
            {
                File.Delete(destinationPath);
            }

            // Rename the file and save changes
            Directory.CreateDirectory(destinationDirectory);
            File.Move(sourcePath, destinationPath);
        }

Observed

The [Content_Types].xml in file.docx still contains a macro reference.

Expected

file.docx should not contain any references to macros.
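A possible workaround is to post-process the converted file and strip the leftover Default entry from [Content_Types].xml. This is a sketch under stated assumptions, not SDK behavior: `ContentTypesCleaner` and its methods are hypothetical names. It assumes no remaining part with a .bin extension relies on that Default entry for its content type, and it should run only after the WordprocessingDocument has been disposed.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not an SDK type): removes stale vbaProject Default entries.
public static class ContentTypesCleaner
{
    private static readonly XNamespace Ct =
        "http://schemas.openxmlformats.org/package/2006/content-types";

    private const string VbaContentType = "application/vnd.ms-office.vbaProject";

    // Removes <Default> elements typed as vbaProject. Returns true if any were removed.
    public static bool RemoveVbaDefaults(XDocument contentTypes)
    {
        var stale = contentTypes.Root
            .Elements(Ct + "Default")
            .Where(e => string.Equals(
                (string)e.Attribute("ContentType"),
                VbaContentType,
                StringComparison.OrdinalIgnoreCase))
            .ToList();

        foreach (XElement element in stale)
        {
            element.Remove();
        }

        return stale.Count > 0;
    }

    // Rewrites [Content_Types].xml in place. Assumption: nothing else has the file open.
    public static bool CleanPackage(string packagePath)
    {
        using (ZipArchive archive = ZipFile.Open(packagePath, ZipArchiveMode.Update))
        {
            ZipArchiveEntry entry = archive.GetEntry("[Content_Types].xml");
            if (entry == null)
            {
                return false;
            }

            using (Stream stream = entry.Open())
            {
                XDocument doc = XDocument.Load(stream);
                if (!RemoveVbaDefaults(doc))
                {
                    return false;
                }

                // Truncate the entry, then write the updated XML back.
                stream.SetLength(0);
                doc.Save(stream);
                return true;
            }
        }
    }
}
```

In the repro above, a call such as `ContentTypesCleaner.CleanPackage(destinationPath)` would go after `File.Move`. Whether a scanner should reject a package over an unused Default entry is a separate question that this sketch does not settle.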

waizui commented 4 years ago

Same issue here. Are there any solutions?

twsouthwick commented 4 years ago

@mc2002tii Can you include something I can repro?

mc2002tii commented 4 years ago

> @mc2002tii Can you include something I can repro?

I'll have to test this when I'm back in the office next week. I couldn't reproduce it today, but at home I have a completely different environment (O365 vs. Word 2016, Mac vs. Windows). I know I could still reproduce it with .NET Core 3.1 and DocumentFormat.OpenXml 2.10, but I don't think I tried again when 2.11 came out.

prudhvi2050 commented 2 years ago

I am facing this issue with .NET 6 and DocumentFormat.OpenXml 2.16.

AlfredHellstern commented 10 months ago

Related to issue #1551.

tomjebo commented 10 months ago

#1551