Docx2Src / Serialize.OpenXml.CodeGen

.NET assembly class responsible for converting OpenXml-based documents into corresponding dotnet code
MIT License

Generated Code incompatible with DocumentFormat.OpenXml 3.0.1 #6

Open echoix opened 9 months ago

echoix commented 9 months ago

As you might know, DocumentFormat.OpenXml version 3 introduced some breaking changes that keep the code generated here from compiling at all.

Some hints are here: https://learn.microsoft.com/en-us/office/open-xml/migration/migrate-v2-to-v3

First, there is one compile error that needs to be fixed, for a call to Close() (in the DocxToSource GUI, https://github.com/rmboggs/DocxToSource/blob/36399ce7b84cb141fb62e4baf393afd0c3c73901/src/DocxToSource/MainWindowModel.cs#L424). The call just needs to be removed, as mentioned here: https://learn.microsoft.com/en-us/office/open-xml/migration/migrate-v2-to-v3#openxmlpackageclose-has-been-removed

One of the changes is that the EnumValues now contain structs, and those structs no longer carry the enum values that could be used to match a serialized name to the enum's value: https://learn.microsoft.com/en-us/office/open-xml/migration/migrate-v2-to-v3#enumvaluetenum-now-contains-structs That's a bit tricky, since the generated code for the TitlesOfParts of a doc's properties can't create the correct line for setting the BaseType to a member of VectorBaseValues.

Before (using DocumentFormat.OpenXml <3.0.0, like 2.20.0), code like this would be generated:

TitlesOfParts titlesOfParts = new TitlesOfParts();

vtVTVector = new VT.VTVector();
vtVTVector.Size = 3u;
vtVTVector.BaseType = VT.VectorBaseValues.Lpstr;

Now, with DocumentFormat.OpenXml 3.0.0 or 3.0.1, it generates:

TitlesOfParts titlesOfParts = new TitlesOfParts();

vtVTVector = new VT.VTVector();
vtVTVector.Size = 3u;
vtVTVector.BaseType = VT.VectorBaseValues.VectorBaseValues { };

This obviously doesn't compile. The code where this takes place is https://github.com/Docx2Src/Serialize.OpenXml.CodeGen/blob/476fcfb780381d62800719acc1b28af6672ebe67/src/Serialize.OpenXml.CodeGen/OpenXmlElementExtensions.cs#L620-L633

The result of pi.GetValue(val) is already incorrect, and calling .ToString() on it yields the same text the debugger shows.

One way to get working code, although it might rely on how the SDK is currently implemented, is to generate something like this:

TitlesOfParts titlesOfParts = new TitlesOfParts();

vtVTVector = new VT.VTVector();
vtVTVector.Size = 3u;
vtVTVector.BaseType = new VT.VectorBaseValues("lpstr");

This is how the SDK's own file works for now, as for VectorBaseValues in https://github.com/dotnet/Open-XML-SDK/pull/1397/files#diff-879432ee46de6485849364258bf13c6b750d84856f64ecb306f7a9935a274fd9R2166-R2211 On their main branch, it is: https://github.com/dotnet/Open-XML-SDK/blob/b217b4aeba3ecf2a2d61d3534ee05ce109c2b5dc/generated/DocumentFormat.OpenXml/DocumentFormat.OpenXml.Generator/DocumentFormat.OpenXml.Generator.OpenXmlGenerator/schemas_openxmlformats_org_officeDocument_2006_docPropsVTypes.g.cs#L2163-L2308

https://github.com/dotnet/Open-XML-SDK/pull/1397/files#diff-8b1abd087cee27e8443e20c915de880a9a5aff67e7df9f096bad5bebb14290d0L100

So we see that there's no mapping from the (XML) serialized value to the clean, beautiful enum value. The serialized string is what can be retrieved once we have an instance such as VectorBaseValues.Lpstr. So, if you'd want to keep the real enum values, you could try reflection or something like that to find the available enum names, create an instance of each, retrieve their serialized string values, and reverse that lookup to be able to get the correct name. A bit like the file that was deleted here (src/DocumentFormat.OpenXml.Framework/SimpleTypes/EnumInfoLookup.cs): https://github.com/dotnet/Open-XML-SDK/pull/1397/files#diff-8b1abd087cee27e8443e20c915de880a9a5aff67e7df9f096bad5bebb14290d0
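The reverse lookup described above could be sketched roughly as follows. This is a minimal, self-contained illustration, not the SDK's or this library's code: `ISerializedValue` and `DemoVectorBaseValues` are hypothetical stand-ins that only mimic the shape of a v3 enum-like struct (a wrapped serialized string plus one static property per known value). Against the real SDK, the interface to test for would presumably be the one the v3 structs implement (I believe `IEnumValue`, but treat that as an assumption to verify).

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical stand-in for whatever interface exposes the serialized string.
public interface ISerializedValue
{
    string Value { get; }
}

// Hypothetical stand-in mimicking a v3 enum-like struct.
public readonly struct DemoVectorBaseValues : ISerializedValue
{
    private readonly string _value;

    public DemoVectorBaseValues(string value) => _value = value;

    // A default-constructed struct has no string, so fall back to a default.
    public string Value => _value ?? "variant";

    public static DemoVectorBaseValues Variant => new DemoVectorBaseValues("variant");
    public static DemoVectorBaseValues Lpstr => new DemoVectorBaseValues("lpstr");
}

public static class EnumNameLookup
{
    // Maps each serialized string (e.g. "lpstr") to the name of the public
    // static property that produces it (e.g. "Lpstr").
    public static Dictionary<string, string> BuildReverseMap(Type enumLikeType)
    {
        var map = new Dictionary<string, string>(StringComparer.Ordinal);
        const BindingFlags flags = BindingFlags.Public | BindingFlags.Static;

        foreach (PropertyInfo property in enumLikeType.GetProperties(flags))
        {
            // Only static properties that return the type itself are members.
            if (property.PropertyType != enumLikeType)
            {
                continue;
            }

            if (property.GetValue(null) is ISerializedValue member)
            {
                map[member.Value] = property.Name;
            }
        }

        return map;
    }
}
```

With this, `EnumNameLookup.BuildReverseMap(typeof(DemoVectorBaseValues))["lpstr"]` gives the member name to emit in the generated code. Caching the map per type would avoid repeating the reflection for every attribute.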

For now, I quickly patched it locally by using strings, since I didn't understand how to use the correct CodeDom calls:

statement = new CodeAssignStatement(
    new CodePropertyReferenceExpression(
        new CodeVariableReferenceExpression(elementName),
        cp.Name
    ),
    new CodeObjectCreateExpression(
        simpleName,
        new CodePrimitiveExpression(val.ToString())
    )
);
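For comparison, here is a small standalone sketch of the two CodeDom shapes discussed in this issue: the string-constructor form from the patch above, and a static member reference that could be used if the member name is recovered. The class and method names are hypothetical, and it assumes the System.CodeDom package is available when targeting modern .NET.

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

// Hypothetical helper, not part of Serialize.OpenXml.CodeGen.
public static class EnumAssignmentSketch
{
    // Builds: target.property = new typeName("serialized");
    public static CodeAssignStatement FromSerializedString(
        string target, string property, string typeName, string serialized)
    {
        return new CodeAssignStatement(
            new CodePropertyReferenceExpression(
                new CodeVariableReferenceExpression(target), property),
            new CodeObjectCreateExpression(
                typeName, new CodePrimitiveExpression(serialized)));
    }

    // Builds: target.property = typeName.memberName;
    public static CodeAssignStatement FromMemberName(
        string target, string property, string typeName, string memberName)
    {
        return new CodeAssignStatement(
            new CodePropertyReferenceExpression(
                new CodeVariableReferenceExpression(target), property),
            new CodePropertyReferenceExpression(
                new CodeTypeReferenceExpression(typeName), memberName));
    }

    // Renders a single statement as C# source text.
    public static string Render(CodeStatement statement)
    {
        using var provider = new CSharpCodeProvider();
        using var writer = new StringWriter();
        provider.GenerateCodeFromStatement(statement, writer, new CodeGeneratorOptions());
        return writer.ToString().Trim();
    }
}
```

Rendering `FromSerializedString("vtVTVector", "BaseType", "VT.VectorBaseValues", "lpstr")` should give a line of the same form as the working example earlier in this issue, while `FromMemberName(..., "Lpstr")` should give the pre-v3 style assignment.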

Ideally, you could create a new object with its properties set directly in an object initializer, as the old Productivity Tool 2.5 did:

Ap.HeadingPairs headingPairs1 = new Ap.HeadingPairs();

Vt.VTVector vTVector1 = new Vt.VTVector(){ BaseType = Vt.VectorBaseValues.Variant, Size = (UInt32Value)2U };

The second issue is the use of the static CreateOpenXmlUnknownElement. As mentioned in the version 2.20.0 docs, it was already deprecated: https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.openxmlunknownelement.createopenxmlunknownelement?view=openxml-2.20.0 The CreateUnknownElement method needs to be called on a part container instead. Since the generator needs to know which container to call it on, I didn't know CodeDom well enough to find the right solution. The method to use is here: https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.openxmlunknownelementextensions.createunknownelement?view=openxml-3.0.1 Another existing GitHub issue shows the change in context: https://github.com/radzenhq/radzen-blazor-studio/issues/102#issuecomment-1524801655

Also, there is a slight mismatch with the DocxToSource repo: this repo has upgraded its packages but the other one hasn't, so it wasn't possible to compile directly. One repo required v5.0.0 of the packages while the other required v8.0.0, and this repo is referenced as a project reference rather than as a PackageReference to a package published through NuGet.

That other repo worked well with .NET 8 once two lines were changed, since the compiler complained that the type couldn't be assigned to (something with a throw inside the type's name): https://github.com/rmboggs/DocxToSource/blob/36399ce7b84cb141fb62e4baf393afd0c3c73901/src/DocxToSource/Controls/OpenXmlPartTreeViewItem.cs#L59-L64 Just using the .NET 6 ArgumentNullException.ThrowIfNull(p) works well. You might also be interested in the other helpers added in .NET 8/C# 12, and the performance difference: https://devblogs.microsoft.com/dotnet/performance-improvements-in-net-8/#exceptions A couple of checks can easily be replaced throughout the code base, and switching to the breaking v3 version of the SDK may be an appropriate moment to move to .NET 8 all the way.
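As a rough illustration of that kind of replacement (the class below is a hypothetical stand-in, not the actual code in OpenXmlPartTreeViewItem.cs), a null check and a range check could look like this:

```csharp
using System;

// Hypothetical example type; the names are for illustration only.
public sealed class TreeItemSketch
{
    private readonly object _part;

    public TreeItemSketch(object part)
    {
        // Available since .NET 6. Replaces the older pattern:
        // _part = part ?? throw new ArgumentNullException(nameof(part));
        ArgumentNullException.ThrowIfNull(part);
        _part = part;
    }

    public object Part => _part;

    public static int CheckedIndex(int index)
    {
        // One of the throw helpers added in .NET 8.
        ArgumentOutOfRangeException.ThrowIfNegative(index);
        return index;
    }
}
```

Both helpers capture the argument expression automatically for the parameter name, so the call sites stay short.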

ngbrown commented 1 month ago

@echoix do you have a branch or repository with these fixes applied?

echoix commented 1 month ago

I'll take a look (not right away). It's been a while, but maybe it was just a matter of fixing the generated code. I ended up working with the full Open XML SDK, learning how to build a document part by part.

echoix commented 1 month ago

If I have something, it seems it isn't pushed yet.