Open Yanal-Yves opened 2 years ago
@tomjebo Any thoughts?
@Yanal-Yves did this happen with previous versions of the SDK? I'll test it out and see if I can tell what's causing the change to a macro type workbook.
I just created workbook (.xlsx) using your code. OneDrive doesn't do this conversion to .xlsm. Google Drive does. They must have some security constraint that kicks in. I'll see if I can tell what that is but it may be something as simple as inconsistent attributes or some other factor.
Hi, thank you for your answer. I have juste tested with version 2.13.1 of the SDK and I can observer the same behavior.
I have also tested with version 2.5.0 and I can observer the same behavior too.
I have also tested with version 1.0.0 and I can observer the same behavior too.
Yeah, looks like the workbook generated is fine but definitely that code would not include several parts that Excel does when creating a new, fresh workbook. The parts not included by the SDK genned file are: docprop\app.xml, docprop\core.xml, xl\sharedstrings.xml, xl\styles.xml, and xl\theme\theme1.xml. There are some differences in the workbook.xml, workbook.xml.rels, [content_types].xml and sheet1.xml but they all look reasonable and should not cause anything like a filetype change AFAIK. We might have to ask Google about this. Otherwise, we can just start adding in the other parts and see when Sheets likes the file as .xlsx.
@Yanal-Yves I just added this line:
CoreFilePropertiesPart coreDocProps = spreadsheetDocument.AddCoreFilePropertiesPart();
And it seemed to fix the behavior. Can you try it?
I've just tested with
var filepath = @"C:\temp\foo.xlsx";
// Create a spreadsheet document by supplying the filepath.
// By default, AutoSave = true, Editable = true, and Type = xlsx.
SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Create(filepath, SpreadsheetDocumentType.Workbook);
spreadsheetDocument.AddCoreFilePropertiesPart();
// Add a WorkbookPart to the document.
WorkbookPart workbookpart = spreadsheetDocument.AddWorkbookPart();
workbookpart.Workbook = new Workbook();
// Add a WorksheetPart to the WorkbookPart.
WorksheetPart worksheetPart = workbookpart.AddNewPart<WorksheetPart>();
worksheetPart.Worksheet = new Worksheet(new SheetData());
// Add Sheets to the Workbook.
Sheets sheets = spreadsheetDocument.WorkbookPart.Workbook.AppendChild<Sheets>(new Sheets());
// Append a new worksheet and associate it with the workbook.
Sheet sheet = new Sheet() { Id = spreadsheetDocument.WorkbookPart.GetIdOfPart(worksheetPart), SheetId = 1, Name = "mySheet" };
sheets.Append(sheet);
workbookpart.Workbook.Save();
// Close the document.
spreadsheetDocument.Close();
And I can confirm I don't have the issue anymore.
Please note that the call to spreadsheetDocument.AddCoreFilePropertiesPart();
should be done early. If done just before workbookpart.Workbook.Save();
, the issue is still there.
@tomjebo, I'd like to thank you for your help.
For information I have submitted an issue to Google : https://issuetracker.google.com/issues/210875597
BTW I have also answered the Stack Overflow question : https://stackoverflow.com/questions/70319867/avoid-google-spreadsheet-to-convert-an-xlsx-file-created-by-open-xml-sdk-to-xlsm/70371638#70371638
@Yanal-Yves thanks for confirming and yes, I added the core part right after creating the document. I think we will consider adding this (and possibly the other standard parts) to the creation process. I'll create a PR for this and we can review and discuss but it looks like it should be straightforward.
Although it makes some sense for us to create workbooks with these common parts, I think this Google Sheets behavior is a little strange. I don't know what the rationale for it is and it should probably be reported to Google.
Hi @tomjebo,
Calling AddCoreFilePropertiesPart()
results in an empty core.xml
file. And Excel throw an error.
The open XML SDK Productivity Tool for Microsoft Office also cannot not open the file generated:
Thanks to this doc , I was able to work around the issue.
spreadsheetDocument.AddCoreFilePropertiesPart();
using (XmlTextWriter writer = new XmlTextWriter(coreFilePropPart.GetStream(FileMode.Create), System.Text.Encoding.UTF8))
{
writer.WriteRaw("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n<cp:coreProperties xmlns:cp=\"https://schemas.openxmlformats.org/package/2006/metadata/core-properties\"></cp:coreProperties>");
writer.Flush();
}
IMHO:
core.xml
?AddCoreFilePropertiesPart()
should not produce an inconsistent XLSX state and should create the root element.Interesting, I'll have to test this again. It didn't throw an error for me. Are you using an older version of Excel?
Ok, tested and yes, you're right, it's empty. It satisfied the Google Sheets test but didn't satisfy Excel webapp or desktop. What you're doing might be needed for now because I don't see that we have the root element defined in the SDK. The part is defined in a bare bones manner. I'll add this to our "enhancement" project list which will be to build out "empty" documents for Word, PowerPoint and Excel to mimic what the Office apps do to some better degree.
@Yanal-Yves I created PR #1107 to test this out in the Create function. If it's accepted we can use this for now and add other parts if we think they're needed later.
@Yanal-Yves, there is no bug in the Open XML SDK. What everybody needs to understand is that the Open XML SDK provides a super-low level API for creating Open XML packages and markup. Therefore, by design, developers will have to:
OpenXmlPackage
, i.e., the WordprocessingDocument
, SpreadsheetDocument
, or PresentationDocument
, none of which will be valid at this point;OpenXmlPart
s (e.g., WorkbookPart
, WorksheetPart
, CoreFilePropertiesPart
), i.e.:
OpenXmlPart
s (because they would otherwise be empty and non-compliant in any case).If you omit any of the above steps, it will not work because the SDK has not been used as designed. Thus, looking at your code, the error is not to add the Open XML markup to the CoreFilePropertiesPart
.
I would also recommend leveraging using
statements when creating or opening documents, in which case you don't have to save and close those documents explicitly. You might see that in examples, but that's another issue ;-).
Unfortunately, the Open XML SDK does not include strongly-typed classes for the CoreFilePropertiesPart
. For that reason, there is also no generated code in the DocumentFormat.OpenXml.Linq
project and namespace. We should either add the required schema to generate the code or add the following to the project (with some documentation, of course):
public static class CP
{
public static readonly XNamespace cp =
"http://schemas.openxmlformats.org/package/2006/metadata/core-properties";
public static readonly XName category = cp + "category";
public static readonly XName contentStatus = cp + "contentStatus";
public static readonly XName contentType = cp + "contentType";
public static readonly XName coreProperties = cp + "coreProperties";
public static readonly XName keywords = cp + "keywords";
public static readonly XName lastModifiedBy = cp + "lastModifiedBy";
public static readonly XName lastPrinted = cp + "lastPrinted";
public static readonly XName revision = cp + "revision";
}
Using the Linq-to-XML way, you could then do the following:
var filepath = @"C:\temp\foo.xlsx";
// Create an empty SpreadsheetDocument with no OpenXmlParts.
// With the using statement, the document will be saved automatically once you leave the scope.
using var spreadsheetDocument = SpreadsheetDocument.Create(filepath, SpreadsheetDocumentType.Workbook);
// Add a CoreFilePropertiesPart without any content.
CoreFilePropertiesPart part = spreadsheetDocument.AddCoreFilePropertiesPart();
// Set the part contents, i.e., an empty cp:coreProperties element with no actual properties.
// This specific call will also save the contents to the part.
part.SetXElement(new XElement(CP.coreProperties, new XAttribute(XNamespace.Xmlns + "cp", CP.cp)));
Whether or not the above should be simpler is a different question. For example, you might want to have something like the following:
using var spreadsheetDocument = SpreadsheetDocument.CreateMinimalPackage(filePath);
This would "magically" create a package that fulfills the minimum requirements defined by the standard or is close to what Excel might produce. The same method would have to exist for other subtypes of OpenXmlPackage
.
We could also create a feature that might offer the following:
using var spreadsheetDocument = SpreadsheetDocument.Create(filepath, SpreadsheetDocumentType.Workbook);
// Call an extension method provided by a feature.
spreadsheetDocument.AddMinimumContent();
@ThomasBarnekow,
Thank you for taking time to answer me, for your explanation about the philosophy of this library and for pointing out how to use Linq to XML to add the Open XML markup to the OpenXmlPart.
I am an Open XML SDK newbie and you probably know much more than I do about this SDK but I still feel wired to have methods (that seems high level) that produces inconsistent documents. I feel it's so error prone that even the official documentation does not provide the right code snippet to create a minimal empty document correctly. The code I used in this issue is from that documentation. What's the purpose of having a Create
method that's not self sufficient ? My guess is that every developer that creates a document wants to start with a consistent minimal document. Again I don't know all your use case and I may be wrong.
Thank you for mentioning the missing using
. The code I provided is a copy / past from the official documentation ;-)
Thankfully in my production code I double checked and I've already noticed that SpreadsheetDocument
is implementing IDisposable
and didn't forget the using
keyword.
Sorry to ask but I couldn't find the SetXElement
method in the DocumentFormat.OpenXml
package. I can see the OpenXmlPartRootXElementExtensions
class in the Github but couldn't find that class when I decompile the DocumentFormat.OpenXml
package with JetBrains dotPeek. Did I miss something. What NuGet Package / namespace import should I use to get the SetXElement
extension method ?
@Yanal-Yves
The code I used in this issue is from that documentation. What's the purpose of having a Create method that's not self sufficient ?
Just a reminder that the behavior Google Sheets is showing is not completely understood. Otherwise, this has not been a problem with Microsoft Office (web or desktop) or any other apps AFAIK. I don't see that those parts are required by the standard, looking at OPC (29500-2) or markup standard (29500-1) or our extensions and implementer notes. We'll investigate this but I think this is ultimately an enhancement for now and also we should follow up with Google to find out what the real story is with their Sheets app behavior.
@tomjebo,
OK understood, thank you for pointing that out.
As I mentioned earlier a call to AddCoreFilePropertiesPart
is creating an inconsistent XLSX that causes Excel to ask to repair the file because it creates an emtpy core.xml
. Isn't this a bug ?
I read the code of #1107 and maybe instead of calling AddCoreFilePropertiesPart
in the Create
method, you can add the root element to core.xml
when AddCoreFilePropertiesPart
is called ?
So Create
behavior stays unchanged and AddCoreFilePropertiesPart
stops producing inconsistent documents.
@Yanal-Yves, the SetXElement
method is one of the extension methods provided by the DocumentFormat.OpenXml.Linq
project in this repository. It should have been published as a NuGet package together with DocumentFormat.OpenXml
v2.15.0. Publishing somehow failed due to a permission issue, but it will be published shortly. In the meantime, you can get it through the repo's custom feed as described under Daily Builds on the main page.
To use the SetXElement
and corresponding extension methods, you'd have to add a package reference to the DocumentFormat.OpenXml.Linq
package (version 2.15.0). The methods are provided by a public static class in the DocumentFormat.OpenXml.Packaging
namespace, which also contains the SpreadsheetDocument
class, for example. Thus, if you have the using DocumentFormat.OpenXml.Packaging;
required to use the SpreadsheetDocument
, you are good to go. However, you do need to reference the DocumentFormat.OpenXml.Linq
NuGet package.
As I mentioned earlier a call to
AddCoreFilePropertiesPart
is creating an inconsistent XLSX that causes Excel to ask to repair the file because it creates an emtpycore.xml
. Isn't this a bug ?
No, it is not a bug. It's a feature. Calling methods like AddWorkbookPart
, AddWorksheetPart
, or AddCoreFilePropertiesPart
always just adds an empty part (think XML file for now) to a package, i.e., a ZIP file having a well-defined structure (think folders, files, and metadata). Adding or changing the file contents is a separate step. Just creating a part is similar to creating a FileStream
and then not writing any data to it.
So Create behavior stays unchanged and AddCoreFilePropertiesPart stops producing inconsistent documents.
See above. Changing the AddCoreFilePropertiesPart
method would have far-reaching consequences. Firstly, we would have to make sure it does not constitute a breaking behavioral change. Secondly, we would have to change that behavior for all other methods like it (and there are a lot of those across all package types).
Using the Open XML PowerTools with a coming update to the Linq-to-XML feature as an example, you might see the following approach:
using DocumentFormat.OpenXml.Linq;
using DocumentFormat.OpenXml.Packaging;
using var wordDocument = WordprocessingDocument.Create(...);
// Add an empty part with no content.
MainDocumentPart part = wordDocument.AddMainDocumentPart();
// Get the reference to the currently empty XDocument representation of the part.
XDocument partXDocument = part.GetXDocument();
// Add (!) the root XElement, i.e., a w:document element, to the XDocument.
partXDocument.Add(
new XElement(W.document,
new XAttribute(XNamespace.Xmlns + "w", W.w),
new XElement(W.body,
new XElement(W.p))));
The above is one approach that is used in the Open XML PowerTools and likely mimicked by many users. This would break if the AddMainDopcumentPart
suddenly created a part with content rather than an empty part.
There are certainly other approaches to initializing the contents of a part that would not break but simply overwrite the existing content.
However, we are coming from the incorrect use of the admittedly low-level Open XML SDK. That incorrect use might have been informed by an incorrect example. So let's use the Open XML SDK as designed and consider correcting the incorrect examples.
Let's also understand that Open XML is extremely large and complex. The foundation document of the standard alone has north of 5,000 pages. To properly use the low-level Open XML SDK, you need to understand Open XML as well. Otherwise, you will not understand which parts and elements to use. Once you understand Open XML, the SDK will make sense to you. So welcome to a very interesting journey.
There were some additional comments on PR #1107 about how to resolve this. We will close PR #1107 so I'm copying the substantive comments here. Thanks @ThomasBarnekow for these comments:
ThomasBarnekow:
@tomjebo, having looked at your project description, I wanted to comment on your points here (as I did not see a way to do that in the project).
Won't add some of the basic parts that some apps like Google Sheets expect. core.xml, app.xml, etc...
This is covered by my comment above regarding the "minimal" documents. It would be a very good idea to add this to the SDK. While other libraries have filled at least some of the gaps, I believe this is so basic that it should be done by the SDK.
We don't define the root element for CoreFilePropertiesPart and that means the programmer has to add the contents of these parts (at least core.xml) with raw XML like this: [example not shown]
This is true for all parts. In all cases, you have to set that root element explicitly after having added a new part. Using a WordprocessingDocument and a MainDocumentPart as an example, you'll have to do the following to create a minimal document when using the strongly typed classes:
using var wordDocument = WordprocessingDocument.Create(stream, type); MainDocumentPart part = wordDocument.AddMainDocumentPart(); part.Document = new Document( new Body( new Paragraph())); I am not 100% sure whether the w:p element is required by the standard, but Word would add it. I've laid out the code to mimic the layout of the corresponding XML document.
Using the Linq-to-XML way, you'd do it like this.
using var wordDocument = WordprocessingDocument.Create(stream, type); MainDocumentPart part = wordDocument.AddMainDocumentPart(); part.SetXElement( new XElement(W.document, new XAttribute(XNamespace.Xmlns + "w", W.w), new XElement(W.body, new XElement(W.p))) It would be really cool if the SDK could create standards-compliant, minimal parts (e.g., MainDocumentPart, CoreFilePropertiesPart) when calling the AddMainDocumentPart, for example. That way, the Document and RootElement properties would not have to be marked as nullable.
We should investigate building these documents more completely like Office, even though Office doesn't throw an error on opening them.
Yes, absolutely!
Reference: PR 1107 issuecomment-996561981
ThomasBarnekow: It would certainly be nice to be able to tell the SDK to create "more complete" or "initially compliant" documents such as minimal Word, Excel, and PowerPoint documents that fulfill certain criteria. But those should have to be created both explicitly (e.g., through settings or otherwise) and consistently. "Consistently" should also mean "for all Office document types", i.e., Excel, Word, and PowerPoint. The latter would particularly benefit from a feature that creates a minimal PresentationDocument as the minimal markup required is much more involved than that of WordprocessingDocument and SpreadsheetDocument, for example.
Whatever magic we add should not create breaking changes. When creating documents with the SDK, no parts are added at the moment. Thus, callers would happily (try to) create another part of the same type, which will likely fail (I did not test that).
This method (or its siblings) would not be the place to inject such functionality. You'd have to duplicate this in all other Create methods. We'd need some other place, such as the constructor(s).
Reference: PR 1107 discussion_r771194825
Description First I would like to thank you for this library.
I am posting here a wired behavior when generating an Excel file with Open XML SDK and uploading the file to Google spreadsheet. I believe this is an oddity caused by a bug in Open XML SDK. This is why I am posting here. I don't think this is a question about using Open XML SDK. But in case I'am wrong, I have also posted to Stack overflow :
So I am using the Open XML SDK 2.14.0 to create Excel files. If I upload the XLSX file produced by Open XML SDK to Google Spreadsheet, it opens the file as an XLSX file but if I make any change in the spreadsheet then Google Spreadsheet changes the file type to XLSM. If I generate an empty workbook from MS Excel (Office 360) or Libre Office Calc I don't have this issue : Google Spreadsheet will not change the file type if I change the data.
Information
Repro
The minimalistic code bellow (found in the official documentation) reproduces the issue. It generate a
foo.xlsx
file inC:\Temp
.Observed
If I upload the generated file to Google Spreadsheet, it shows this is an XLSX file.
But if I change anything in the workbook then Google Spreadsheet changes the file type to XLSM.
Expected
Open XML SDK generates the correct XLSX file so that Google spreadsheet does not change to XLSM when saving the first time.