dotnet / Open-XML-SDK

Open XML SDK by Microsoft
https://www.nuget.org/packages/DocumentFormat.OpenXml/
MIT License
4.01k stars 545 forks source link

SAX Approach Replace Example #566

Open AlexHedley opened 5 years ago

AlexHedley commented 5 years ago

I'm looking for an example of using the SAX approach (instead of DOM) to open a large file and perform a replace on a given value, then save the file.

I think this would be a useful addition to the Documentation.

How to: Search and replace text in a document part (Open XML SDK) https://docs.microsoft.com/en-us/office/open-xml/how-to-search-and-replace-text-in-a-document-part

This uses a Stream.

How to: Parse and read a large spreadsheet document (Open XML SDK) https://docs.microsoft.com/en-us/office/open-xml/how-to-parse-and-read-a-large-spreadsheet

// The SAX approach.
static void ReadExcelFileSAX(string fileName)
{
    using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
    {
        WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
        WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();

        OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
        string text;
        while (reader.Read())
        {
            if (reader.ElementType == typeof(CellValue))
            {
                text = reader.GetText();
                Console.Write(text + " ");
            }
        }
        Console.WriteLine();
        Console.ReadKey();
    }
}

If I read in the "text" and wish to replace this:

using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(filePath, true))
{
    Document document = wordDoc.MainDocumentPart.Document;
    OpenXmlReader reader = OpenXmlReader.Create(document);
    while (reader.Read())
    {
        //OpenXmlElement element = reader.LoadCurrentElement();
        //text = element.InnerText;
        text = reader.GetText();
    }
}

OpenXmlWriter expects an OpenXmlPart or Stream when you create it.

OpenXmlWriter writer = OpenXmlWriter.Create(#);
writer.WriteStartElement(reader);
writer.WriteElement(#);
writer.WriteEndElement();
writer.Close();

What is the supported approach/method for this?

Using the DOM approach on large files can cause memory exceptions.


There are a number of blog posts documenting how to use the OpenXmlWriter but this is for creating new files or adding new elements to an existing file, not updating existing data.

Parsing and Reading Large Excel Files with the Open XML SDK http://blogs.msdn.com/b/brian_jones/archive/2010/05/27/parsing-and-reading-large-excel-files-with-the-open-xml-sdk.aspx [Dead Link] https://web.archive.org/web/20151205145806/http://blogs.msdn.com/b/brian_jones/archive/2010/05/27/parsing-and-reading-large-excel-files-with-the-open-xml-sdk.aspx

Writing Large Excel Files with the Open XML SDK http://blogs.msdn.com/b/brian_jones/archive/2010/06/22/writing-large-excel-files-with-the-open-xml-sdk.aspx [Dead Link] https://web.archive.org/web/20160216062257/http://blogs.msdn.com/b/brian_jones/archive/2010/06/22/writing-large-excel-files-with-the-open-xml-sdk.aspx

Performance issue while reading/writing large excel files using OpenXML SDK http://tech-turf.blogspot.com/2015/10/performance-issue-while-readingwriting.html

How to read and write Excel cells with OpenXML and C# http://fczaja.blogspot.com/2013/05/how-to-read-and-write-excel-cells-with.html

How to properly use OpenXmlWriter to write large Excel files http://polymathprogrammer.com/2012/08/06/how-to-properly-use-openxmlwriter-to-write-large-excel-files/

github-actions[bot] commented 4 years ago

Stale issue message

AlexHedley commented 4 years ago

Hi @twsouthwick I’ve see you comment on other issues for this repo, would it be possible for a comment on this?

Also is there a reason why this would be auto closed with no interactions?

I know other questions have been suggested to moved to Stack Overflow, could instead the new Discussions tab be added and it moved to there instead, if you don’t think this is an Issue,

Thanks

twsouthwick commented 4 years ago

Sorry for the auto close. I enabled a bot that went through and closed issues that had no comments on it. Happy to reopen and take a look.

AlexHedley commented 4 years ago

@twsouthwick if you have some time that would be great, thanks.

sorensenmatias commented 4 years ago

+1 for this - I am currently faced with a presentation containing lots of vector graphics that ends up allocating 15.000.000 objects in memory using the DOM approach. This basically makes our product unusable. I would also love to hear about workaround to avoid loading everything to memory before it is possible to do manipulation on elements.

twsouthwick commented 1 year ago

I'm going to close this in favor of #1193 since that has a proposal for this approach. Please reopen if you think it's different.

AlexHedley commented 1 year ago

@twsouthwick I'm not sure how reading the last element would allow for replacing an item anywhere in the document, unless I'm missing something?

twsouthwick commented 1 year ago

It may not solve your need - the title had SAX in it and was trying to consolidate efforts and issues. Sounds like different scenarios at play

rsdelapaz commented 1 year ago

Finally, as I couldn't find a workaround for this bug, I gave a chance to the LargeXlsx library (https://github.com/salvois/LargeXlsx) and after running several tests in a development environment, I deployed the application to a production environment where I always received the OutOfMemory exception on Azure, and since then it has not occurred again despite generating Excel files with hundreds of thousands of records. The library is easy to use and the documentation is good.

In summary, due to the need to solve this problem, I had to look for alternatives, and fortunately, they were successful. If you have any questions, do not hesitate to ask. Regards.