dotnet / Open-XML-SDK

Open XML SDK by Microsoft
https://www.nuget.org/packages/DocumentFormat.OpenXml/
MIT License
3.99k stars 544 forks source link

Deletion of a Part breaks reading other Parts through "using" method #1729

Open Asbjoedt opened 3 months ago

Asbjoedt commented 3 months ago

*Describe the bug Hello I am upgrading from v2.20 to v.3.0.2.

The new SDK update breaks code which involves deleting/removing some Open XML Parts in my spreadsheet document.

I am applying "using" to read and write the spreadsheet document. I receive error when trying to read any Part after deletion of some other Parts, when I try to read the Part in a new "using" scope. I receive error "Specified part does not exist in the package".

Opening the spreadsheet document wit Excel or LibreOffice works perfectly fine.

Observed behavior Open XML SDK perceives the spreadsheet document to be broken with error "Specified part does not exist in the package" whenever I try to read any Part, if some Parts have previously been deleted/removed with the "using" method. Excel renders the spreadsheet document without errors.

Expected behavior

Desktop (please complete the following information):

mikeebowen commented 3 months ago

Hi @Asbjoedt, could you be more specific about which parts you mean:

I am applying "using" to read and write the spreadsheet document. I receive error when trying to read any Part after deletion of some other Parts, when I try to read the Part in a new "using" scope. I receive error "Specified part does not exist in the package".

When you day "read any Part after deletion of some other Parts", do you mean when you read any Part at all even the WorkbookPart? And which parts do you mean by "deletion of some other Parts"?

Asbjoedt commented 3 months ago

When I delete these parts, I get "Specified part does not exist in the package", whenever I try to read any Part again:

twsouthwick commented 3 months ago

please supply a repro so that we can try it out

Asbjoedt commented 3 months ago

Sample spreadsheet with a data connection With data connection.xlsx

Use this code to remove a data connection

// Remove data connections
public int Remove_DataConnections(string filepath)
{
    int success = 0;

    using (SpreadsheetDocument spreadsheet = SpreadsheetDocument.Open(filepath, true))
    {
        ConnectionsPart conn = spreadsheet.WorkbookPart.ConnectionsPart;

        // Count connections
        success = conn.Connections.Count();

        // Delete all connections
        spreadsheet.WorkbookPart.DeletePart(conn);

        // Delete all QueryTableParts
        IEnumerable<WorksheetPart> worksheetParts = spreadsheet.WorkbookPart.WorksheetParts;
        foreach (WorksheetPart worksheetPart in worksheetParts)
        {
            // Delete all QueryTableParts in WorksheetParts
            List<QueryTablePart> queryTables = worksheetPart.QueryTableParts.ToList(); // Must be a list
            foreach (QueryTablePart queryTablePart in queryTables)
            {
                worksheetPart.DeletePart(queryTablePart);
            }

            // Delete all QueryTableParts, if they are not registered in a WorksheetPart
            List<TableDefinitionPart> tableDefinitionParts = worksheetPart.TableDefinitionParts.ToList();
            foreach (TableDefinitionPart tableDefinitionPart in tableDefinitionParts)
            {
                List<IdPartPair> idPartPairs = tableDefinitionPart.Parts.ToList();
                foreach (IdPartPair idPartPair in idPartPairs)
                {
                    if (idPartPair.OpenXmlPart.ToString() == "DocumentFormat.OpenXml.Packaging.QueryTablePart")
                    {
                        // Delete QueryTablePart
                        tableDefinitionPart.DeletePart(idPartPair.OpenXmlPart);
                        // The TableDefinitionPart must also be deleted
                        worksheetPart.DeletePart(tableDefinitionPart);
                        // And the reference to the TableDefinitionPart in the WorksheetPart must be deleted
                        List<TablePart> tableParts = worksheetPart.Worksheet.Descendants<TablePart>().ToList();
                        foreach (TablePart tablePart in tableParts)
                        {
                            if (idPartPair.RelationshipId == tablePart.Id)
                                tablePart.Remove();
                        }
                    }
                }
            }
        }

        // If spreadsheet contains a CustomXmlMappingsPart, delete databinding
        if (spreadsheet.WorkbookPart.CustomXmlMappingsPart != null)
        {
            CustomXmlMappingsPart xmlMap = spreadsheet.WorkbookPart.CustomXmlMappingsPart;
            List<Map> maps = xmlMap.MapInfo.Elements<Map>().ToList(); // Must be a list
            foreach (Map map in maps)
            {
                if (map.DataBinding != null)
                    map.DataBinding.Remove();
            }
        }
    }
    return success;
}

Then immediately use this code to read all hyperlink relationships of the same spreadsheet. It should fail with error System.InvalidOperationException: 'Specified part does not exist in the package.'

// Extract all cell hyperlinks to an external file
public int Extract_Hyperlinks(string filepath)
{
    int hyperlinks_count = 0;

    // Read spreadsheet
    using (SpreadsheetDocument spreadsheet = SpreadsheetDocument.Open(filepath, false))
    {
        // Find all hyperlinks
        List<HyperlinkRelationship> hyperlinks = spreadsheet.GetAllParts().SelectMany(p => p.HyperlinkRelationships).ToList();

        // Create metadata file
        string folder = System.IO.Path.GetDirectoryName(filepath);
        using (StreamWriter w = File.AppendText($"{folder}\\orgFile_Metadata.txt"))
        {
            w.WriteLine("---");
            w.WriteLine("EXTRACTED HYPERLINKS");
            w.WriteLine("---");

            foreach (HyperlinkRelationship hyperlink in hyperlinks)
            {
                // Write information to metadata file
                w.WriteLine(hyperlink.Uri);
                // Add to count
                hyperlinks_count++;
            }
        }
    }
    return hyperlinks_count;
}

The exception will be thrown at:

// Find all hyperlinks
List<HyperlinkRelationship> hyperlinks = spreadsheet.GetAllParts().SelectMany(p => p.HyperlinkRelationships).ToList();
Asbjoedt commented 2 months ago

Hi @mikeebowen, @twsouthwick Have you been able to look any further at the issue? Can I help to progress issue resolution?