dotnet / Open-XML-SDK

Open XML SDK by Microsoft
https://www.nuget.org/packages/DocumentFormat.OpenXml/
MIT License

How to compare two Word documents and show the diff in a new document #939

Closed: weedkiller closed this issue 3 years ago

weedkiller commented 3 years ago

Description

Use case example: We have multiple documents that several people constantly revise in parallel for pricing quotes, and we want to see what changed in those documents. Comparing two documents is possible and fast with a third-party library such as Aspose, but two documents is the limit, and it requires buying a license.

I tried collecting the text parts and comparing them piece by piece. That is very slow, it crashes, and it is unreliable: on occasion it does not show the differences. Is there a better way, such as a hierarchical walk-down compare function that is called from the top-level sections down to each piece of text?

List<Text> compareTextparts = document.MainDocumentPart.Document.Body.Descendants<DocumentFormat.OpenXml.Wordprocessing.Text>().ToList();

  1. Can you add or show a sample that does a two-way or three-way compare of whole documents? Is it a recursive, hierarchical walk-down compare function, and how do I prevent the crashes and watch for memory leaks? I tried it with using().
  2. Present the output back to the user as a new document in which the changes can be accepted, in ASP.NET Core.

ashahabov commented 3 years ago

What about WmlComparer from the Open XML SDK Power Tools? It uses the Open XML SDK and can generate a document with revisions based on the original and modified .docx files.
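A minimal sketch of what such a comparison could look like, assuming the project references Open XML PowerTools and that its `WmlComparer.Compare`, `WmlComparerSettings`, and `WmlDocument` types are available. The class name `QuoteComparer`, the method name, the paths, and the author label are hypothetical and only serve the illustration.

```csharp
using OpenXmlPowerTools; // assumption: Open XML PowerTools is referenced by the project

public static class QuoteComparer // hypothetical helper, not part of the library
{
    // Compares two .docx files and saves a document in which the differences
    // are marked up as revisions.
    public static void WriteComparison(string originalPath, string revisedPath, string outputPath)
    {
        var original = new WmlDocument(originalPath);
        var revised = new WmlDocument(revisedPath);

        var settings = new WmlComparerSettings
        {
            AuthorForRevisions = "Comparison" // hypothetical author label for the revisions
        };

        WmlDocument result = WmlComparer.Compare(original, revised, settings);
        result.SaveAs(outputPath);
    }
}
```

Because the result is held in memory as a `WmlDocument`, a web application could presumably return its `DocumentByteArray` as a file download instead of saving to disk.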

ThomasBarnekow commented 3 years ago

As noted by @adamshakhabov, the WmlComparer might do the job for you unless you require a 100% faithful representation, e.g., including fields and content controls, which will not be represented in the comparison document.

Rolling your own document comparer would involve significant effort. Should you want to go down that route, look into the longest common subsequence (LCS) algorithm and the paper "An O(ND) Difference Algorithm and Its Variations" by Eugene Myers as a starting point. However, the recursive nature of Open XML and its rich features would definitely make it a challenging endeavor.
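To give a feel for the LCS idea, here is a small sketch that diffs two plain strings word by word using the classic dynamic-programming table. It is not the Myers O(ND) algorithm, it ignores all Open XML structure, and the `WordDiff` class is a hypothetical name chosen for the illustration.

```csharp
using System;
using System.Collections.Generic;

public static class WordDiff // hypothetical helper for illustration only
{
    // Returns (Op, Word) pairs: ' ' = unchanged, '-' = deleted, '+' = inserted.
    public static List<(char Op, string Word)> Compare(string original, string revised)
    {
        string[] a = original.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        string[] b = revised.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // lcs[i, j] = length of the longest common subsequence of a[i..] and b[j..].
        var lcs = new int[a.Length + 1, b.Length + 1];
        for (int i = a.Length - 1; i >= 0; i--)
            for (int j = b.Length - 1; j >= 0; j--)
                lcs[i, j] = a[i] == b[j]
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);

        // Walk the table from the start and emit one edit operation per word.
        var result = new List<(char Op, string Word)>();
        int x = 0, y = 0;
        while (x < a.Length && y < b.Length)
        {
            if (a[x] == b[y]) { result.Add((' ', a[x])); x++; y++; }
            else if (lcs[x + 1, y] >= lcs[x, y + 1]) { result.Add(('-', a[x])); x++; }
            else { result.Add(('+', b[y])); y++; }
        }
        while (x < a.Length) result.Add(('-', a[x++]));
        while (y < b.Length) result.Add(('+', b[y++]));
        return result;
    }
}
```

For example, `WordDiff.Compare("Price is 100 USD", "Price is 120 USD")` yields the unchanged words "Price" and "is", a deletion of "100", an insertion of "120", and the unchanged word "USD". A real document comparer would have to apply the same idea across paragraphs, runs, tables, and formatting, which is where most of the effort lies.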

weedkiller commented 3 years ago

Thanks @adamshakhabov and @ThomasBarnekow. WmlComparer might work; it is taking me to a test case in a fork.

I don't believe my use case is all that complex. I have a simple quote that is emailed to the customer, who fills in or edits some fields or text, and I want to be able to highlight those changes. I need a sample.

ThomasBarnekow commented 3 years ago

In your use case, the WmlComparer should indeed be totally sufficient.