OfficeDev / Open-Xml-PowerTools

MIT License

compare two word documents and generating the changes not working #108

Closed manikantad closed 5 years ago

manikantad commented 7 years ago

While comparing documents I am facing issues. I get two different errors for two different files. For document 1: "file unsupported: document contains text boxes". For document 2: "file contains corrupted data".

Can you please help me find a solution for this?

iivel commented 7 years ago

I ran into this on a project I'm converting from Interop as well. Shut that down in its tracks. You'll note that it's specified as unsupported "for now" in the code.

    // prohibit
    // - altChunk
    // - subDoc
    // - contentPart
    // - text boxes and nested tables (for now)
    private static void TestForInvalidContent(WordprocessingDocument wDoc)
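For anyone who wants to detect these cases before calling the comparer, here is a minimal sketch of a pre-check. It is not the body of `TestForInvalidContent` (the thread only shows its signature); it simply looks for the constructs listed in the comment. `DocxPreCheck` and `FindProhibited` are hypothetical names, and the file overload assumes the main part is stored at the conventional path `word/document.xml`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of Open-Xml-PowerTools.
public static class DocxPreCheck
{
    // Main WordprocessingML namespace.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns labels for the constructs from the comment list that occur in the XML.
    public static List<string> FindProhibited(XDocument mainPart)
    {
        var found = new List<string>();

        foreach (var name in new[] { "altChunk", "subDoc", "contentPart" })
            if (mainPart.Descendants(W + name).Any())
                found.Add(name);

        // Text box content is carried in w:txbxContent.
        if (mainPart.Descendants(W + "txbxContent").Any())
            found.Add("text box");

        // A nested table is a w:tbl that has a table cell (w:tc) as an ancestor.
        if (mainPart.Descendants(W + "tbl").Any(t => t.Ancestors(W + "tc").Any()))
            found.Add("nested table");

        return found;
    }

    // Assumption: the main document part lives at word/document.xml.
    public static List<string> FindProhibited(string docxPath)
    {
        using (var zip = ZipFile.OpenRead(docxPath))
        {
            var entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found");
            using (var stream = entry.Open())
                return FindProhibited(XDocument.Load(stream));
        }
    }
}
```

An empty list means none of the listed constructs were found in the main part. Headers, footers, footnotes and other parts are not inspected by this sketch.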

FWIW: Looking through the original fork, it seems that the update on the 13th reverted that feature:

https://github.com/NavaneethaDev/Open-Xml-PowerTools/commits/2c4b0b3a9c78ad6164de28a7e2a938056b9e7fd5/OpenXmlPowerTools/WmlComparer.cs

Not sure why. I did try out the revision here:

https://github.com/NavaneethaDev/Open-Xml-PowerTools/tree/f8dc62a49bfe980ceedcbacae1cbe5755696d87c

And though the comment history suggests it should work, and there's a LOT of logic around textboxes, I still encounter an unhandled exception from this method:

    private static void CreateComparisonUnitAtomListRecurse(OpenXmlPart part, XElement element, List<ComparisonUnitAtom> comparisonUnitAtomList)

Any inputs on when this might get merged into the master?

EricWhiteDev commented 7 years ago

@manikantad @iivel

Hi Guys, fixing this is (most probably) in the near future. The algorithm that the module uses is one I devised after seeing the intrinsic problems with the stock LCS algorithm for content that contains both paragraphs and tables/rows/cells. (Embedded text boxes are a degenerate, simpler case of block-level content that can contain block-level content.) Having this module be perfect is a high priority for me.
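For context, a minimal sketch of the textbook ("stock") longest common subsequence computation over a flat sequence of tokens. This is not the algorithm WmlComparer uses, which the thread does not show; `StockLcs` and `Compute` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of textbook LCS, not code from Open-Xml-PowerTools.
public static class StockLcs
{
    public static List<T> Compute<T>(IReadOnlyList<T> a, IReadOnlyList<T> b)
    {
        var cmp = EqualityComparer<T>.Default;

        // len[i, j] holds the LCS length of the suffixes a[i..] and b[j..].
        var len = new int[a.Count + 1, b.Count + 1];
        for (int i = a.Count - 1; i >= 0; i--)
            for (int j = b.Count - 1; j >= 0; j--)
                len[i, j] = cmp.Equals(a[i], b[j])
                    ? len[i + 1, j + 1] + 1
                    : Math.Max(len[i + 1, j], len[i, j + 1]);

        // Walk the table from the start to recover one common subsequence.
        var result = new List<T>();
        int x = 0, y = 0;
        while (x < a.Count && y < b.Count)
        {
            if (cmp.Equals(a[x], b[y]))
            {
                result.Add(a[x]);
                x++;
                y++;
            }
            else if (len[x + 1, y] >= len[x, y + 1])
            {
                x++; // skipping a[x] loses nothing
            }
            else
            {
                y++; // skipping b[y] loses nothing
            }
        }
        return result;
    }
}
```

The sketch treats both inputs as flat lists, so it has no notion of a table cell or text box that itself contains paragraphs; that nesting is the case described above as problematic.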

I made a mistake when devising that algorithm, such that it wasn't possible to 'reconstitute' the original document from the internal flat referential model if the document contained nested block-level content controls. (It was possible sometimes, but there are edge cases where it doesn't work.) I ran out of time on that project, and further, it was not necessary or required for that project, so the project was stopped at that point. I need to re-work the algorithm one more time (this will be #4).

My schedule will firm up soon. I am hopeful that this will be next on my schedule, but there are no guarantees. :-)

Best, Eric

iivel commented 7 years ago

@EricWhiteDev

Thanks Eric. This is an awesome project and I certainly appreciate your efforts on it. FWIW, my approach is based on trying to replicate the output of the interop-based application I need to replace.

http://levii.com/47/merging-more-than-2-ms-word-documents-what-a-pain

To handle that, after the WmlComparer does its thing I'm really only using the GetRevisions method. That collects a near-exact match of my expected output once I force the method in the commit above to return instead of throwing an exception ... so I'm pretty confident that your algorithm is getting really, really close, and I can be patient. For my needs, I am ridiculously close to making this workable but am stuck at one small part, if you have a moment to provide any suggestions or thoughts.

I've attached the modified example class for the comparer and copious notes of where I'm stuck (the right way to handle the foreach at line 81).

Thanks again either way, this project is a wonderful resource!

WmlComparer01.txt

v/r //Levii

tomjebo commented 5 years ago

Closing all issues as this repo is being archived and will no longer be maintained by Microsoft. The project is licensed for continued use and development by forking to your own repo.