dotnet / Open-XML-SDK

Open XML SDK by Microsoft
https://www.nuget.org/packages/DocumentFormat.OpenXml/
MIT License

Bug with Open XML SDK #1338

Open kkurhekar10 opened 1 year ago

kkurhekar10 commented 1 year ago

We are doing some string manipulation in our code using Open XML and are facing an issue. Text in input file: “Figure 4. Word and other agencies like IRCC look this for strategy and technical solutions.” Text in output file after manipulation: “Figure 4. Word and other agencies l<> IRCC look this for strategy and technical solutions.” Here “IRCC” should be replaced with “<>”, but “ike” is replaced instead.

“Figure 4” is a link to a figure here. When read with Open XML, it is read as REF _Ref12123123.

ThomasBarnekow commented 1 year ago

@kkurhekar10, please provide the relevant parts of your code, e.g., in the form of a unit test, and the Open XML markup (e.g., a Word document) so that we can reproduce the behavior.

kkurhekar10 commented 1 year ago

Attached are the sample source file (File.docx), the output file (File_Output.docx), and the code snippet.

    static void Main(string[] args)
    {
        string filepath = "C:/Sample/";
        string Filename = "File.docx";
        string SrcFilename = filepath + Filename;
        string DstFilename = @"C:\Sample\" + "File" + "_" + DateTime.Now.ToString("yyyyMMdd_HH_mm_ss") + ".docx";
        File.Copy(SrcFilename, DstFilename, true);
        if (System.IO.File.Exists(DstFilename))
        {
            using (WordprocessingDocument wDoc = WordprocessingDocument.Open(DstFilename, true))
            {
                if (wDoc != null)
                {
                    XDocument xDoc = wDoc.MainDocumentPart.GetXDocument();
                    if (xDoc != null)
                    {
                        string User = "user1";
                        // Only the first paragraph of the document is processed.
                        IEnumerable<XElement> content = xDoc.Descendants(W.p).Take(1);
                        string inputText = "IRCC";
                        string replacedText = "<>";
                        Regex regex = new Regex(inputText);
                        //content.Remove()
                        // Replace with tracked revisions enabled, attributed to User.
                        int count = OpenXmlRegex.Replace(content, regex, replacedText, null, true, User);
                        content = xDoc.Descendants(W.p).Take(1);
                        wDoc.MainDocumentPart.PutXDocument();
                        wDoc.Close();
                        wDoc.Dispose();
                    }
                }
            }
            byte[] byteArray = File.ReadAllBytes(DstFilename);
        }
    }

ThomasBarnekow commented 1 year ago

Couple thoughts and questions:

ThomasBarnekow commented 1 year ago

@kkurhekar10, could you please test your code with a different replacement text that does not contain any characters that are reserved in XML? For example, use "TEST".
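
To make the suggestion above concrete, here is a minimal sketch of what "reserved in XML" means for the two replacement strings. It does not claim that reserved characters cause the reported behavior; it only shows that "<>" contains such characters while "TEST" does not, and that LINQ to XML escapes them when a text element is serialized. The class and method names (ReservedCharacterCheck, ContainsXmlReservedCharacter, SerializeAsRunText) are hypothetical helpers written for this illustration and are not part of the Open XML SDK or Open-Xml-PowerTools.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper for illustration only; not part of any library.
public static class ReservedCharacterCheck
{
    // The five characters for which XML defines predefined entities.
    private static readonly char[] Reserved = { '<', '>', '&', '"', '\'' };

    // Returns true if the text contains at least one XML-reserved character.
    public static bool ContainsXmlReservedCharacter(string text) =>
        text.IndexOfAny(Reserved) >= 0;

    // Serializes the text as the content of a WordprocessingML w:t element.
    // LINQ to XML escapes reserved characters in the text on output.
    public static string SerializeAsRunText(string text)
    {
        XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
        return new XElement(w + "t", text).ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        foreach (string replacement in new[] { "<>", "TEST" })
        {
            string xml = SerializeAsRunText(replacement);

            // Parsing the serialized element back recovers the original text,
            // so the escaping is lossless at the LINQ to XML level.
            string roundTripped = XElement.Parse(xml).Value;

            Console.WriteLine(
                $"{replacement}: reserved={ContainsXmlReservedCharacter(replacement)}, " +
                $"serialized={xml}, roundTrip={roundTripped}");
        }
    }
}
```

If the replacement is still misplaced with "TEST", the reserved characters can be ruled out and attention can move to how the paragraph text around the REF field is matched.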