Rohland / htmldiff.net

Html Diff algorithm for .NET
MIT License

Break the html #36

Open jigskpatel opened 5 years ago

jigskpatel commented 5 years ago

Hi,

First of all, thank you so much for this great tool. It works well in most HTML scenarios, but sometimes it breaks the HTML.

We are using this tool to compare two HTML strings, but sometimes the inserted diff markup makes the resulting HTML render abnormally.

I have attached three HTML files in a zip archive. Compare1.html and Compare2.html are the two inputs being compared, and Result.html is the output I got after comparing the two HTML strings.

files.zip


We are using the following C# code:

var doc = new HtmlDocument(); // create object from HtmlAgilityPack
doc.Load(@"Compare1.html");
string Html1 = doc.ParsedText.ToString();
doc.Load(@"Compare2.html");
string Html2 = doc.ParsedText.ToString();
HtmlDiff.HtmlDiff HtmlDiff = new HtmlDiff.HtmlDiff(Html1, Html2);
string strHtmlDiff = HtmlDiff.Build(); // get broken html string

Please suggest how to fix this.

Thanks, Jignesh

bobcarboni commented 4 years ago

Hello, we are also seeing invalid HTML generated from the diff.

Old Text: <p>Some Text</p>
New Text: <ul><li>Some Text with added stuff</li></ul>
Diff Text: <p><ul><li>Some Text</p><ins class='diffmod'>&nbsp;with added stuff</ins></li></ul>

The closing </p> tag ends up inside the <li> element, before the </li> that should close it. Browsers like Chrome will fix up the invalid HTML, which introduces line breaks and other bad formatting.

This tool is otherwise very nice and easy to use, thanks.
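
For anyone who wants to catch this kind of output before rendering it, here is a minimal sketch of a strict tag-balance check in C#. It is not part of htmldiff.net; the `TagBalanceCheck` class and `FindFirstMismatch` method are hypothetical names made up for illustration. It assumes every non-void element is closed explicitly, so it will also flag valid HTML that relies on optional end tags, and it uses a regular expression rather than a real parser, so comments and script content are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of htmldiff.net.
public static class TagBalanceCheck
{
    // Elements that never take a closing tag.
    private static readonly HashSet<string> VoidTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "area", "base", "br", "col", "embed", "hr", "img",
            "input", "link", "meta", "source", "track", "wbr"
        };

    // Group 1 is "/" for a closing tag; group 2 is the tag name.
    private static readonly Regex TagPattern =
        new Regex(@"<(/?)([A-Za-z][A-Za-z0-9]*)\b[^>]*>");

    // Returns null when all tags close in order,
    // otherwise a description of the first mismatch found.
    public static string FindFirstMismatch(string html)
    {
        var open = new Stack<string>();
        foreach (Match m in TagPattern.Matches(html))
        {
            string name = m.Groups[2].Value.ToLowerInvariant();
            bool isClosing = m.Groups[1].Value == "/";
            bool selfClosed = m.Value.EndsWith("/>");

            // Void and self-closed elements do not affect nesting.
            if (VoidTags.Contains(name) || selfClosed) continue;

            if (!isClosing)
            {
                open.Push(name);
                continue;
            }

            if (open.Count == 0)
                return $"unexpected </{name}> at index {m.Index}";

            string expected = open.Pop();
            if (expected != name)
                return $"found </{name}> at index {m.Index} while <{expected}> is still open";
        }

        return open.Count == 0 ? null : $"<{open.Peek()}> is never closed";
    }

    public static void Main()
    {
        // The diff output quoted above.
        string diff = "<p><ul><li>Some Text</p><ins class='diffmod'>&nbsp;with added stuff</ins></li></ul>";
        Console.WriteLine(FindFirstMismatch(diff) ?? "balanced");
    }
}
```

Run against the Diff Text above, the check stops at the `</p>` that appears while `<li>` is still open, which is the same nesting problem the browser tries to repair. A check like this only detects the problem; it does not fix the diff output.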