caxy / php-htmldiff

A library for comparing two HTML files/snippets and highlighting the differences using simple HTML. Includes support for comparing complex lists and tables.
http://php-htmldiff.caxy.com
GNU General Public License v2.0

Lists being marked as changed when the output is 100% the same. #100

Status: Open. Ambient-Impact opened this issue 3 years ago.

Ambient-Impact commented 3 years ago

I've got the following two blocks of HTML being marked as changed, even though I've diffed them with WinMerge and it tells me they're 100% identical.

Block 1:

<h3>References<a id="References" href="#References" name="References" class="heading-permalink ambientimpact-link-has-image" aria-hidden="true" title="Permalink"><span class="ambientimpact-icon ambientimpact-icon--name-link ambientimpact-icon--bundle-libricons ambientimpact-icon--text-hidden ambientimpact-icon--icon-standalone ambientimpact-icon--is-bundle-loaded ambientimpact-icon--icon-standalone-loaded"><svg class="ambientimpact-icon__icon" viewBox="0 0 24 24" width="24" height="24" aria-hidden="true"><use xlink:href="/modules/ambientimpact/ambientimpact_icon/icons/libricons.svg?qq6uy7#icon-link"></use></svg><span class="ambientimpact-icon__text"><span class="ambientimpact-link-has-image__text">Permalink</span></span></span></a></h3>
<div class="references" role="doc-endnotes"><ol><li class="references__list-item" id="reference-conv" role="doc-endnote"><p>Kierney, L. (May 2029). “Bigger Fish To Fry: An Interview With William Lassgard.” <em>forbes.com</em>.&nbsp;<a class="references__backreference-link" rev="footnote" href="#backreference-conv" role="doc-backlink">↩</a></p></li>
<li class="references__list-item" id="reference-dott" role="doc-endnote"><p>Bridges, C. (August 2012). “Translation of domestication of Thunnus thynnus into an innovative commercial application.” <em>transdott.eu</em>.&nbsp;<a class="references__backreference-link" rev="footnote" href="#backreference-dott" role="doc-backlink">↩</a></p></li>
<li class="references__list-item" id="reference-12" role="doc-endnote"><p>Åkesson, N. (October 2039). “Leaked correspondence between Xu Shaoyong and William Lassgard paints dramatic picture.” <em>Dagens Nyheter</em>.&nbsp;<a class="references__backreference-link" rev="footnote" href="#backreference-12" role="doc-backlink">↩</a></p></li></ol></div></div>

Block 2:

<h3>References<a id="References" href="#References" name="References" class="heading-permalink ambientimpact-link-has-image" aria-hidden="true" title="Permalink"><span class="ambientimpact-icon ambientimpact-icon--name-link ambientimpact-icon--bundle-libricons ambientimpact-icon--text-hidden ambientimpact-icon--icon-standalone ambientimpact-icon--is-bundle-loaded ambientimpact-icon--icon-standalone-loaded"><svg class="ambientimpact-icon__icon" viewBox="0 0 24 24" width="24" height="24" aria-hidden="true"><use xlink:href="/modules/ambientimpact/ambientimpact_icon/icons/libricons.svg?qq6uy7#icon-link"></use></svg><span class="ambientimpact-icon__text"><span class="ambientimpact-link-has-image__text">Permalink</span></span></span></a></h3>
<div class="references" role="doc-endnotes"><ol><li class="references__list-item" id="reference-conv" role="doc-endnote"><p>Kierney, L. (May 2029). “Bigger Fish To Fry: An Interview With William Lassgard.” <em>forbes.com</em>.&nbsp;<a class="references__backreference-link" rev="footnote" href="#backreference-conv" role="doc-backlink">↩</a></p></li>
<li class="references__list-item" id="reference-dott" role="doc-endnote"><p>Bridges, C. (August 2012). “Translation of domestication of Thunnus thynnus into an innovative commercial application.” <em>transdott.eu</em>.&nbsp;<a class="references__backreference-link" rev="footnote" href="#backreference-dott" role="doc-backlink">↩</a></p></li>
<li class="references__list-item" id="reference-12" role="doc-endnote"><p>Åkesson, N. (October 2039). “Leaked correspondence between Xu Shaoyong and William Lassgard paints dramatic picture.” <em>Dagens Nyheter</em>.&nbsp;<a class="references__backreference-link" rev="footnote" href="#backreference-12" role="doc-backlink">↩</a></p></li></ol></div></div>

Could it be due to the emoji (the ↩ backlink character) or the <p> elements? I'm not sure whether <p> elements are valid inside <li>, so I'll likely try removing them, but they're being generated automatically by CommonMark or a Drupal filter.
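WinMerge's verdict can also be double-checked without a GUI. The minimal Python sketch below is not part of php-htmldiff, and the helper names `first_difference` and `non_ascii_chars` are made up for illustration. It reports the index of the first character at which two snippets differ and lists the distinct non-ASCII characters in a snippet, which shows exactly what the "emoji" is. It only confirms that the inputs match character for character; it says nothing about how HtmlDiff tokenizes them.

```python
import unicodedata
from typing import List, Optional, Tuple


def first_difference(a: str, b: str) -> Optional[int]:
    """Return the index of the first differing character, or None if a == b."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i
    # No mismatch in the overlap: the strings differ only if one is longer.
    return None if len(a) == len(b) else min(len(a), len(b))


def non_ascii_chars(s: str) -> List[Tuple[str, str]]:
    """List each distinct non-ASCII character as (code point, Unicode name)."""
    chars = sorted({ch for ch in s if ord(ch) > 127})  # sorted by code point
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in chars]


old = '<p>Åkesson, N. (October 2039). “Leaked correspondence.”&nbsp;<a href="#backreference-12">↩</a></p>'
new = old  # paste the second block here

print(first_difference(old, new))  # None means the inputs are identical
for code_point, name in non_ascii_chars(old):
    print(code_point, name)
```

For this sample line, the ↩ shows up as U+21A9 LEFTWARDS ARROW WITH HOOK, alongside the curly quotes and the Å.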

Ambient-Impact commented 3 years ago

I think I've figured this out! It turns out that it was indeed the <p> elements causing this; removing them seems to have restored the expected behaviour. I also tried adding 'p' to Caxy\HtmlDiff\ListDiffLines::listContentTags, which also fixed it, so perhaps this element could be added to that array? According to the MDN entry for <li>, list items accept "flow content", which includes <p> and a few other elements.

@SavageTiger Thoughts?
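For anyone who can't patch the library, the first workaround (dropping the <p> wrappers before the HTML is diffed) can be sketched in Python. This is an illustration under stated assumptions, not part of php-htmldiff: `unwrap_single_paragraph_items` is a hypothetical helper, and the regex only targets the shape seen in the blocks above, where each <li> holds exactly one attribute-less <p>. List items with several paragraphs are deliberately left alone, since unwrapping them would merge the paragraphs.

```python
import re

# An <li ...> whose only content (apart from whitespace) is one attribute-less <p>...</p>.
# The tempered token (?:(?!</?p\b).)* never crosses another <p> or </p>,
# so list items holding several paragraphs do not match.
_LI_SINGLE_P = re.compile(
    r"(<li\b[^>]*>)\s*<p>((?:(?!</?p\b).)*)</p>\s*(</li>)",
    re.DOTALL,
)


def unwrap_single_paragraph_items(html: str) -> str:
    """Drop the <p> wrapper from list items that contain exactly one paragraph."""
    return _LI_SINGLE_P.sub(r"\1\2\3", html)


sample = (
    '<ol><li class="references__list-item" id="reference-12" role="doc-endnote">'
    '<p>Åkesson, N. (October 2039).&nbsp;<a href="#backreference-12">↩</a></p></li></ol>'
)
print(unwrap_single_paragraph_items(sample))
```

The trade-off is that this changes the markup being compared (and rendered), so adding 'p' to listContentTags is the less invasive fix if it holds up.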

jschroed91 commented 11 months ago

@Ambient-Impact This was a long time ago, I know, but I wonder whether the <p> tags were getting removed by the HTML sanitizer / purifier that runs... As you mentioned, in theory <p> tags should be fine within <li> tags. I feel like adding 'p' to listContentTags would make sense; I'm curious why it wasn't there originally 🤔

Ambient-Impact commented 11 months ago

This does feel like a lifetime ago. 😂

I don't know enough about what's handled by this library and what's handled by the purifier, but my guess is that it was overlooked because putting a <p> inside an <li> intentionally isn't common; I don't think I knew it was valid until I looked it up.