caxy / php-htmldiff

A library for comparing two HTML files/snippets and highlighting the differences using simple HTML. Includes support for comparing complex lists and tables
http://php-htmldiff.caxy.com
GNU General Public License v2.0

Warning: DOMDocument::loadHTML(): Unexpected end tag : u in Entity #118

Open tkoop opened 1 year ago

tkoop commented 1 year ago

We sometimes get this "Unexpected end tag" warning, and the file below reproduces it. The PHP file is very sensitive to whitespace, so make sure every single space is copied exactly. The warning seems to go away when we use the "keep new lines" config option and remove all the spaces.

<html>

<p>This code fails.  To get it working, remove one space before the "ol" tag on line 31, which is just under "...Something here..." in $newHtml</p>

<?php

$oldHtml = '<ol>
        <li><u>Publication:</u>
          <ol>
            <li>This sentence.</li>
          </ol>
        </li>
        <li><u>Something here</u>:
          <ol>
            <li>Another item</li>
          </ol>
        </li>
      </ol>
      <ol>
        <li><u>Mars</u>:</li>
        <li>Saturn</li>
      </ol>';

      $newHtml = '<ol>
    <li><u>Publication:</u>
     <ol>
     <li>This sentence.</li>
     </ol>
     </li>
     <li><u>Something here</u>:
      <ol>
      <li>Another item</li>
      </ol>
      </li>
      <li><u>Mars</u>:
      <ol>
      <li>Saturn</li>
      </ol>
      </li>
    </ol>';

error_reporting(E_ALL);
ini_set('display_errors', '1');

require __DIR__ . '/../vendor/autoload.php';

use Caxy\HtmlDiff\HtmlDiff;
use Caxy\HtmlDiff\HtmlDiffConfig;

$config = new HtmlDiffConfig();
$config->setKeepNewLines(true);

$htmlDiff = HtmlDiff::create($oldHtml, $newHtml, $config);
$content = $htmlDiff->build();

echo "Diff is " . $content;

?>

</html>
MykhailoSukovitsyn commented 11 months ago

We are having the same problem: DOMDocument::loadHTML(): Tag mark invalid in Entity. I found that this happens because the Caxy\HtmlDiff\ListDiffLines::listByLines() method uses DOMDocument::loadHTML(), and as far as I know libxml 2.6+ does not handle HTML5 tags correctly. This is a very widespread issue.

I think the XML errors could be suppressed there by calling libxml_use_internal_errors(true); before loadHTML() and libxml_use_internal_errors(false); after it is done.
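
The suppression idea above could be sketched roughly as follows. This is only an illustration, not the library's actual code: `loadHtmlQuietly()` is a hypothetical helper name, and the sample markup is made up. It relies only on the standard PHP functions `libxml_use_internal_errors()`, `libxml_get_errors()` and `libxml_clear_errors()`.

```php
<?php

/**
 * Hypothetical helper (not part of caxy/php-htmldiff): parse an HTML
 * fragment while collecting libxml parse errors internally instead of
 * letting them surface as PHP warnings.
 *
 * @return array{0: DOMDocument, 1: array} The document and the collected errors.
 */
function loadHtmlQuietly(string $html): array
{
    // Switch to internal error handling and remember the previous setting.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Keep the errors for inspection, then empty libxml's error buffer.
    $errors = libxml_get_errors();
    libxml_clear_errors();

    // Restore whatever error handling mode the caller had before.
    libxml_use_internal_errors($previous);

    return [$dom, $errors];
}

// Example input (an assumption for illustration): an HTML5 <mark> tag.
[$dom, $errors] = loadHtmlQuietly('<p>Some <mark>highlighted</mark> text</p>');

// Any parse complaints are now plain data rather than warnings.
foreach ($errors as $error) {
    echo trim($error->message), PHP_EOL;
}
```

Restoring the previous setting, rather than hard-coding `false`, avoids changing the error handling of an application that had already enabled internal errors itself. Whether a given tag produces an error at all depends on the installed libxml version.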

I will try to investigate this issue further and write a PR, but it seems that no one is working on this repo, so there is almost no chance that my changes will be accepted.

jschroed91 commented 11 months ago

@MykhailoSukovitsyn If you open a PR for this, we will review and merge it.