caxy / php-htmldiff

A library for comparing two HTML files or snippets and highlighting the differences using simple HTML. Includes support for comparing complex lists and tables.
http://php-htmldiff.caxy.com
GNU General Public License v2.0

Spaces near quotes etc. #109

Closed: rick1906 closed this issue 11 months ago

rick1906 commented 2 years ago

It looks like added spaces are missing from the diff output.

$htmlOld = 'He said:"OK!"';
$htmlNew = 'He said: "OK!"';
$htmlDiff = new \Caxy\HtmlDiff\HtmlDiff($htmlOld, $htmlNew);
echo $htmlDiff->build();

This prints He said:"OK!", while I expected the inserted space after the colon to be highlighted. I'm using v0.1.14.

SavageTiger commented 2 years ago

I can confirm this. It seems to be a regression between versions 0.1.10 and 0.1.11.

berarma commented 2 years ago

Still happening.

I came across a similar issue with spaces.

Comparing these two strings:

<strong>this </strong>is<strong> a string.</strong>
<strong>this</strong>is<strong>a string.</strong>

shows no differences.
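As a quick sanity check that does not involve the library, the two snippets above can be reduced to plain text with PHP's built-in strip_tags(). This is only a sketch to show that the strings differ in visible text, so a diff would be expected to report something; the variable names are made up for illustration.

```php
<?php
// The two snippets from the comment above.
$htmlOld = '<strong>this </strong>is<strong> a string.</strong>';
$htmlNew = '<strong>this</strong>is<strong>a string.</strong>';

// strip_tags() removes the markup and keeps the text, including spaces.
$textOld = strip_tags($htmlOld);
$textNew = strip_tags($htmlNew);

// The visible text differs only in spaces, which is the change that goes unreported.
var_dump($textOld === $textNew);
```

Here $textOld is "this is a string." and $textNew is "thisisa string.", so the comparison is false.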

berarma commented 2 years ago

I've investigated the issue, and it seems the code intentionally ignores inserted and removed spaces. That makes sense for spaces between block elements, or in places where text is not expected, such as between </li> and </ul>. But spaces shouldn't be ignored when they're inside block or inline elements that accept text as content.

Ignoring spaces based on the HTML context doesn't seem easy. I guess older versions chose not to ignore any spaces, so they would show changes where they shouldn't.

Maybe the easiest and cleanest approach would be to run an HTML parser that removes invisible spaces, like the ones used for indentation, before running HtmlDiff. This could be made a requirement for using HtmlDiff. It would simplify things because HtmlDiff would then not need to ignore any spaces.
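A rough sketch of that pre-processing idea follows. It is not part of caxy/php-htmldiff: the function name normalizeInvisibleWhitespace() and the list of block-level tags are assumptions made for illustration. It also uses regular expressions instead of a real HTML parser, so it does not handle <pre> content, attribute values, comments, or scripts correctly.

```php
<?php
// Hypothetical helper, not part of caxy/php-htmldiff.
// Collapses whitespace runs to a single space and removes spaces that touch
// a block-level tag, so the spaces that remain are ones a browser would render.
function normalizeInvisibleWhitespace(string $html): string
{
    // Assumed, incomplete list of block-level tags; extend as needed.
    $block = 'address|blockquote|div|dl|dt|dd|h[1-6]|hr|li|ol|p|table|tbody|thead|tfoot|tr|td|th|ul';
    $blockTag = '<\/?(?:' . $block . ')\b[^>]*>';

    // Collapse every run of whitespace (including newlines) into one space.
    $html = preg_replace('/\s+/', ' ', $html);
    // Drop a space that directly follows a block-level tag.
    $html = preg_replace('/(' . $blockTag . ') /i', '$1', $html);
    // Drop a space that directly precedes a block-level tag.
    $html = preg_replace('/ (' . $blockTag . ')/i', '$1', $html);

    return trim($html);
}

// Indentation between list tags is removed; the space after the colon is kept.
$indented = "<ul>\n  <li>He said: \"OK!\"</li>\n</ul>";
echo normalizeInvisibleWhitespace($indented), "\n";
```

Both inputs would be passed through the helper before being handed to HtmlDiff. Spaces inside inline elements such as <strong> are left alone, because those are the ones a reader can see.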

AliSheikhDev commented 1 year ago

I continue to experience the same problem even after encoding spaces to UTF-8. I attempted various strategies, including replacing spaces with a custom tag, but the comparison still does not recognize the custom tag as a difference.

jschroed91 commented 11 months ago

A pull request with a potential solution for this has been merged, although it's not all-encompassing. See the notes on PR #111 for details on the new config option and its caveats.