ezyang / htmlpurifier

Standards compliant HTML filter written in PHP
http://htmlpurifier.org
GNU Lesser General Public License v2.1

The behavior for blank node parsing changed in later versions of PHP. #403

Closed by charlie-curtis 2 months ago

charlie-curtis commented 2 months ago

Description

After upgrading from PHP 8.1.22 to PHP 8.3.4, the purified output differs between versions, specifically in how blank (whitespace-only) nodes are handled.

Example

Input

<table>
    <caption>
        Cool table
    </caption>
    <tfoot>
    <tr>
        <th>I can do so much!</th>
    </tr>
    </tfoot>
    <tr>
        <td style="font-size:16pt;
      color:#F00;font-family:sans-serif;
      text-align:center;">Wow</td>
    </tr>
</table>

PHP 8.1.22 output

<table><caption>
        Cool table
    </caption>
    <tfoot><tr><th>I can do so much!</th>
    </tr></tfoot><tr><td style="font-size:16pt;color:#F00;font-family:sans-serif;text-align:center;">Wow</td>
    </tr></table>

PHP 8.3.4 output

<table>
    <caption>
        Cool table
    </caption>
    <tfoot>
    <tr>
        <th>I can do so much!</th>
    </tr>
    </tfoot>
    <tr>
        <td style="font-size:16pt;color:#F00;font-family:sans-serif;text-align:center;">Wow</td>
    </tr>
</table>

Impact

A strong case can be made that the PHP 8.3.4 output is "more correct", and I wouldn't argue. The issue is that a lot of existing code and applications may be relying on the old behavior in order to "work". An optional backwards-compatible setting would ease the transition as many users upgrade beyond PHP 8.1.

Investigation

These steps have been performed:

I think this php-src commit changed the default behavior of "blank" parsing from "don't keep" to "keep".

Suggested Fix

Much like LIBXML_PARSEHUGE is an optional configuration value that can be supplied here, I propose adding LIBXML_NOBLANKS as an optional value. This would preserve the old behavior for those who need backwards compatibility, as described above, without impacting existing use cases.
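
To make the proposal concrete, here is a minimal sketch of how an opt-in flag could be folded into the libxml options passed to DOMDocument::loadHTML. The helper names buildLibxmlOptions and countBlankTextNodes and the two boolean switches are hypothetical, chosen only for illustration. They are not HTML Purifier's actual API or configuration keys. The script prints how many whitespace-only text nodes survive parsing with and without LIBXML_NOBLANKS; the counts may vary with the PHP and libxml2 versions in use, so none are shown here.

```php
<?php
// Hypothetical helper: builds the libxml option bitmask from two opt-in switches.
// Neither switch name comes from HTML Purifier; they are assumptions for this sketch.
function buildLibxmlOptions(bool $parseHuge, bool $noBlanks): int
{
    $options = 0;
    if ($parseHuge && defined('LIBXML_PARSEHUGE')) {
        $options |= LIBXML_PARSEHUGE;
    }
    if ($noBlanks && defined('LIBXML_NOBLANKS')) {
        $options |= LIBXML_NOBLANKS;
    }
    return $options;
}

// Hypothetical helper: parses an HTML fragment with the given libxml options
// and counts the text nodes that contain only whitespace.
function countBlankTextNodes(string $html, int $options): int
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html, $options);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $count = 0;
    foreach ($xpath->query('//text()') as $node) {
        if (trim((string) $node->nodeValue) === '') {
            $count++;
        }
    }
    return $count;
}

$html = "<table>\n    <tr>\n        <td>Wow</td>\n    </tr>\n</table>";

printf("default:  %d blank text nodes\n", countBlankTextNodes($html, 0));
printf("NOBLANKS: %d blank text nodes\n",
    countBlankTextNodes($html, buildLibxmlOptions(false, true)));
```

Keeping the flag off by default means that anyone who prefers the newer whitespace-preserving output sees no change, while those depending on the old output can opt in.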

Similar issues

https://github.com/ezyang/htmlpurifier/issues/237
https://github.com/ezyang/htmlpurifier/issues/269

ezyang commented 2 months ago

Sgtm send the pr

charlie-curtis commented 2 months ago

@ezyang thanks, the PR is here: https://github.com/ezyang/htmlpurifier/pull/404