ezyang / htmlpurifier

Standards compliant HTML filter written in PHP
http://htmlpurifier.org
GNU Lesser General Public License v2.1

Wrong output when Core.CollectErrors is disabled #366

Closed asubaruwrxsti closed 1 year ago

asubaruwrxsti commented 1 year ago

With the input 'xf2n*7&#&Qm&JU2', the encoding is incorrect when 'CollectErrors' is disabled: the '#' inside '&#&' is lost.

This occurs in versions 4.13 and 4.15. The workaround is to turn on CollectErrors. This is the unit test:

    /**
     * Method vtlib_purifyProvider
     * params
     */
    public function vtlib_purifyProvider() {
        return array(
            array('xf2n*7&#&Qm&JU2',false,'xf2n*7&#&Qm&JU2','special &#&'),
        );
    }

    /**
     * Method testvtlib_purify
     * @test
     * @dataProvider vtlib_purifyProvider
     */
    public function testvtlib_purify($input, $ignore, $expected, $message) {
        $actual = vtlib_purify($input, $ignore);
        $this->assertEquals($expected, $actual, "testvtlib_purify $message");
    }

This is the output with CollectErrors on:

PHPUnit 9.5.8 by Sebastian Bergmann and contributors.

.                                                                   1 / 1 (100%)

Time: 00:00.310, Memory: 28.00 MB

OK (1 test, 1 assertion)

This is the output with CollectErrors off:

1) VtlibUtilsTest::testvtlib_purify with data set #0 ('xf2n*7&#&Qm&JU2', false, 'xf2n*7&#&Qm&JU2', 'special &#&')
testvtlib_purify special &#&
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'xf2n*7&#&Qm&JU2'
+'xf2n*7&Qm&JU2'

FAILURES!
Tests: 1, Assertions: 1, Failures: 1.

bytestream commented 1 year ago

CollectErrors changes the lexer from DOMLex (DOMDocument) to DirectLex, which is why there's a difference. That said, xf2n*7&#&Qm&JU2 isn't valid HTML: &#& is an incomplete entity.
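
To illustrate the "incomplete entity" point, here is a minimal sketch of how the bare ampersands in that input could be escaped so the string becomes well-formed before it reaches any lexer. The helper name `escape_bare_ampersands` is hypothetical and is not part of HTML Purifier or vtlib. The pattern only checks the syntactic shape of a reference (`&#123;`, `&#x1F;`, `&name;`), not whether a named entity actually exists. Whether this fits `vtlib_purify` depends on whether its callers expect entity-encoded output, which the thread does not say.

```php
<?php
// Hypothetical helper (an assumption for illustration, not a library API).
// Escapes every "&" that does not begin a syntactically complete
// character reference such as &#123; &#x1F; or &name;
function escape_bare_ampersands(string $text): string
{
    $pattern = '/&(?!(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);)/';
    return preg_replace($pattern, '&amp;', $text);
}

// The input from the report: none of its three ampersands starts a
// complete reference, so all three are escaped.
echo escape_bare_ampersands('xf2n*7&#&Qm&JU2'), PHP_EOL;
// prints: xf2n*7&amp;#&amp;Qm&amp;JU2

// Already well-formed references are left untouched.
echo escape_bare_ampersands('a &amp; b &#38; c'), PHP_EOL;
// prints: a &amp; b &#38; c
```

Separately, HTML Purifier has a `Core.LexerImpl` directive; setting it to `'DirectLex'` should select the same lexer that `Core.CollectErrors` triggers, without enabling error collection.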