ezyang / htmlpurifier

Standards compliant HTML filter written in PHP
http://htmlpurifier.org
GNU Lesser General Public License v2.1

False positive "x<y" #402

Open attrib opened 5 months ago

attrib commented 5 months ago

We noticed an issue with the HTML Purifier used in SuiteCRM.

Cleaning the value x<y results in just x, while x < y correctly results in x < y.

While experimenting, it seems that whenever there is no space after the <, everything after it is stripped.

I can reproduce this with the demo app, which confirms it's not an issue with SuiteCRM itself: http://htmlpurifier.org/demo.php?filter%5BAutoFormat.AutoParagraph%5D=0&filter%5BAutoFormat.DisplayLinkURI%5D=0&filter%5BAutoFormat.Linkify%5D=0&filter%5BAutoFormat.RemoveEmpty.Predicate%5D=colgroup%3A%0D%0Ath%3A%0D%0Atd%3A%0D%0Aiframe%3Asrc%0D%0A&filter%5BAutoFormat.RemoveEmpty%5D=0&filter%5BAutoFormat.RemoveSpansWithoutAttributes%5D=0&filter%5BNull_CSS.AllowedProperties%5D=1&filter%5BCore.CollectErrors%5D=0&filter%5BHTML.Allowed%5D=z&filter%5BHTML.Doctype%5D=XHTML+1.0+Transitional&filter%5BHTML.SafeObject%5D=0&filter%5BHTML.TidyLevel%5D=light&filter%5BURI.DisableExternalResources%5D=0&filter%5BNull_URI.Munge%5D=1&html=aaa+x%3Cz+sdgdfg&submit=Submit

Input

test x<y test

Output

test x

Expected

test x<y test

Options

I'm not sure which filter causes this, so here is the full config:

        $config = \HTMLPurifier_Config::createDefault();

        $baseConfigs = [];
        $baseConfigs['HTML.Doctype'] = 'XHTML 1.0 Transitional';
        $baseConfigs['Core.Encoding'] = 'UTF-8';
        $hidden_tags = array('script' => true, 'style' => true, 'title' => true, 'head' => true);
        $baseConfigs['Core.HiddenElements'] = $hidden_tags;
        $baseConfigs['URI.Base'] = $sugar_config['site_url'] ?? null;
        $baseConfigs['CSS.Proprietary'] = true;
        $baseConfigs['HTML.TidyLevel'] = 'none';
        $baseConfigs['HTML.ForbiddenElements'] = array('body' => true, 'html' => true);
        $baseConfigs['AutoFormat.RemoveEmpty'] = false;
        $baseConfigs['Cache.SerializerPermissions'] = 0775;
        $baseConfigs['Filter.ExtractStyleBlocks.TidyImpl'] = false;
        $baseConfigs['Output.FlashCompat'] = true;
        $baseConfigs['HTML.DefinitionID'] = 'Sugar HTML Def';
        $baseConfigs['HTML.DefinitionRev'] = 2;
        $baseConfigs['Attr.EnableID'] = true;
        $baseConfigs['Attr.IDPrefix'] = 'sugar_text_';

        foreach ($baseConfigs as $key => $value) {
            $config->set($key, $value);
        }

        $purifier = new \HTMLPurifier($config);

        echo $purifier->purify('test x<y test') . "\n";

bytestream commented 5 months ago

test x<y test is not valid HTML. Wrapping it with a doctype, html, body, etc. will ensure it's processed correctly. The Core.ConvertDocumentToFragment option may also work...
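
If the field is meant to hold plain text rather than markup, one further option (an assumption, not something confirmed in this thread) is to escape the value before it reaches any HTML parser. The sketch below uses only PHP's built-in htmlspecialchars(); the variable names are illustrative. Once the < is encoded as &lt;, the sequence <y can no longer be read as the start of a tag.

```php
<?php
// Hypothetical workaround sketch: treat the value as plain text, not HTML.
$input = 'test x<y test';

// htmlspecialchars() encodes "<" as "&lt;", so an HTML parser that
// sees the result has no "<y" to interpret as an opening tag.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $escaped . "\n"; // test x&lt;y test
```

This only fits fields that should never contain real markup; for values that legitimately mix HTML and text, escaping everything would also neutralise the intended tags.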