ezyang / htmlpurifier

Standards compliant HTML filter written in PHP
http://htmlpurifier.org
GNU Lesser General Public License v2.1
3.03k stars 323 forks

Excluded elements from <pre> tag #312

Open lbfeenix opened 2 years ago

lbfeenix commented 2 years ago

Hello, I came across an interesting "bug": the purifier deletes certain elements nested inside the 'pre' tag, together with their contents. I traced this to the file /library/HTMLPurifier/HTMLModule/Text.php, lines 60-69, in $pre->excludes. Does anyone know why? I'm mainly interested in the tags 'big', 'small' and 'font'. The HTML specification does not prohibit using these tags inside 'pre', or does it? These tags generally work inside 'pre' in current browsers, and there is no reason, security or otherwise, to remove them along with their content, which is probably the biggest problem.

bytestream commented 2 years ago

I don't think it's a bug, just an incomplete implementation of <pre>. See the comment from when the exclusions were added:

        // SGML permits exclusions for all descendants, but this is
        // not possible with DTDs or XML Schemas. W3C has elected to
        // use complicated compositions of content_models to simulate
        // exclusion for children, but we go the simpler, SGML-style
        // route of flat-out exclusions. Note that the Abstract Module
        // is blithely unaware of such distinctions.

lbfeenix commented 2 years ago

Thanks for the reply. I have now found the commit with this comment. The problem is not so much that these elements are removed, but that all of their content is removed too. I see that as a bug, because part of the text simply disappears.
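
To make the distinction concrete, here is a minimal, library-independent sketch of the two behaviours being contrasted: dropping an excluded element together with its content, versus removing only the tags and keeping the text. It is not HTML Purifier's code. The `PreFilter` class, the `filter_pre` helper and the three-tag `PRE_EXCLUDES` set are hypothetical names made up for this illustration, and attributes are ignored for brevity.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical exclusion set for illustration only; the thread explicitly
# names just 'big', 'small' and 'font'.
PRE_EXCLUDES = {"big", "small", "font"}


class PreFilter(HTMLParser):
    """Re-serializes HTML, handling excluded elements found inside <pre>."""

    def __init__(self, keep_text):
        super().__init__()
        self.keep_text = keep_text  # True: unwrap tags; False: drop content too
        self.out = []
        self.pre_depth = 0          # > 0 while inside a <pre>
        self.skip_depth = 0         # > 0 while inside a dropped element

    def _excluded(self, tag):
        return self.pre_depth > 0 and tag in PRE_EXCLUDES

    def handle_starttag(self, tag, attrs):
        if self._excluded(tag):
            if not self.keep_text:
                self.skip_depth += 1
            return
        if self.skip_depth:
            return
        if tag == "pre":
            self.pre_depth += 1
        self.out.append(f"<{tag}>")  # attributes omitted in this sketch

    def handle_endtag(self, tag):
        if self._excluded(tag):
            if not self.keep_text and self.skip_depth:
                self.skip_depth -= 1
            return
        if self.skip_depth:
            return
        if tag == "pre" and self.pre_depth:
            self.pre_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text inside a dropped element is discarded; everything else is kept.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def filter_pre(html_text, keep_text):
    parser = PreFilter(keep_text)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


sample = "<pre>a <big>b</big> c</pre>"
print(filter_pre(sample, keep_text=False))  # <pre>a  c</pre>  (the "b" is lost)
print(filter_pre(sample, keep_text=True))   # <pre>a b c</pre> (only tags removed)
```

The first call mirrors the behaviour reported in this issue, where part of the text disappears. The second shows the alternative being asked for, where the markup is stripped but the text survives.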