Closed xemlock closed 5 years ago
I'm extremely unlikely to fix bugs in PH5P. If you can find another HTML5-compliant HTML parser that we can swap in instead of PH5P, that would be the best way to go.
Hi @ezyang, yes, that's perfectly understandable, especially since PH5P is marked in the source as experimental. I just wanted to raise the fact that it cannot be used as a replacement for the other parsers (especially when dealing with HTML5 tags) and needs to be used with caution.
Anyway, I think this issue can be closed as a Won't Do, as other lexer implementations (DOMLex and DirectLex) are good enough.
Even if HTML5 sectioning elements (section, nav, article, aside, header, footer) are added to the HTML definition, they are silently removed from the output. Looks like it's because the code responsible for parsing them is in WIP state: https://github.com/ezyang/htmlpurifier/blob/master/library/HTMLPurifier/Lexer/PH5P.php#L2788
Minimum snippet to reproduce this:
The result is:
If you comment out the line that sets lexer (or use any other built-in lexer), the result is correct:
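For readers who want to try this, the setup described above can be sketched roughly as follows. This is not the snippet originally posted, only an illustration under assumptions: HTML Purifier is installed via Composer, and the definition ID, the input markup, and the variable names are made up for the example.

```php
<?php
// Assumption: HTML Purifier was installed with Composer.
require_once __DIR__ . '/vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();

// The line that sets the lexer; comment it out to fall back to the default lexer.
$config->set('Core.LexerImpl', 'PH5P');

// A custom definition needs an ID and revision; both values are illustrative.
$config->set('HTML.DefinitionID', 'html5-sectioning-demo');
$config->set('HTML.DefinitionRev', 1);
// Disable the definition cache so the raw definition is always editable.
$config->set('Cache.DefinitionImpl', null);

// Register the HTML5 sectioning elements as block elements with flow content.
if ($def = $config->maybeGetRawHTMLDefinition()) {
    $tags = array('section', 'nav', 'article', 'aside', 'header', 'footer');
    foreach ($tags as $tag) {
        $def->addElement($tag, 'Block', 'Flow', 'Common');
    }
}

$purifier = new HTMLPurifier($config);

// Illustrative input; compare the output with and without the PH5P line.
echo $purifier->purify('<section><p>Hello</p></section>');
```

Running it once with the `Core.LexerImpl` line and once without should make the difference between the lexers visible.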