When the MakeWellFormed strategy attempts to fix invalid markup, it mangles it even further. Please consider the following test script:
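<?php
// Assumes HTML Purifier's library/ directory is on the include path.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$purifier = new HTMLPurifier($config);
// A block-level <ul> nested inside an inline <i> is invalid markup.
echo $purifier->purify('<p><i><ul><li>text</li></ul></i></p>');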
One would expect the output:
<p><i></i></p><ul><li>text</li></ul>
or ideally:
<p><i></i></p><ul><li><i>text</i></li></ul>
Instead we get:
<p><i></i></p><i>text</i>
Tested against HTMLPurifier 4.12.0.
After some digging, I found that setting the $formatting property to false on the <i> element definition in the Presentation module helps a little: the <ul> structure is retained. The drawback is that the carry-over of the <i> element no longer works. This suggests that the tree-fixing algorithm in HTMLPurifier_Strategy_MakeWellFormed::execute() requires some tuning.
A better fix is probably to use the PH5P parser, which gives you more HTML5-compliant parsing. I don't intend to fix up MakeWellFormed to bring it closer to HTML5 behavior; it's an evolutionary dead end.
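For anyone wanting to try the PH5P suggestion, a minimal sketch follows. It assumes the lexer is selected through the Core.LexerImpl configuration directive, that the DOM extension is available, and that HTMLPurifier.auto.php can be found on the include path. The output is deliberately not shown here, since it has not been verified against 4.12.0.

```php
<?php
// Assumption: HTML Purifier's library/ directory is on the include path.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// Switch from the default lexer to the experimental PH5P (HTML5) lexer.
$config->set('Core.LexerImpl', 'PH5P');

$purifier = new HTMLPurifier($config);
// Same invalid input as in the original report.
echo $purifier->purify('<p><i><ul><li>text</li></ul></i></p>');
```

Since PH5P is described as experimental, it is probably worth comparing its output with the default lexer on a sample of real content before switching.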