ezyang / htmlpurifier

Standards compliant HTML filter written in PHP
http://htmlpurifier.org
GNU Lesser General Public License v2.1
3.07k stars 327 forks source link

Registering URI Filters fails: broken polimorphic logic calling pre/post methods. #183

Closed metaturso closed 5 years ago

metaturso commented 6 years ago

Affected version and environment

The problem: Trying to register a URI filter with the following code fails. Could be a bug, a documentation problem, or PEBKAC. I'm seeking clarification.

class MyCustomUriFilter extends HTMLPurifier_URIFilter { ... }

// ...
$config->set('Filter.Custom', [new MyCustomUriFilter]);
$p = new HTMLPurifier($config);
$html = $p->purify('...');

The error message:

Error: Call to undefined method @project\Sanitiser\MyCustomUriFilter::preFilter()
@folder/ezyang/htmlpurifier/library/HTMLPurifier.php:202

From this document I understand one must extend the class HTMLPurifier_URIFilter. It's unclear if the name of this subclass MUST be prefixed by HTMLPurifier_URIFilter_ or whether NameOfFilter MAY just suffice.

Secondly, by looking at the control and data flow in HTMLPurifier::purify() it doesn't look like there's any type checking around filters.

In my case this has been causing my subclass of HTMLPurifier_URIFilter -- whose interface lacks both preFilter and postFilter -- to trigger an error because the polymorphic filter logic ends up calling a non-existing method on some object.

From HTMLPurifier::purify with my own comments.

      // [...] 
      // Making a copy of the custom filters and holding it separately for some reason.
      $custom_filters = $filter_flags['Custom'];

      // [...]
      // Avoid custom filters for now, it's assumed they're already instantiated.
      unset($filter_flags['Custom']);
        $filters = array();
        // (non-URI) Filter instantiating loop
        foreach ($filter_flags as $filter => $flag) {
            if (!$flag) {
                continue;
            }
            if (strpos($filter, '.') !== false) {
                continue;
            }
            $class = "HTMLPurifier_Filter_$filter";
            $filters[] = new $class;
        }

        // Adding custom filters back into $filters.
        // This is where we break things, since $filters now holds objects across two disjoint type hierarchies
        foreach ($custom_filters as $filter) {
            // maybe "HTMLPurifier_Filter_$filter", but be consistent with AutoFormat
            $filters[] = $filter;
        }

        // Merging filters with defaults?
        $filters = array_merge($filters, $this->filters);

        // WTH?! Calling preFilter indiscriminately on mixed instances of HTMLPurifier_Filter and possibly HTMLPurifier_URIFilter?
        for ($i = 0, $filter_size = count($filters); $i < $filter_size; $i++) {
            $html = $filters[$i]->preFilter($html, $config, $context);
        }

Based on the above I need some help to understand what went wrong here, and decide what steps I can take next to work around this problem.

  1. Should the "URI Filters" document be updated or possibly deprecated, or
  2. Should the HTMLPurifier::purify() code to check types and not to rely excessively on polymorphic behaviour, or
  3. Should the class HTMLPurifier_URIFilter subclass HTMLPurifier_Filter perhaps, or
  4. Should a different method for registering URI Filters be documented?
ezyang commented 5 years ago

Yeah, your problem is that you're not adding the filter to HTML Purifier correctly. The docs do actually say how to do it correctly:

$uri = $config->getDefinition('URI');
$uri->addFilter(new HTMLPurifier_URIFilter_<strong>NameOfFilter</strong>(), $config);</pre>

To answer your questions directly:

  1. No, the doc is correct, read the section "Activating your filter"
  2. I don't think it would be a bad idea to add some more typechecking. Happy to accept a patch which, for example, ensures that all filters in Filter.Custom are HTMLPurifier_Filter
  3. No. they're totally disjoint.
  4. Yes. Which is in the doc.
metaturso commented 5 years ago

Thank for your reply.

How would you feel about losing the need for extension classes to comply with a naming scheme? I bet many people like myself find this requirement fairly irritating when attemtping to integrate HTMLPurifier in a namespaced PHP codebase.

ezyang commented 5 years ago

I'd accept a patch on that, but I'm not exactly sure what a clean way to go about doing it is.