darylldoyle / svg-sanitizer

A PHP SVG/XML Sanitizer
GNU General Public License v2.0
454 stars 67 forks source link

Sanitizer::removeRemoteReferences doesn't remove remote images #94

Open irauta opened 9 months ago

irauta commented 9 months ago

When sanitizing an image like the example below, a href attribute pointing to a remote resource in an image element does not get removed, when $sanitizer->removeRemoteReferences(true); has been called before sanitizing:

<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 370 500" width="370" height="500">
  <rect x="5" y="5" width="350" height="490" style="fill: #FF6347; stroke: #000000; stroke-width: 10px;" />
  <image x="20" y="10" width="320" height="480" href="https://upload.wikimedia.org/wikipedia/commons/thumb/9/9b/Gustav_chocolate.jpg/800px-Gustav_chocolate.jpg" />
</svg>

I narrowed this down to the hasRemoteReference method of Sanitizer. (Link points to the 0.16.0 version, but the code in master branch is currently identical.) There is an early return in the method, introduced by PR #15, which seems to cause that potentially only attributes that are along the lines of fill="url('http://trackingsite.tld/image.png')" are removed, but the more straightforward references like the cat image from Wikimedia Commons in the example above are left alone.

I tried commenting out the middle part of the hasRemoteReference method, leaving only the first and last line active:

<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 370 500" width="370" height="500">
  <rect x="5" y="5" width="350" height="490" style="fill: #FF6347; stroke: #000000; stroke-width: 10px;"></rect>
  <image x="20" y="10" width="320" height="480"></image>
</svg>

I haven't tested older versions of svg-sanitizer, but looking at the code history it seems like this roughly corresponds to the way the method originally worked.

To me removing hrefs that point to remote resources seems like the intended functionality when removeRemoteReferences is enabled, but it's also not how the code currently works. I'm not suggesting to just remove the code that removes url() references - handling them is clearly important too - but am not sure what would be the best change to hasRemoteReference as I don't understand why the early return was added.