Kozea / CairoSVG

Convert your vector images
https://courtbouillon.org/cairosvg
GNU Lesser General Public License v3.0
773 stars 151 forks source link

Certain svgs cause cairosvg to hang (perhaps stuck in an infinite loop) #382

Open PWhiddy opened 1 year ago

PWhiddy commented 1 year ago

Thanks for making this super useful library!

Rendering certain svgs with an invalid path causes the library to get stuck in an infinite loop. This svg renders fine in other applications, and cairosvg renders other svgs for me no problem.

Hoping for some insight on this, even being able to timeout/error on these would be helpful!

PWhiddy commented 1 year ago

Simple reproduction script:

import cairosvg

svg = """<svg xmlns="http://www.w3.org/2000/svg">
  <g>
    <path fill="#FE6502" d="200 312.1h14v14h-14z"></path>
  </g>
</svg>
"""

if __name__ == "__main__":
    cairosvg.svg2png(
        bytestring=svg.encode(),
        write_to="test.png",
        output_width=224, output_height=224
    )
PWhiddy commented 1 year ago

For anyone facing this same issue, my workaround was to modify path.py to throw an exception when a max depth is reached while parsing paths:

https://github.com/Kozea/CairoSVG/compare/master...PWhiddy:CairoSVG:master

This doesn't make it render successfully, but at least allows the error to be caught so the file can be skipped instead of hanging.

liZe commented 1 year ago

Hi!

It looks like the content of the d parameter of the path tag is wrong (or I don’t know this syntax, which is also possible because a lot of things are possible in this parameter 😁).

By design, CairoSVG is bad at handling invalid SVG images and "easily" crashes / hangs in these cases. So, if the SVG is not valid, we won’t add anything to CairoSVG to handle this case.

PWhiddy commented 1 year ago

Hey thanks for your response!

I think the design of easily crashing on invalid svgs is great, although hanging indefinitely makes things very difficult for some use cases. Particularly in a multiprocessing environment because it's not easy to gracefully timeout from outside the library itself. In my case I'm rendering a huge dataset of millions of svgs, a small number of which are invalid.

Anyway, totally respect the design of not being overly concerned with invalid inputs, but just want to say that just a simple iteration limit to detect infinite loops (perhaps as an optional argument) similar to the one I used above would make many applications much easier to build.

PS I was very impressed with the speed once I got it running. With 16 processes I can render 3600 images/second!

liZe commented 1 year ago

Anyway, totally respect the design of not being overly concerned with invalid inputs, but just want to say that just a simple iteration limit to detect infinite loops (perhaps as an optional argument) similar to the one I used above would make many applications much easier to build.

The easiest way for you is probably to define a global timeout. Using func_timeout is a great way to avoid a lot of different problems or attacks (malformed SVG, huge images, common SVG attacks…)