css4j / echosvg

SVG implementation in the Java™ Language, fork of Apache Batik, supporting level 4 selectors and colors.
Apache License 2.0
40 stars 2 forks source link

Lenient parsing #113

Closed zbynek closed 2 months ago

zbynek commented 2 months ago

This library correctly recognizes invalid SVG syntax, but in some cases it would be useful to ignore the violations.

carlosame commented 2 months ago
* With SVG 1.2 invalid `<rect>` dimensions no longer throw exceptions. It would be consistent with [this change ](https://github.com/css4j/echosvg/commit/dd391552671c1b5a9d5cb64f3d449a989103678b) to also not throw exceptions for invalid `width` and `height` of `<svg>` itself.

After today's commit, invalid values for x, y, width and height in any element shouldn't be a problem. A few issues remain with other attributes, but better integration with CSS is required before fixing (due to inheritance). So the Typed OM patch must be merged before doing anything.

* To be consistent with browser implementations of SVG rendering, it would be good to avoid throwing exceptions for invalid paths, such as
<svg width=20 height=10>
  <path d="M 0 0, 10 10, 20 - 1" stroke="#000"/>
</svg>

(both Firefox and Chrome accept all the path points until the first invalid one)

This one is in the radar, in fact it is the cause of the incorrect rendering of one of the Canvg images (and I have a unit test prepared somewhere). Fixing this is not easy and AFAIR shall require a full refactor of the way that paths are parsed.

carlosame commented 2 months ago

To be consistent with browser implementations of SVG rendering, it would be good to avoid throwing exceptions for invalid paths,

I modified the code so rendering continues normally after a missing or malformed d attribute, with the correct behavior for the graphics nodes. It was much simpler than what I said, but at the cost of sometimes reporting the same issue multiple times (one of the tests reports 324 errors, the actual errors being much less).

My question is: does it make sense to report an error (that can/should be ignored by your error handler) for missing attributes, as EchoSVG does now? Should the missing attributes be ignored silently? Should I add a reportWarning() method to the user agent, and report those as a warning?

Feedback is welcome.

zbynek commented 2 months ago

Thanks for the fast fixes! I saw that invalid SVG size is no longer blocking the parsing with 1.2.1.

I think it makes sense to handle all the errors using the same method, complex behaviors like "show at most X errors of type Y" can be implemented by the user agent. That being said I would expect just 1 error for each malformed d attribute.

at the cost of sometimes reporting the same issue multiple times

Is that because the same d attribute is parsed multiple times?

carlosame commented 2 months ago

Thanks for the fast fixes! I saw that invalid SVG size is no longer blocking the parsing with 1.2.1.

Yes, and it has other compliance fixes like for example in SVG embedded into HTML you can omit the width attribute if you have a viewBox and a height attribute. The viewBox aspect ratio is used to compute width (MermaidJS for example uses this sometimes).

I think it makes sense to handle all the errors using the same method, complex behaviors like "show at most X errors of type Y" can be implemented by the user agent. That being said I would expect just 1 error for each malformed d attribute.

According to SVG2 it isn't an error and should just be ignored by agents. But people may use EchoSVG for SVG linting, so they expect it to report something (that's why I asked for feedback). The user agent lacks a displayWarning() so I was using displayError(), but it makes little sense to report an error which actually isn't one so I'm adding a displayWarning().

Is that because the same d attribute is parsed multiple times?

Yes, because the attribute is marked as invalid and then is retried on each pass (and also due to animations).

carlosame commented 2 months ago

The patch is merged and now EchoSVG does lenient parsing of paths, as well as polygon and polyline points. I created a 1-stable branch which contains this fix, so you won't have to wait until 2.0 to have it. EchoSVG now passes the relevant WPT test, as well as other tests that I prepared.

I couldn't fix the issue that when a path is referenced by a use element, the attributes are reparsed (and if there is an error, it is reported multiple times), which was already happening before my changes. I even attempted to trigger the parsing while the fragment is being imported, then copy the paths to the imported tree and mark the new paths as valid (this is the reason why the path lists gained a mysterious copyTo() method), but no luck. I'll keep trying.

I'm closing this, if you have a specific proposal about lenient parsing that isn't fixed in master please open a new issue.

carlosame commented 2 months ago

The issue that errors and warnings were reported multiple times is fixed now, I'll merge the patch soon.

carlosame commented 2 months ago

All of the fixes commented here are shipped in EchoSVG 1.2.2. Quoting from the Release Notes:

Paths and point sets are now only parsed once, even if referenced by multiple <use> elements, which fixes an important performance bottleneck.

And the lenient parsing generates the correct bounding boxes even after modification by scripts. The following is an image that had missing or invalid d and points attributes, but then a script fixes that:

Scripts fixes paths, polygons and polylines that had no valid d or points attributes

The bounding boxes are the correct ones, both before and after the Javascript modified the document. The following is the opposite case, where a script causes the elements to not render, or render partially:

The paths and polylines are valid, but then a script breaks them partially or totally

In the case of ellipses, circles and rectangles, when the radius or width/height are zero they do not render, yet still contribute to the bounding box as points (the specification says nothing concrete, and technically they are points).

Reopening for a few days, in case anyone wants to comment.

zbynek commented 2 months ago

With 1.2.2 the only SVGs in our test suite that throw an error on loading are the ones referencing external resources, which is perfect for our use-case, thanks!

(If you decide to report external resource loading to user agent instead of throwing exceptions, we should be able to easily update our code though.)