css4j / echosvg

SVG implementation in the Java™ Language, fork of Apache Batik, supporting level 4 selectors and colors.
Apache License 2.0
40 stars 2 forks source link

Check for the correct SVGDOM implementation version in Transcoder #68

Closed carlosame closed 1 year ago

carlosame commented 1 year ago

Currently, in TranscoderInput the possibility exists to pass a SVG 1.2 input document contained in a SVGDocument coming from SVGDOMImplementation (1.1) instead of the correct SVG12DOMImplementation (1.2). This is presumably the source of issues like #3.

I'm currently working with the transcoder stuff and perhaps it would be convenient to check for the actual SVG version of the SVGDocument and use the 1.2 factory where convenient (unless KEY_DOM_IMPLEMENTATION was explicitly set to a 1.1 SVGDOMImplementation).

This implies removing the current default value for KEY_DOM_IMPLEMENTATION which would now be null, otherwise I cannot know whether a 1.1 implementation was specified purposely.

I've already implemented it and everything seems fine. Any thoughts @DaveJarvis? Anyone would have objections to this change being committed?

ghost commented 1 year ago

https://github.com/DaveJarvis/keenwrite/blob/main/src/main/java/com/keenwrite/preview/SvgRasterizer.java

The KEY_DOM_IMPLEMENTATION isn't directly used in my project. A few related ideas:

The commit isn't tagged in this issue. Mind adding adding a link to the commit hash?

carlosame commented 1 year ago

The KEY_DOM_IMPLEMENTATION isn't directly used in my project. A few related ideas:

Yes but KEY_DOM_IMPLEMENTATION, by default, has a 1.1 factory. My proposal is to remove that default.

  • Being able to remove BufferedImageTranscoder would be great. This implies that ImageTranscoder would no longer write to standard error when an exception (unhandled or otherwise) is encountered. That is, deprecate and/or remove ImageTranscoder::displayError.

Not sure how you would avoid having BufferedImageTranscoder. You mean deprecating UserAgent.displayError? That's how processing by the user agent is supposed to work, I don't see how this could ever be deprecated.

  • Most SVG documents that don't have a version specified should presumably be parsed as version 1.1. If an unknown tag is encountered, perhaps the document can be re-parsed as 1.2.

By unknown tag you mean 1.2? Version 1.2 tags inside 1.1 documents are already tolerated.

If an unknown tag is still encountered, a key to force ignoring unknown tags and continue parsing/rasterizing could be useful.

That should be happening if you do not force validation, but is outside of the scope of this bug.

Having a way to retrieve a list of errors encountered while rasterizing would be useful (rather than having to trap individual exceptions).

That's where you have to subclass UserAgent.

The commit isn't tagged in this issue. Mind adding adding a link to the commit hash?

See PR #70.

ghost commented 1 year ago

Right, BufferedImageTranscoder would need to stay. The default error handler is where the issue resides:

https://github.com/css4j/echosvg/blob/2112c113b23be8f8ad6ecf13830b87e38dff3ddd/echosvg-transcoder/src/main/java/io/sf/carte/echosvg/transcoder/DefaultErrorHandler.java#L40

Libraries should not write to standard error or standard output. (Applications use standard output for their own purposes, such as creating CSV files or other structured text formats. A third-party library that dumps its output to stdout would invalidate the application's output. KeenWrite, for example, can write XHTML documents to stdout and so I had to override this behaviour to avoid creating malformed XHTML documents. The DefaultErrorHandler violates the Principle of Least Surprise by writing to stdout [and stderr].)

By unknown tag I meant an unknown XML tag that's neither SVG 1.1 nor SVG 1.2.

The PR looks okay. Could consider .htm filename extensions, as well. Also, I didn't see any case conversion, so hopefully the algorithm allows for .HTML, .Html, .Xhtml, and other variations.

carlosame commented 1 year ago

Libraries should not write to standard error or standard output. (Applications use standard output for their own purposes, such as creating CSV files or other structured text formats. A third-party library that dumps its output to stdout would invalidate the application's output. KeenWrite, for example, can write XHTML documents to stdout and so I had to override this behaviour to avoid creating malformed XHTML documents. The DefaultErrorHandler violates the Principle of Least Surprise by writing to stdout [and stderr].)

The solution is to override that behaviour, as you did. It has always worked this way, and it is likely that quite a few people would become upset if it changed.

By unknown tag I meant an unknown XML tag that's neither SVG 1.1 nor SVG 1.2.

If you find problems with a non-validating parser/transcoder, please open an issue.

The PR looks okay. Could consider .htm filename extensions, as well. Also, I didn't see any case conversion, so hopefully the algorithm allows for .HTML, .Html, .Xhtml, and other variations.

The extensions are only for the testing infrastructure: The library uses KEY_DOCUMENT_ELEMENT (that you must set to html) and KEY_DOCUMENT_ELEMENT_NAMESPACE_URI (also set to the XHTML ns uri).

It should not even need that (I personally find somewhat silly that one must know the document element and the NS URI in advance, just to parse a document from a stream), but it's how it always has worked and we aren't going to change it now.

People hate changes to the old Batik API even when there are good reasons for it.

ghost commented 1 year ago

The solution is to override that behaviour, as you did. It has always worked this way, and it is likely that quite a few people would become upset if it changed.

That's appeal to tradition, which isn't a great reason to keep poor decisions around. I don't know how many people are using EchoSVG at the moment, but I'd change it and let people override to add writing to standard output if they needed it. My guess is that they'd be in the vast minority, but without a survey, it's anyone's guess.

carlosame commented 1 year ago

The solution is to override that behaviour, as you did. It has always worked this way, and it is likely that quite a few people would become upset if it changed.

That's appeal to tradition, which isn't a great reason to keep poor decisions around. I don't know how many people are using EchoSVG at the moment, but I'd change it and let people override to add writing to standard output if they needed it. My guess is that they'd be in the vast minority, but without a survey, it's anyone's guess.

There is no "appeal to tradition" when we are talking about a software that was written about 20 years ago. With legacy software one talks about backward compatibility.

ghost commented 1 year ago

It's a toughie, that's for sure. For me, I'd prefer erring on the side of not introducing latent bugs in unsuspecting developers' code. :-) Like there's nothing that tells any developer that their stderr/stdout streams could be corrupted by this library. Maybe consider adding a warning somewhere in the README.md file, then?

(Again, I wouldn't be overly concerned with being 100% backwards compatible with Batik in this particular case, because they got this one wrong, unfortunately.)