eerohele / saxon-gradle

A Gradle plugin for running XSLT transformations with Saxon
MIT License
16 stars 6 forks source link

Use Saxon, allow stylesheets to be URIs #24

Closed ndw closed 1 year ago

ndw commented 3 years ago

As I mentioned in the issue, this PR fixes #21

  1. I replaced XmlSlurper with Saxon, since we sort of know we're going to have Saxon available. This means we can leverage the fact that Saxon keeps track of base URIs correctly to avoid doing that work.
  2. I simplified things a little bit by just using whether or not this.xslt was successfully parsed to control later processing.
  3. I complexified things a little bit by allowing the stylesheet to be a URI, so you could point to a stylesheet on the web. I considered doing the same thing for the input, but decided against it because the odds that the input in a build is on the web is pretty small and it would have been further complexity.
  4. I adjusted the code that looks for the default extension so that it ignores named xsl:output instructions. It's a heuristic anyway, but the named ones are probably for xsl:result-documents, not the main output.
  5. I considered parsing the source document for xi:include elements and then changed my mind

Apologies if I messed up the indentation anywhere. I use 2 spaces and you use 4, but I tried to be consistent with your style.

I couldn't work out how to integrate into your tests. I did write my own set of tests against a separate gradle file, but I haven't included those. I could make them available if you'd like.

eerohele commented 3 years ago

Thank you! I'll look this over when I have the time.

I think adding Saxon as a dependency isn't a big leap since we kinda depend on it already. There's one aspect I'm wondering about, though: since users can choose to override the default Saxon version, is it possible that a user chooses a Saxon version that's incompatible with the code introduced in this PR? It's probably unlikely, but it's a nasty issue to come across if it happens.

I also wonder whether parsing includes and imports for the up-to-date check is worth it. It might be better to just to do something like project.fileTree("**/*.{xsl,xslt}") (pseudocode). After all, the point of an up-to-date check is to save time, and if we spend a lot of time parsing the input stylesheets, well...

All of this thinking is sorta half-baked, so please feel free to challenge me on all of it. :)

ndw commented 3 years ago

The code only relies on the public "s9api" APIs, so I don't think there will be any compatibility issues with Saxon 9.x and forward.

I think you might be right about the parsing, though. The fact that I was getting errors (issue #22) drove home to me that this plugin is parsing the stylesheets for every XSLT task during the configuration phase of every build. For the xslTNG build, that's hundreds of stylesheets even when none of those build targets will be run.

I've noticed that getInputFiles() is called twice, once during configuration and once when the task is actually starting. I haven't worked out why or how to tell which call is which, but I suppose the parsing could, in principle, be limited to only when the task is actually being run. But I'm not sure if that would be sufficient for Gradle's dependency tracking.

I don't think the fileTree() approach is unreasonable, but it could lead to extra work. Imagine I have a directory that contains a dozen stylesheets that don't include or import each other at all. The fileTree() mechanism would make every task that used any of those stylesheets out-of-date if any of them changed. That could be really annoying.

It might be better just to add the input document and the stylesheet to the dependencies and leave it to the task writers to add inputs.files for stylesheet modules. Observe that they'll have to do that anyway for complex documents that use, for example, XInclude, to break them into separate files. And that's at least as likely as stylesheet modules and those files are probably going to change at least as frequently.

I did an experiment where I extended the plugin to parse source documents looking for XInclude as well. It works, but for a build with several large documents, since the steps can run in parallel, it really does a lot of work and chews up a lot of memory. I concluded that was a bad idea.