eerohele / dita-ot-gradle

A Gradle plugin for running DITA Open Toolkit
https://github.com/eerohele/dita-ot-gradle/
Apache License 2.0

Using the dita-ot-gradle plugin as the basis for a web service #6

Closed DavidPickles closed 8 years ago

DavidPickles commented 8 years ago

I have a question about the use of your plugin.

I am looking at ways to expose DITA-OT through a web-service. (Every request will be for the same transtype, but the input ditamap and topic files will be different.) In the short term load will be quite light, but we need to have a roadmap for providing scalability as the load increases.

Our as-is solution works by starting a JVM and kicking off Ant for every request. Not surprisingly, it takes quite a lot of time.

I've used your plug-in on my local machine to do the same DITA-OT processing and, with the Gradle daemon active, have seen really significant performance improvements. So this looks promising as the basis for an alternative to our current way of doing things on the server.

My question is whether this sounds like a reasonable approach to providing a web service. Are there any particular issues or risks that you think this approach would have that we should be aware of?

Thanks in advance for any thoughts you might have about this.

eerohele commented 8 years ago

Interesting idea!

First off, I would recommend perusing the documentation for Gradle Daemon. Every caveat that applies to the Gradle Daemon naturally applies here, too.

Some DITA-OT issues that you are probably already aware of come to mind (no thread-safety, high memory consumption with large documents, etc.). The first thing I can think of that pertains to this use case, though, is that, as the documentation for the Gradle Daemon says, a daemon instance dies after 3 hours of inactivity. So if your service has long periods of inactivity, you won't get much benefit out of the plugin, because it'll have to create a new daemon instance every time someone publishes something.

Looking at the Gradle sources, it looks like there's a system property called org.gradle.daemon.idletimeout that you might be able to use to control the lifespan of the daemon, but I don't see that property documented anywhere, so it might be internal to Gradle and therefore subject to change.
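To make that concrete, here is a minimal sketch of how a service might build the command line for one publishing run with a longer daemon idle timeout. It assumes the property takes a value in milliseconds and is honoured when passed with -D (it could equally go into gradle.properties). The class name DaemonCommand, the executable name "gradle", and the task name "dita" are assumptions for illustration, not something this plugin prescribes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class DaemonCommand {
    // Builds the argument list for a single Gradle invocation that requests
    // a daemon with a custom idle timeout.
    // Assumption: org.gradle.daemon.idletimeout is read as milliseconds.
    static List<String> build(String gradleExecutable, String task, long idleHours) {
        long idleMillis = TimeUnit.HOURS.toMillis(idleHours);
        List<String> command = new ArrayList<>();
        command.add(gradleExecutable);
        command.add("--daemon");
        command.add("-Dorg.gradle.daemon.idletimeout=" + idleMillis);
        command.add(task);
        return command;
    }

    public static void main(String[] args) {
        // "gradle" and "dita" are placeholder values for this sketch.
        System.out.println(String.join(" ", build("gradle", "dita", 24)));
    }
}
```

The resulting list could be handed to a ProcessBuilder. Whether a longer-lived daemon is actually desirable on a server is a separate question, given the memory it holds on to while idle.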

On a somewhat related note, when I initially created this plugin, it didn't have to rely on any internal Gradle code. However, later DITA-OT added a dependency that's a different version of a library the Gradle runtime itself uses, which caused a conflict that forced me to basically rewrite the plugin to rely on a couple of internal Gradle classes in order to retain the benefit of using the Gradle Daemon.

That is not ideal and can lead to issues such as #5 (which you reported) when updating the Gradle version on the server. So, I would recommend testing each new Gradle version carefully before deploying into production. If there are any errors, you can naturally post an issue here and I'll try to get it fixed.

Also, the Gradle Daemon might slightly increase memory usage on your server, but I don't think it should be too much of an issue.

Other than that, I don't know that anyone has used the plugin like this before, so you might bump into issues that I can't predict, of course. Again, though, if you post an issue, I'll try to help out.

DavidPickles commented 8 years ago

Thank you very much for your response - a lot of food for thought.

As I understand it, the basic issue is that DITA-OT is written using a technology - Ant - that, although a great tool, is not designed to support long lasting, multi-threaded, performance optimised, server processes. The issue is compounded by the use of static variables in some of the key DITA-OT classes, meaning that they are not thread-safe.
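One way a service could live with the lack of thread-safety is to queue requests and run builds strictly one at a time. The following is only a sketch of that idea using a single-threaded executor from the Java standard library. SerializedPublisher is a hypothetical name, and the Callable stands in for whatever actually runs DITA-OT, which is not shown here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SerializedPublisher implements AutoCloseable {
    // A single worker thread: submitted builds queue up and never overlap.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // 'build' is a placeholder for the code that runs one DITA-OT conversion.
    <T> Future<T> submit(Callable<T> build) {
        return worker.submit(build);
    }

    @Override
    public void close() {
        // Lets already queued builds finish, then stops the worker thread.
        worker.shutdown();
    }

    public static void main(String[] args) throws Exception {
        try (SerializedPublisher publisher = new SerializedPublisher()) {
            Future<String> first = publisher.submit(() -> "built a.ditamap");
            Future<String> second = publisher.submit(() -> "built b.ditamap");
            System.out.println(first.get());
            System.out.println(second.get());
        }
    }
}
```

This trades throughput for safety: requests wait in line, so it only addresses correctness, not the scalability concern.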

I'm starting to have some doubts about my original idea that Gradle and your plug-in could provide the solution to this. From a high level, the reason is that Gradle, just like Ant, although a great tool, is not primarily designed to support long lasting, multi-threaded, performance optimised, server processes. It takes a few steps in that direction with the daemon, but there's a lot that is unclear. To begin with, although the daemon does take steps towards the destination I'm interested in, there isn't much evidence that this is a goal of the Gradle team themselves, nor is it clear what proportion of that journey these steps really represent.

All of this adds context to the detailed concerns that you highlight: the lack of configurability of the daemon lifespan, and the lack of a clean way to handle conflicts between framework and project classpaths. And it makes me think I'm likely to hit other serious problems as I try to pull Gradle down a road it perhaps doesn't really want to walk.

For these reasons I need to look for viable alternatives. It may be that despite my reservations, Gradle + your plugin does turn out to be better than other options, but my next move is to define those other options in order to make that comparison.

eerohele commented 8 years ago

That's a good and fair summary. I know @jelovirt has been working on a DITA-OT server component that would probably be just the thing you're looking for, but I don't know what its status is. You could get in touch with him if you're interested.

On the topic of parallelism, @jelovirt has also written a brief article on it you might be interested in.

EDIT: On a related note, Gradle does offer a Tooling API that might alleviate some of the problems with the Gradle Daemon in your scenario, but I haven't had the time to look into it yet. Also, I'm not sure that will necessarily help with the classloader conflicts.