daisy / pipeline-framework

Core projects for the DAISY Pipeline 2 runtime framework

More implementations of script and job API #189

Open bertfrees opened 1 year ago

bertfrees commented 1 year ago

Now that the job and script APIs have been reworked and made more generic (XProc agnostic), the opportunity is there to add other backends.

1. Pipeline 1 backend

A first idea is to add a Pipeline 1 backend. This would provide an alternative to porting scripts from Pipeline 1 to Pipeline 2. It would allow us to discontinue the Pipeline 1 GUI and would give access to Pipeline 1 scripts in the new Pipeline 2 GUI.

2. Web server backend

A second idea is to add a backend that dispatches jobs to one or more Pipeline web servers. It is a combination of two older ideas and one fresh use case:

  1. Unified Java API. The idea, originally proposed by Jostein, of unifying the "direct" Java API and the Java client library API (which connects to a server over HTTP) has been around for a while. The benefit is that users can easily switch between the two ways of calling Pipeline 2, and that we would simplify and shrink our code base.

  2. Scaling. The idea of being able to scale Pipeline came from MTM, and they have probably already implemented something for their specific needs (but haven't shared it). My idea for addressing this request was to have a web server that connects and dispatches to multiple other web servers, grows or shrinks the pool of servers based on the overall load, and balances jobs based on the load of each individual server.

  3. A new use case is a project I’m doing that uses Pipeline as a Java library and that needs to run jobs one at a time (one job per invocation of the tool), but ideally without the overhead of starting and stopping the engine each time. The solution could be to fire up a web server that keeps running in the background, even after the process ends, and to connect to it on each invocation. There are however a number of hurdles with this, and I would much prefer a reusable component that does it for me:

    • The Java API (after the rework) is more streamlined and rich than the Java client APIs (both the pipeline-clientlib-java/pipeline-clientlib-httpclient and pipeline-clientlib-jaxb versions).
    • As a user I don't want to be bothered with managing the web server process. It could be done automatically behind the scenes.
    • I don't even need to know that the implementation is based on an HTTP server. It doesn't matter how it is implemented.
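
The dispatching part of the scaling idea (item 2) could look roughly like the sketch below. It is only an illustration under stated assumptions: `ServerHandle` and `LeastLoadedDispatcher` are hypothetical names, not existing Pipeline classes, "load" is approximated by a count of active jobs, and the server URLs are placeholders. Growing or shrinking the pool is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical handle for one Pipeline web server; it only tracks active jobs. */
class ServerHandle {
    final String url;
    int activeJobs = 0;

    ServerHandle(String url) {
        this.url = url;
    }
}

/** Hypothetical dispatcher that picks the server with the fewest active jobs. */
class LeastLoadedDispatcher {
    private final List<ServerHandle> pool = new ArrayList<>();

    synchronized void addServer(String url) {
        pool.add(new ServerHandle(url));
    }

    /** Chooses the least loaded server and records the new job against it. */
    synchronized ServerHandle dispatch() {
        ServerHandle target = pool.stream()
            .min(Comparator.comparingInt((ServerHandle s) -> s.activeJobs))
            .orElseThrow(() -> new IllegalStateException("no servers in pool"));
        target.activeJobs++;
        return target;
    }

    /** Called when a job finishes, so the server's load goes down again. */
    synchronized void jobFinished(ServerHandle server) {
        server.activeJobs--;
    }
}
```

A real implementation would presumably ask each server for its actual load instead of counting jobs locally; the counter is just the simplest stand-in.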

The new component I'm thinking about would provide all of the above features. It would implement the script and job APIs and would have the following additional configuration parameters:

bertfrees commented 1 year ago

CC @rdeltour @josteinaj

josteinaj commented 1 year ago

Maybe this would make it possible to include the nordic scripts again as well. The new version of the nordic scripts is not run using Pipeline 2. But there's still an HTTP API, so that could be invoked in the background.

For the nordic scripts we're planning to "wrap" other APIs when validating books. By that, I mean that we combine validation reports from different tools so that you don't have to invoke them all separately.

We always validate according to the nordic guidelines. We also currently include Epubcheck and Ace in the Docker image, and use those for additional validation. However, I'd like those to run as separate Docker containers (preferably official Docker images) and be invoked through an API. That would make it easier to upgrade to newer versions of Ace and Epubcheck without creating a new version of the nordic validator.

Similarly, we would have other organization-specific or shared validators that could be plugged in. For instance a MathML validator and an audio book validator, which could run as separate Docker containers and be developed as separate projects.
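
As a rough illustration of the "wrapping" idea, the sketch below merges the findings of several pluggable validators into one report. Everything in it is an assumption made for the example: the `Validator` interface, `CombinedValidator` and `StubValidator` are hypothetical, the stub stands in for a validator that would really be called over HTTP in its own container, and the issue texts are made up rather than real tool output.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical contract for one validation tool. */
interface Validator {
    String name();
    List<String> validate(String bookPath);
}

/** Stand-in for a tool behind an API; it just returns fixed, made-up issues. */
class StubValidator implements Validator {
    private final String name;
    private final List<String> issues;

    StubValidator(String name, List<String> issues) {
        this.name = name;
        this.issues = issues;
    }

    public String name() { return name; }

    public List<String> validate(String bookPath) { return issues; }
}

/** Runs every plugged-in validator and merges the findings into one report. */
class CombinedValidator implements Validator {
    private final List<Validator> delegates;

    CombinedValidator(List<Validator> delegates) {
        this.delegates = new ArrayList<>(delegates);
    }

    public String name() { return "combined"; }

    public List<String> validate(String bookPath) {
        List<String> report = new ArrayList<>();
        for (Validator v : delegates) {
            for (String issue : v.validate(bookPath)) {
                // Prefix each issue with the tool that reported it.
                report.add("[" + v.name() + "] " + issue);
            }
        }
        return report;
    }
}
```

Because the combined validator implements the same interface as its parts, a new validator (MathML, audio book, ...) can be plugged in without changing the caller.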

egli commented 1 year ago

I'm not totally sure what you're asking for with this issue. I guess you're throwing out ideas to get feedback.

bertfrees commented 1 year ago

> personally I'm more interested in a complete REST API

Are there things missing from the REST API in your opinion?

> As far as I understand there aren't that many Pipeline1 scripts that we really want to port, so I think we should port them properly and in the future only have one way to invoke scripts.

I agree and in the long run this is still the goal. The problem is that we've been saying for years that we are going to port scripts, but it doesn't happen (or very slowly). Lack of resources, lack of incentives, lack of help from the community, ... the reasons vary. Also, unless we do the porting very carefully, with maximum reuse of existing code, there is the risk that we make things worse from a maintenance point of view. We already have a lot of conversion code that is not uniform. (Despite the efforts we put into it this remains the case.) Without uniformity in the existing code base, it's easy to make things worse when you add new code.

> Do you want load balancing?

Yes, but load balancing is just one of the things I mentioned. I want the ability to connect to a web server (that was possibly fired up automatically) from Java. This opens up several possibilities. Load balancing is one part.

> Then why don't you use an off-the-shelf proven existing load balancer? [...] it sounds like you are trying to implement something which might end up like a poor implementation of Kubernetes

I'm not saying I want to do things from scratch. Looking at Kubernetes for the scaling part is a great idea. Perhaps Kubernetes can take care of everything we want for scaling so that we only need to connect to one server. Or perhaps we need some more logic on the Java end. The crucial point in my proposal is that I want a unified (REST and Java) API so that people can easily switch to a different backend if they want to optimize or scale their application, without the added complexity and duplication of effort.
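
To make the "switch backend without changing client code" point concrete, here is a minimal sketch. It does not show the actual Pipeline 2 job or script API: `JobService`, `LocalJobService`, `RemoteJobService` and `JobServices` are hypothetical names, both backends are in-memory stubs, and the remote URL is a placeholder. A real remote backend would issue HTTP requests where the stub just records the job.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical backend-neutral job API (not the real Pipeline 2 interface). */
interface JobService {
    String submit(String scriptId, Map<String, String> options);
    String status(String jobId);
}

/** Shared stub logic: remembers submitted jobs and reports them as done. */
abstract class StubJobService implements JobService {
    private final Map<String, String> statusById = new HashMap<>();
    private final String idPrefix;

    StubJobService(String idPrefix) {
        this.idPrefix = idPrefix;
    }

    public String submit(String scriptId, Map<String, String> options) {
        String id = idPrefix + "-" + (statusById.size() + 1);
        statusById.put(id, "DONE"); // a real backend would actually run the script
        return id;
    }

    public String status(String jobId) {
        return statusById.getOrDefault(jobId, "UNKNOWN");
    }
}

/** Stand-in for the embedded ("direct") engine. */
class LocalJobService extends StubJobService {
    LocalJobService() { super("local"); }
}

/** Stand-in for a client that would talk to a web server over HTTP. */
class RemoteJobService extends StubJobService {
    final String baseUrl; // placeholder; unused by the stub

    RemoteJobService(String baseUrl) {
        super("remote");
        this.baseUrl = baseUrl;
    }
}

/** Selects a backend from a configuration value; callers only see JobService. */
final class JobServices {
    static JobService forBackend(String backend) {
        switch (backend) {
            case "local":
                return new LocalJobService();
            case "remote":
                return new RemoteJobService("http://localhost:8181/ws");
            default:
                throw new IllegalArgumentException("unknown backend: " + backend);
        }
    }
}
```

Client code written against `JobService` stays the same whichever backend the configuration selects, which is the property the proposal is after.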

egli commented 1 year ago

> Are there things missing from the REST API in your opinion?

No, I'm happy with the REST API. I'm just saying that for me the REST API is more important than the Java API.

I agree with the rest of your argument.