galaxyproject / pulsar

Distributed job execution application built for Galaxy
https://pulsar.readthedocs.io
Apache License 2.0

Cloud support #241

Closed by innovate-invent 1 year ago

innovate-invent commented 3 years ago

Currently Pulsar does not support scheduling jobs in a cloud environment. I am targeting Kubernetes but plan to move to Nomad and Docker Swarm.

In order to support a cloud environment, Pulsar needs to be modified to function as a job sidecar.

I plan to write Kubernetes, Docker, and Nomad runner plugins that expand the configuration options for submissions to these cloud managers.

Some changes to Pulsar are needed before I can begin that work:

I am trying to follow the logic of the code but am not entirely sure whether the change I have in mind is enough. I think all of the data staging in the submit_job function needs to be pulled out into a helper function, which is then called conditionally depending on whether sidecars are enabled.
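To make the proposed refactor concrete, here is a minimal sketch of the shape I have in mind. It is not Pulsar's actual code: only the name submit_job comes from the codebase, and its real signature is different. The Job class, stage_inputs, the run_staging flag, and fake_launch are all hypothetical names used for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Job:
    """Hypothetical stand-in for a job; not a real Pulsar class."""
    job_id: str
    inputs: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)


def stage_inputs(job: Job) -> None:
    """All data staging lives in one helper so the caller can opt in or out."""
    for path in job.inputs:
        job.log.append(f"staged {path}")


def submit_job(job: Job, run_staging: bool, launch: Callable[[Job], None]) -> None:
    """Submit a job, running staging here only when the caller asks for it.

    Hypothetical signature: run_staging would be derived from whatever
    configuration decides whether a sidecar handles staging.
    """
    if run_staging:
        stage_inputs(job)
    launch(job)


def fake_launch(job: Job) -> None:
    """Placeholder for the real launch step."""
    job.log.append(f"launched {job.job_id}")


job = Job("123456", ["input.fastq"])
submit_job(job, run_staging=True, launch=fake_launch)
print(job.log)
```

The point of the split is that the staging step becomes a single call that a runner plugin can enable, disable, or move into a different container without touching the launch logic.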

The code trails off into the preprocess_and_launch function that calls a subthread, and I am not sure what is occurring here. I suppose this issue is a request for documentation more than anything.

jmchilton commented 3 years ago

> Currently Pulsar does not support scheduling jobs in a cloud environment.

This is a very rude way to express a perceived technical limitation after I explained that my understanding was that the request was possible.

I understand the documentation is lacking and the code is hard to follow but we're literally using Pulsar in multiple cloud environments. Pulsar absolutely can be used with Kubernetes and can stage data in a pulsar container (that you may call a sidecar if you'd like) beside a biocontainer (or an explicitly defined tool container) that runs the job in the same pod. It was presented a year and a half ago (https://static.sched.com/hosted_files/gcc2019/10/S8A-1__John_Chilton_remote_data.pdf) and Galaxy has test cases that demonstrate doing this (https://github.com/galaxyproject/galaxy/blob/dev/test/integration/test_kubernetes_staging.py).

innovate-invent commented 3 years ago

It isn't clear what Pulsar is doing exactly. In the presentation it appears that it is attempting to drive the k8s pod using a WDL workflow engine. The test_kubernetes_staging.py script is very difficult to follow. Looking at everything, it isn't clear how to execute Pulsar as anything but a daemon. Could you briefly walk me through the process Pulsar uses to execute a job as a sidecar, starting from the point where the Pulsar daemon receives the job from Galaxy?

innovate-invent commented 3 years ago

I just discovered https://github.com/galaxyproject/pulsar/blob/68a52084468b48a447a4849498d30ac69b805900/pulsar/client/client.py#L395

It appears that Pulsar starts a sidecar daemon that then waits for the job script in the tool container to kick off. What is the reasoning behind this approach, rather than executing a staging script in one container and then, once it exits, executing the tool container within the same pod?

I suppose I was expecting something along the lines of this:

apiVersion: batch/v1
kind: Job
metadata:
  name: pulsar-job-123456
spec:
  template:
    spec:
      # Init containers run one at a time, in order; each must exit
      # successfully before the next one starts.
      initContainers:
      - name: stage
        image: pulsar
        command: ["stage.py", "--job", "123456"]
      - name: tool
        image: biocontainers/tool_container
        command: ["bash", "job_script.sh"]
      # Starts only after every init container has succeeded.
      containers:
      - name: cleanup
        image: pulsar
        command: ["cleanup.py", "--job", "123456"]
      restartPolicy: Never
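The manifest above hardcodes the job ID and the tool image. As a rough sketch of how it could be generated per job, the function below builds the same structure as a plain Python dictionary. Everything here is an assumption for illustration: build_job_manifest is a hypothetical helper, and stage.py and cleanup.py are the imagined scripts from the manifest above, not existing Pulsar entry points.

```python
import json
from typing import Any, Dict


def build_job_manifest(job_id: str, tool_image: str,
                       pulsar_image: str = "pulsar") -> Dict[str, Any]:
    """Build the stage -> tool -> cleanup Job manifest for one job (hypothetical)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"pulsar-job-{job_id}"},
        "spec": {
            "template": {
                "spec": {
                    # Run sequentially; each must succeed before the next starts.
                    "initContainers": [
                        {
                            "name": "stage",
                            "image": pulsar_image,
                            "command": ["stage.py", "--job", job_id],
                        },
                        {
                            "name": "tool",
                            "image": tool_image,
                            "command": ["bash", "job_script.sh"],
                        },
                    ],
                    # Starts only after all init containers have succeeded.
                    "containers": [
                        {
                            "name": "cleanup",
                            "image": pulsar_image,
                            "command": ["cleanup.py", "--job", job_id],
                        },
                    ],
                    "restartPolicy": "Never",
                }
            }
        },
    }


manifest = build_job_manifest("123456", "biocontainers/tool_container")
print(json.dumps(manifest, indent=2))
```

One consequence of this layout is worth noting: with restartPolicy set to Never, a failing init container marks the whole pod as failed, so the cleanup container would not run if the tool step exits with an error.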