SciDAS / nextflow-api

MIT License

Use nextflow -bg kuberun to reduce load on webserver pod #29

Closed bentsherman closed 4 years ago

bentsherman commented 4 years ago

Sometimes a workflow launched via nextflow-api will fail but leave no error in the workflow log or .nextflow.log. Sometimes the workflow log will contain the message "failed to fetch pod exit status", which leads me to think that the nextflow-api pod loses its connection with the submitter pod.

I'm also looking at an example right now where the workflow "failed" according to nextflow-api but the submitter pod is still running just fine! So I think something is going wrong between the webserver pod and the submitter pod.

In any case, we should be able to get around this issue by using nextflow -bg kuberun, which should cause the nextflow process to exit on the webserver while the actual workflow runs on a different pod. We'll have to refactor the various API endpoints to use kubectl instead of the pid to manage running workflows.
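
As a rough sketch of what that refactor might look like (not the actual nextflow-api code), the server could launch the workflow in the background and then ask kubectl for the submitter pod's phase instead of checking a pid. The function names, the status strings, and the assumption that the server already knows the submitter pod's name are all hypothetical; the phase names are the standard Kubernetes pod phases.

```python
import subprocess

# Standard Kubernetes pod phases mapped to hypothetical workflow states.
PHASE_TO_STATUS = {
    "Pending": "running",
    "Running": "running",
    "Succeeded": "completed",
    "Failed": "failed",
    "Unknown": "unknown",
}


def launch_command(pipeline):
    """Build the background launch command proposed in this issue."""
    return ["nextflow", "-bg", "kuberun", pipeline]


def status_command(pod_name):
    """Build a kubectl command that prints only the pod phase."""
    return ["kubectl", "get", "pod", pod_name, "-o", "jsonpath={.status.phase}"]


def cancel_command(pod_name):
    """Build a kubectl command that stops the workflow by deleting its pod."""
    return ["kubectl", "delete", "pod", pod_name]


def workflow_status(pod_name, run=subprocess.run):
    """Query the submitter pod (assumed name) and translate its phase.

    `run` is injectable so the function can be exercised without a cluster.
    """
    result = run(status_command(pod_name), capture_output=True, text=True, check=True)
    return PHASE_TO_STATUS.get(result.stdout.strip(), "unknown")
```

With something like this, the status and cancel endpoints would key off the pod name rather than the pid of a local nextflow process.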

bentsherman commented 4 years ago

In a discussion with @cbmckni I think we determined that this approach would not be worth it. If you don't have nextflow kuberun running on the server, then the server has to call kubectl periodically to fetch the workflow log and workflow status. Since nextflow kuberun is already doing that for us, we might as well let it.
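
To make the cost concrete, here is a minimal sketch of the polling loop the server would have to own for every running workflow. The function name, the default interval, and the injected `fetch_phase` callable (which would wrap a kubectl call in practice) are assumptions for illustration only.

```python
import time


def poll_until_done(fetch_phase, interval=30.0, sleep=time.sleep):
    """Poll a pod phase until it reaches a terminal state.

    fetch_phase: hypothetical callable returning the current pod phase,
                 e.g. a wrapper around `kubectl get pod`.
    Returns the terminal phase and the number of polls it took.
    """
    polls = 0
    while True:
        phase = fetch_phase()
        polls += 1
        # "Succeeded" and "Failed" are the terminal Kubernetes pod phases.
        if phase in ("Succeeded", "Failed"):
            return phase, polls
        sleep(interval)
```

The server would need one such loop (plus log fetching) per workflow, which is essentially the work that nextflow kuberun already does when it stays in the foreground.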

Also, now that we're envisioning nextflow-api as a single-user service, I am less concerned about one user running too many pipelines at once. And if they do, they can increase the server's capacity by providing more CPU cores.