InformaticsMatters / squonk

Squonk platform and computational notebook
Apache License 2.0

Provide a mechanism to place Nextflow pods on specific (k8s) nodes #132

Open alanbchristie opened 3 years ago

alanbchristie commented 3 years ago

At the moment, apart from the namespace, we're unable to control where Pods are scheduled by Nextflow. Nextflow does offer a pod declaration in the process definition, which is documented as being available for k8s execution. Importantly, the pod declaration allows a nodeSelector to be specified (the most basic form of scheduling), so that Pods will only run on nodes with a given label. So we could have nextflow nodes and ensure that Pods run on those nodes.

Sadly, the nodeSelector doesn't allow complex scheduling, nor does it support taints/tolerations to create exclusive nodes, but it might be enough to provide better Pod scheduling, i.e. where Nextflow processes only run on named nodes.

Basic idea

The basic idea is to expose a SQUONK_POD_NEXTFLOW_NODE_SELECTOR environment variable (defaulting to application, our default node label), with the value passed to the Kubernetes Pod via the addExtraNextflowConfig() method call in the OpenShiftRunner.
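As a rough sketch of the idea above (not the actual OpenShiftRunner code), the snippet below reads the environment variable, falls back to application, and renders a Nextflow config fragment that sets a nodeSelector through the pod setting. The class name, the method names and the label key purpose are assumptions made for illustration; the real label key used on our nodes, and the way addExtraNextflowConfig() accepts the text, would need to be checked.

```java
import java.util.Map;

/** Illustrative sketch only: names and the label key are hypothetical. */
public class NextflowNodeSelectorSketch {

    // Environment variable name proposed in this issue.
    public static final String ENV_NAME = "SQUONK_POD_NEXTFLOW_NODE_SELECTOR";
    // Default value proposed in this issue.
    public static final String DEFAULT_VALUE = "application";
    // ASSUMPTION: the node label key is not stated in the issue.
    public static final String LABEL_KEY = "purpose";

    /** Returns the selector value from the environment, or the default. */
    public static String resolveSelectorValue(Map<String, String> env) {
        String value = env.get(ENV_NAME);
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT_VALUE;
        }
        return value.trim();
    }

    /** Renders a Nextflow config fragment using the pod nodeSelector option. */
    public static String buildPodConfig(String selectorValue) {
        return "process {\n"
             + "  pod = [nodeSelector: '" + LABEL_KEY + "=" + selectorValue + "']\n"
             + "}\n";
    }

    public static void main(String[] args) {
        // Print the fragment that would be appended to the Nextflow config.
        System.out.print(buildPodConfig(resolveSelectorValue(System.getenv())));
    }
}
```

The generated text would presumably be appended to whatever extra configuration addExtraNextflowConfig() already produces. Passing the environment in as a Map keeps the lookup easy to test without touching the real process environment.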

See

tdudgeon commented 3 years ago

Whilst some of this is Nextflow-specific, the general principle should be applicable to any container: we should be able to target suitable nodes with any container that can run in any AZ (e.g. one that does not need EBS volumes) and that tolerates its node disappearing (e.g. as it might be a spot instance). An obvious example is the Docker-based pipelines.

The other class of containers comprises those that need a resource tied to a particular AZ (e.g. an EBS volume) or that cannot tolerate being relocated so easily. These would be targeted to nodes in a particular AZ that are not spot instances. An example would be the PostgreSQL database that serves Squonk and Keycloak.