Open mccheah opened 7 years ago
kubeadm is still alpha. Using the annotation to mark master nodes as unschedulable is subject to change. I'll wait for it to stabilize before adding any additional filtering.
@foxish is there harm in filtering out the various draft versions of "unschedulable" that are being used -- at least the one kubeadm uses at present? I'd like to aim for Spark to work well out of the box on most all k8s clusters regardless of the various differences in how master node tainting works at present.
I'd like to aim for Spark to work well out of the box on most all k8s clusters regardless of the various differences in how master node tainting works at present.
Makes sense. I'm waiting for one discussion here https://github.com/kubernetes/kubernetes/pull/39112 to complete. Following that, (and before our alpha), I'll capture all the various ways to detect a master node.
I think this is taken care of? Anything else to add here @foxish ?
@mccheah Still some work to be done here, around finding the right annotations allowing us to filter out master nodes for clusters brought up using kops, minikube, kubeadm, etc. There's a lot of discussion around this area. I'll have a PR out to fix this in a day or two.
Ah yes, mixed it up with the filtering of IP types from the nodes. Yes, this does seem beneficial to sort out for the first pass.
@foxish maybe for the alpha we can provide docs for how to set the right taint on the master node that the fabric8 client looks for?
Even if the value isn't standardized right now, we're relying on one in particular so should make that clear.
I ask this also with self-interest because I've been waiting through failovers in my testing ever since we introduced this change and I'd love to set the taint right on my dev cluster.
Based on what I'm seeing in https://github.com/kubernetes/kubernetes/pull/31647, the right annotation to set on the node is:
annotations:
scheduler.alpha.kubernetes.io/taints: '[{"key":"node.alpha.kubernetes.io/ismaster","effect":"NoSchedule"}]'
It is still experimental and subject to change, but will likely unify in the future.
Is that the key that fabric8 looks for in this call? https://github.com/apache-spark-on-k8s/spark/blob/k8s-support-alternate-incremental/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/rest/kubernetes/NodePortUrisDriverServiceManager.scala#L52
That's the unschedulable
I'm hoping to provide docs for
The old way of marking a node as unschedulable was:
(1)
apiVersion: v1
kind: Node
...
spec:
unschedulable: true
The new method of achieving the above is:
(2)
apiVersion: v1
kind: Node
metadata:
annotations:
scheduler.alpha.kubernetes.io/taints: '[{"key":"node.alpha.kubernetes.io/ismaster","effect":"NoSchedule","timeAdded":null}]'
fabric8 client supports (1) through https://github.com/apache-spark-on-k8s/spark/blob/k8s-support-alternate-incremental/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/rest/kubernetes/NodePortUrisDriverServiceManager.scala#L52. It has no notion or support for (2).
What I think we should do is:
spec.unschedulable
is unset, users can manually mark their nodes as unschedulable by editing the node spec to match (1).Does that make sense?
69 filters the nodes to contact for submitting the application to those which are not marked as "unschedulable". However we have found that in a few cases this condition alone may not filter out all the nodes that are not running the kube-proxy. We should look into the full set of conditions to check there.