aojea closed this 1 year ago
/assign @vsoch @alculquicondor
Would this mean that there can only be one pod per node?
Maybe it's worth defining a port for the container. Note that scheduler wouldn't put two pods in the same node if they use the same port, which is kind of pod anti-affinity, but it's fast for the scheduler to calculate.
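For context, this is what declaring a host port on the container looks like; two pods that both request the same host port can never land on the same node, which is the implicit anti-affinity described above. A sketch only - the port number and image are illustrative, not what the operator ships:

```yaml
# Hypothetical container fragment: the scheduler will refuse to place
# two pods that both claim hostPort 8050 onto the same node.
containers:
  - name: flux-broker
    image: example/flux:latest     # illustrative image
    ports:
      - containerPort: 8050
        hostPort: 8050             # claims the port on the node itself
```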
You can have many hostNetwork pods on the same node, and we have the anti-affinity rules ... the question is whether multiple flux agents can run on the same node.
/hold
I need to revisit this. Aldo is right: if you set hostNetwork, it seems better to enforce that two pods do not run on the same node.
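One way to enforce that would be hard pod anti-affinity on the hostname topology. A sketch, assuming the pods carry a label like `app: flux-sample` (the label is hypothetical here; it would need to match whatever labels the operator actually sets):

```yaml
# Hard anti-affinity: no two pods with this label are scheduled
# onto the same node (keyed on the node's hostname).
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: flux-sample        # assumed label, adjust to the operator's
        topologyKey: kubernetes.io/hostname
```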
@aojea it looks like you added the attribute to the service container spec, and we probably want it for the MiniCluster here? https://github.com/flux-framework/flux-operator/blob/ddd1c836910d600d941f453800b6a5d06916031e/controllers/flux/job.go#L64
Also, could we please add it to MiniclusterSpec.Network so it's namespaced as a networking attribute?
Finally, can you explain to me what it means to use the host network vs. not? Thank you!
@aojea when you make changes, also please add this new attribute (documentation for it) in the docs/getting-started/custom-resource-definition.md. That's where the user reads about the different spec fields.
Let's draft it
@aojea if you give me permission I can make these changes to the PR.
Sure, of course, please go ahead
@aojea do we need to do this too? https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy
I was too nervous to test hostNetwork on my actual host, so I added a test with the variable! I also found https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy and thought we might need to set the DNSPolicy to ClusterFirstWithHostNet. We can discuss / we will see what happens!
Looks like there is some issue with weight (and I'm wondering why this didn't trigger before?) I'm off to bed but will read more about this tomorrow - I put in random values for now to try out.
ClusterFirstWithHostNet
good catch, yes we do need it
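For reference, the relevant combination on the pod spec: with hostNetwork, the default DNS policy resolves through the node's DNS, so cluster Services (like the headless service) won't resolve unless the policy is changed. A minimal sketch:

```yaml
# With hostNetwork: true, the default dnsPolicy ("ClusterFirst") is
# treated as "Default" and resolves via the node, so cluster Services
# won't resolve. ClusterFirstWithHostNet restores cluster DNS.
spec:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet
```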
1.6935564789918168e+09 ERROR Reconciler error {"controller": "minicluster", "controllerGroup": "flux-framework.org", "controllerKind": "MiniCluster", "miniCluster": {"name":"flux-sample","namespace":"flux-operator"}, "namespace": "flux-operator", "name": "flux-sample", "reconcileID": "aab05d2d-8a40-4a10-b2a4-bedf682a3e3d", "error": "Job.batch \"flux-sample\" is invalid: [spec.template.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution[0].weight: Invalid value: 0: must be in the range 1-100, spec.template.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0].weight: Invalid value: 0: must be in the range 1-100]"}
@alculquicondor I thought the API would default it if empty
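It doesn't: the API server validates the weight rather than defaulting it, so the operator has to set a value in 1-100 itself. A sketch of a valid term (the label is an assumption, not the operator's actual selector):

```yaml
# Weight is validated, not defaulted: a zero value is rejected with
# "must be in the range 1-100", which is exactly the error above.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100                  # any value in [1, 100] is accepted
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: flux-sample       # assumed label
          topologyKey: kubernetes.io/hostname
```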
Ah interesting - so with hostNetwork added, the pod is actually detecting its hostname as minikube. I wonder if the pod indices / hostnames are just totally gone and each pod is just minikube?
LOGS for Flux Operator Sample
flux-broker: Config file error [bootstrap]: minikube not found in hosts
flux-broker: bootstrap failed
Error: Process completed with exit code 1.
I will need to try this locally after all - will report back.
Okay, some updates - we will need to think through how / if this is going to work. First, the worker pods cannot schedule:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 2m26s (x2 over 2m28s) default-scheduler 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
$ kubectl get pods -n flux-operator
NAME READY STATUS RESTARTS AGE
flux-sample-0-ln79t 0/1 CrashLoopBackOff 3 (20s ago) 2m39s
flux-sample-1-rxqfm 0/1 Pending 0 2m39s
flux-sample-2-pvgg2 0/1 Pending 0 2m39s
flux-sample-3-kgbb6 0/1 Pending 0 2m39s
As for the lead broker (and the flux config), it looks like the hostnames are now just kind-control-plane and (on here) minikube:
👋 Hello, I'm kind-control-plane
The main host is flux-sample-0
So the headless service seems to be overridden. We do need each pod to have a unique hostname it can use to identify itself (and the rank), so I'm not sure this strategy will work! I'll read about other DNS options maybe. In the meantime - any ideas @aojea ?
I made this follow up PR if we want it - https://github.com/flux-framework/flux-operator/pull/205. It's exactly the same as the one here, but without host network. I actually think further testing / thinking about minimalService might be the way to go - that exposes only the broker on port 8050, and the worker pods should still have their proper DNS names.
hostNetwork means the pod is on the host itself, so it inherits the node name as its hostname. I think it's better for this operator not to use it for now; it adds unnecessary complexity.
Gotcha! I don't think it would work then, but it's a really cool idea! And actually this might be quite useful for our other work with usernetes - being able to just "use" the host network without a hitch is likely desired for specific workflows.
This allows Pods to run in the host namespaces; it is a power-user option to tune the node directly and avoid the shortcomings of having to run in their own network namespaces.
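For reference, these are the host-namespace knobs on a Pod spec (all optional booleans, defaulting to false; this is a sketch of the power-user options, not something the operator sets):

```yaml
# Opting a pod into the host's namespaces:
spec:
  hostNetwork: true   # share the node's network namespace (and its hostname)
  hostPID: true       # share the node's PID namespace
  hostIPC: true       # share the node's IPC namespace
```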