catzc closed this issue 3 weeks ago
Thanks for raising this. Internally kr8s uses `box` to make accessing nested data structures more pleasant and Pythonic. At one point early on we were using the camel killer box to make keys like `loadBalancer` more Pythonic by converting them to snake case, like `load_balancer`.
>>> import box
>>> data_in = {"loadBalancer": True}
>>> data = box.Box(data_in, camel_killer_box=True)
You can access the `loadBalancer` key using either `.loadBalancer` or `.load_balancer`, which is very pleasant.
>>> data.load_balancer
True
>>> data.loadBalancer
True
Unfortunately this operation in `box` is destructive and modifies the underlying data structure to store the data in snake case form, which means that when converting back to a dictionary you don't get the same structure you put in.
>>> data.to_dict()
{'load_balancer': True}
Given that kr8s needs to be able to send data back to the Kubernetes API, we can't mangle the data structures like this, so at some point we dropped the camel killer box.
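To illustrate the kind of non-destructive approach this calls for, here's a minimal sketch (hypothetical code, not kr8s's actual implementation): accept both the camelCase and snake_case spellings on attribute access, but never rewrite the stored keys, so the round-trip back to a dict is lossless.

```python
import re


class CamelFriendly:
    """Hypothetical sketch: attribute access that accepts both camelCase
    and snake_case names without mutating the underlying dict, so
    converting back to a dict preserves the original keys."""

    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name in self._data:
            value = self._data[name]
        else:
            # Translate snake_case to camelCase for the lookup only;
            # the stored keys are never rewritten.
            camel = re.sub(r"_([a-z])", lambda m: m.group(1).upper(), name)
            value = self._data[camel]
        return CamelFriendly(value) if isinstance(value, dict) else value

    def to_dict(self):
        return self._data  # original structure, original keys


status = CamelFriendly({"loadBalancer": {"ingress": [{"hostname": "lb.example.com"}]}})
print(status.load_balancer.ingress[0]["hostname"])  # lb.example.com
print(status.to_dict())  # {'loadBalancer': {'ingress': [{'hostname': 'lb.example.com'}]}}
```

Unlike `camel_killer_box`, both spellings read the same underlying camelCase key, and `to_dict()` returns the structure untouched.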
It looks like this bit of code, where we look up the load balancer info, is still using snake case and needs updating to camel case.
I've opened up #907 to resolve this. @blankhold could you confirm that these are the changes that fixed things for you? It's a little hard to test this fix as we can't easily provision load balancers in our test suite.
Awesome, thanks for the explanation and PR @jacobtomlinson!
Seems like there are a few more patches I had to make. I pulled in the changes from your PR and also the changes you recently made in kr8s https://github.com/kr8s-org/kr8s/commit/665ff53e95c1808e80fb038e3f9ed2e7858275de.
Some other things:
- Startup still gets stuck in `_wait_for_controller`. As a workaround I `asyncio.sleep(5)` and then immediately return, skipping the check (for some reason the status in the DaskCluster is never being updated to `phase: "Running"`, and doing an immediate return gives us a `BoxKeyError`).
- `lb.hostname` also gives a `BoxKeyError`, which needed setting `default_box_none_transform` for Box or refactoring that section.

With those changes above, I was able to get a DaskCluster with a LoadBalancer service and access the dashboard externally.
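The `BoxKeyError`s above come from looking up status fields the controller hasn't populated yet. As a plain-Python illustration (a sketch, not the actual controller code, and independent of box's `default_box_none_transform` option), a defensive nested lookup avoids raising when a key is missing or a value along the path is still `None`:

```python
def get_nested(data, *keys, default=None):
    """Walk nested dicts, returning `default` when a key is missing or a
    value along the path is None (e.g. a status field the controller
    hasn't populated yet), instead of raising a KeyError."""
    for key in keys:
        if not isinstance(data, dict) or data.get(key) is None:
            return default
        data = data[key]
    return data


# LoadBalancer provisioned, but no ingress reported yet:
status = {"loadBalancer": {"ingress": None}}
print(get_nested(status, "loadBalancer", "ingress", "hostname"))  # None
print(get_nested(status, "load_balancer", "ingress"))  # None (wrong key, no crash)
```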
Ok great! Perhaps it's easier if you put up a PR that supersedes mine that makes the changes you need and we can discuss specifics there.
Thanks! Created https://github.com/dask/dask-kubernetes/pull/908 which I think should fix the issues. Also, do you know when kr8s can make a new release, so that we can pick up your recent changes in a new release of dask-kubernetes? Or how will the release go so that we can finally pick up these changes once merged?
I didn't realize how to test the controller before, so my comment about getting stuck in `_wait_for_controller` might not be applicable, or might just be fixed when rebuilding the controller to test. Will let you know if that worked.
I've just released kr8s version 0.17.4 and dask-kubernetes version 2024.9.0 with these fixes in.
Describe the issue: I'm trying to create a KubeCluster with AKS LoadBalancer, but the DaskCluster is stuck in a pending state at startup.
Minimal Complete Verifiable Example:
Trying to create a cluster like this:
but the DaskCluster is stuck in a Pending state until it times out.
Anything else we need to know?:
I looked at our service spec, and it seems like our resource definition has our load balancer under the key "loadBalancer" when the scheduler service comes up (as referenced in the Kubernetes docs: https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer), while dask-kubernetes looks for "load_balancer" in the status: https://github.com/dask/dask-kubernetes/blob/2ecfdcd41b0e09665d4ff98cb40528218fea10fe/dask_kubernetes/operator/controller/controller.py#L403-L406
For what it's worth, the kr8s library also seems to look for "load_balancer".
When I switched these over to "loadBalancer", the DaskCluster was able to come up without any issues. Notably, even without the code change, the Service, Pods, Deployments, etc. were all able to be set up and reach a running state; only the DaskCluster was pending.
Any help is great, thanks!
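To make the mismatch concrete, here is roughly the shape of a Service's status once an external load balancer is provisioned, per the Kubernetes docs linked above (the hostname value is a made-up placeholder):

```python
# Approximate Service status shape once an external LB exists; the
# Kubernetes API uses camelCase field names, and the hostname below
# is a placeholder, not real output.
service_status = {
    "loadBalancer": {"ingress": [{"hostname": "abc123.elb.example.com"}]}
}

# The snake_case key the old code looked for simply isn't there:
print(service_status.get("load_balancer"))  # None

# The camelCase key is what the API actually returns:
print(service_status["loadBalancer"]["ingress"][0]["hostname"])
```

So the controller's lookup silently finds nothing, and the DaskCluster never leaves Pending even though the underlying resources are fine.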
Environment: