dask / dask-kubernetes

Native Kubernetes integration for Dask
https://kubernetes.dask.org
BSD 3-Clause "New" or "Revised" License
312 stars 148 forks source link

Introduce typehints in controller #881

Closed jonded94 closed 5 months ago

jonded94 commented 7 months ago

General code quality improvements for the controller part of this repository.

jonded94 commented 7 months ago

@jacobtomlinson I converged to a somewhat rounded midpoint now; there still are a handful of mypy errors in controller.py (test_controller.py now shows no errors even with mypy --strict), but solving these isn't too easy right now.

Let me know what you think about my PR in general; it is unfortunately quite large, but I did it to improve developer experience (I honestly found contributing to this repository a bit hard since it is almost impossible to know which types are used where). I would prefer not increasing the scope of this PR even further for now but to do only clean up.

Also, I did not include mypy in CI for now since there are still hundreds of errors because of other parts of the codebase of this repo. Let me know what you think about that.

Maybe I can find some time to also add type hints to other parts of this repo in the distant future.

jonded94 commented 7 months ago

@jacobtomlinson could you help me out with this piece of code here:

In the top-level __init__.py the KubeCluster is defined under __all__, but it is actually not imported "properly" but instead a __getattr__ is defined that is doing a manual importlib.import_module pointing to dask_kubernetes.classic.KubeCluster: https://github.com/dask/dask-kubernetes/blob/main/dask_kubernetes/__init__.py#L22

This class actually is very much different from the KubeCluster class defined here which I used in all my projects.

Why is there a differentation, why is the default pointing to the "classic" KubeCluster and why is it done in this way? This smells very anti-pattern, but maybe I'm just not seeing something important here.

The CI errors actually stemmed from me trying to importing KubeCluster from the place which I thought to be "right", e.g. the dask_kubernetes.operator.kubecluster one.

jacobtomlinson commented 7 months ago

@jonded94 it was done this way when we deprecated the classic KubeCluster implementation.

In the past users would do from dask_kubernetes import KubeCluster. But when we rewrote things with the operator we pushed things down into submodules so you would import KubeCluster from either dask_kubernetes.classic or dask_kubernetes.operator. The getattr was added so we could tell people to migrate over.

Enough time has passed now that we can do away with all of this. The classic codebase can be totally removed. Then we need to make a decision about should we allow from dask_kubernetes import KubeCluster to import the operator version of the class, or should we enforce everyone to do from dask_kubernetes.operator import KubeCluster. This is mainly to avoid very old codebases breaking.

jonded94 commented 7 months ago

Okay, so the idea would be to include a DeprecationWarning in the future then as soon somebody is importing that classic thing, got it (because right now it simply imports it with no further warning or output). This is something separate then from this PR :) Thanks for your clarification!

jacobtomlinson commented 7 months ago

Yeah, it was actually there previously, but we moved it again because there were some usage patterns that meant it didn't show up.

https://github.com/dask/dask-kubernetes/blob/2a0465273f0ce539e8405c4a4eb1d25e8f92c3d8/dask_kubernetes/classic/kubecluster.py#L455

I basically consider the classic code overdue for removal. I just haven't had the time lately to do the work. Hoping to get to it very soon.

jacobtomlinson commented 6 months ago

I took the plunge and removed the classic codebase in #890. Would you mind rebasing this PR?

jonded94 commented 6 months ago

Very nice to hear! :) Rebased my PR.

jonded94 commented 5 months ago

I now fixed the remarks, i.e.:

After merging this MR, the next step surely would be to introduce mypy in the CI, at least for dask_kubernetes.operator.controller? (since that now shows 0 errrors)