Closed: avinashraghuthu closed this issue 6 years ago
If you can fix this, we would certainly accept a PR.
However, you should be aware that a new ground-up rewrite is underway in the new project skein. Although it is only in beta, it should solve problems such as this, and provide the adaptive functionality for dask-on-yarn that you are after.
I commented out the code where the application master is removed and tested the knit application along with the autoscaling. It worked fine: it spawned workers when needed and scaled down to zero containers (excluding the application master) when idle.
I don't see a reason for knit to kill the application master specifically, because once all workers are removed the user can do the cleanup themselves if they don't want the cluster running. I don't see any gain in killing the application master on its own. If you are fine with this, I will raise a PR for the same.
If you want me to keep the cleanup (i.e. killing the application master) but handle the autoscaling case separately, I will do that.
Please suggest!
It sounds like both your solution to prevent the app-master from quitting and your adaptive layer are useful additions for those who are still using knit; we would be happy to see either or both in a PR.
Raised a PR for the same
Thanks @martindurant for merging the PR. Closing the issue.
@martindurant Can I know when this will be released in a version? Right now it is not included in 0.2.4 when installing via pip.
You can always install from master: pip install git+https://github.com/dask/knit
Yes, I totally forgot about that one. Thanks 👍. Are you thinking of releasing a new version soon?
Not particularly; as you will have seen, there is not much development now on here, since the push to create skein.
I am trying to run the dask yarn cluster with autoscaling, i.e. I am using the Adaptive class, which automatically triggers scaling the cluster up and down. The problem is that when all the workers have been removed, the application master is also killed, so when the scale-up function is called again, no containers are added. I have checked the code and this is done intentionally. That does not work for automatic scaling, where the cluster can have zero worker containers when it is idle and workers are added when a job needs to be executed. Please find attached the sample test I am using: ClusterScaleTest.txt
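For readers without the attachment, the behaviour described above can be sketched with a toy model. This is not knit's code or API: the class ToyYarnApp, its methods, and the kill_master_when_empty flag are all hypothetical names made up for illustration. It only assumes what the issue states, namely that new containers can be requested only while the application master is alive.

```python
class ToyYarnApp:
    """Hypothetical stand-in for a YARN application; not knit's real API."""

    def __init__(self, kill_master_when_empty):
        # Assumption: this flag models the behaviour under discussion.
        self.kill_master_when_empty = kill_master_when_empty
        self.master_alive = True
        self.workers = 0

    def scale_up(self, n):
        # Containers can only be requested through a live application master.
        if not self.master_alive:
            return False
        self.workers += n
        return True

    def scale_down(self, n):
        self.workers = max(0, self.workers - n)
        # Reported behaviour: the master is killed along with the last worker.
        if self.workers == 0 and self.kill_master_when_empty:
            self.master_alive = False


def idle_then_busy(app):
    """Simulate one adaptive cycle: start, go idle (zero workers), get new work."""
    app.scale_up(2)
    app.scale_down(2)
    return app.scale_up(2)


reported = ToyYarnApp(kill_master_when_empty=True)
patched = ToyYarnApp(kill_master_when_empty=False)
print(idle_then_busy(reported))  # False: scale-up fails once the master is gone
print(idle_then_busy(patched))   # True: the master survived the idle period
```

In this sketch, keeping the master alive at zero workers is what lets the second scale-up succeed, which is the change proposed in this thread.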