Master Application getting closed upon all workers removed

dask / knit

Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead

http://knit.readthedocs.io/en/latest/

BSD 3-Clause "New" or "Revised" License


Master Application getting closed upon all workers removed #129

Closed: avinashraghuthu closed this issue 6 years ago

avinashraghuthu commented 6 years ago

I am trying to run a dask YARN cluster with autoscaling, i.e., I am using the Adaptive class, which automatically scales the cluster up and down. The problem is that when all the workers have been removed, the master application is also killed. So when the scale-up function is called again, no containers are added. I have checked the code, and this is done intentionally. But that does not work for automatic scaling, where the cluster can have zero worker containers when it is idle and workers are added when a job needs to be executed. Please find the sample test I am using: ClusterScaleTest.txt
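The scale-to-zero behaviour described above can be illustrated with a toy target-size rule. This is a minimal sketch under stated assumptions, not the actual algorithm of dask's `Adaptive` class: the function `target_workers` and its `tasks_per_worker`/`max_workers` parameters are hypothetical. The only point is that an idle cluster legitimately asks for zero workers, so whatever manages the YARN application has to survive that state.

```python
import math


def target_workers(pending_tasks: int, tasks_per_worker: int = 4, max_workers: int = 10) -> int:
    """Toy autoscaling rule (hypothetical; not dask's Adaptive logic).

    Returns 0 when there is no pending work, so an idle cluster shrinks to
    zero worker containers. Otherwise it asks for enough workers to cover
    the pending tasks, capped at max_workers.
    """
    if pending_tasks <= 0:
        return 0
    return min(max_workers, math.ceil(pending_tasks / tasks_per_worker))


if __name__ == "__main__":
    # Idle, light, moderate and heavy load.
    for pending in (0, 1, 9, 100):
        print(pending, "->", target_workers(pending))
```

With these defaults, 0 pending tasks maps to 0 workers, 9 tasks to 3 workers, and 100 tasks hits the cap of 10.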

martindurant commented 6 years ago

"I have checked the code and it is being intentionally done"

If you can fix this, we would certainly accept a PR.

However, you should be aware that a new ground-up rewrite is underway in the new project skein. Although it is only in beta, it should solve problems such as this, and provide the adaptive functionality for dask-on-yarn that you are after.

avinashraghuthu commented 6 years ago

I commented out the code where the application master is removed and tested knit together with autoscaling. It worked fine: it spawned workers when needed and scaled down to zero containers (excluding the application master) when idle.

I don't see a reason for knit to kill the application master specifically. When all workers have been removed, users can do the cleanup themselves if they don't want the cluster running. I don't see any gain from knit killing the application master on its own. If you are fine with this, I will raise a PR for it.

If you want me to keep the cleanup (i.e., stopping the application master) but handle the autoscaling case separately, I will do that.

Please advise!
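The two options discussed here (never stopping the application master when the worker count reaches zero, or keeping the old cleanup but disabling it for autoscaling) can be modelled with a small in-memory sketch. Everything in it is hypothetical: `ToyYarnCluster`, the `stop_master_when_empty` flag and the method names are illustrative stand-ins, not knit's API, and no YARN calls are made.

```python
class ToyYarnCluster:
    """In-memory stand-in for a YARN application with one master and N workers.

    Hypothetical model for illustration only; it does not use knit or YARN.
    """

    def __init__(self, stop_master_when_empty: bool) -> None:
        # True: old behaviour, the master stops after the last worker is removed.
        # False: behaviour wanted for autoscaling, the master outlives idle periods.
        self.stop_master_when_empty = stop_master_when_empty
        self.master_running = True
        self.workers = set()
        self._next_id = 0

    def add_workers(self, n: int) -> list:
        # New containers can only be requested through a live master.
        if not self.master_running:
            raise RuntimeError("application master is gone; cannot request containers")
        new_ids = list(range(self._next_id, self._next_id + n))
        self._next_id += n
        self.workers.update(new_ids)
        return new_ids

    def remove_worker(self, worker_id: int) -> None:
        self.workers.discard(worker_id)
        if not self.workers and self.stop_master_when_empty:
            # Implicit cleanup: this is what breaks scale-up after scale-to-zero.
            self.master_running = False

    def shutdown(self) -> None:
        # Explicit cleanup, left to the user when the cluster is no longer needed.
        self.workers.clear()
        self.master_running = False


def scale_to_zero_then_up(stop_master_when_empty: bool) -> bool:
    """Return True if the cluster can add workers again after an idle period."""
    cluster = ToyYarnCluster(stop_master_when_empty)
    for wid in cluster.add_workers(2):
        cluster.remove_worker(wid)
    try:
        cluster.add_workers(1)
    except RuntimeError:
        return False
    return True


if __name__ == "__main__":
    print("old behaviour scales back up:", scale_to_zero_then_up(True))
    print("new behaviour scales back up:", scale_to_zero_then_up(False))
```

With `stop_master_when_empty=True` the sketch reproduces the reported failure; with `False` the cluster drops to zero workers and later grows again, and the master is stopped only by an explicit `shutdown()`.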

martindurant commented 6 years ago

It sounds like both your fix to prevent the app master from quitting and your adaptive layer would be useful additions for those who are still using knit. I would be happy to see either or both in a PR.

avinashraghuthu commented 6 years ago

I have raised a PR for this:

https://github.com/dask/knit/pull/130

avinashraghuthu commented 6 years ago

Thanks @martindurant for merging the PR. Closing the issue.

avinashraghuthu commented 6 years ago

@martindurant When will this be included in a release? Right now it is not in 0.2.4, the version I get when installing via pip.

martindurant commented 6 years ago

You can always install from master:

pip install git+https://github.com/dask/knit

avinashraghuthu commented 6 years ago

Yes, I totally forgot about that one. Thanks 👍. Are you planning to release a new version soon?

martindurant commented 6 years ago

Not particularly; as you will have seen, there is not much development here now, since the push to create skein.