chrisroat opened this issue 4 years ago
Typically, work stealing would handle this type of situation: https://distributed.dask.org/en/latest/work-stealing.html Do you have it disabled?
Sometimes, if the tasks take too long, it can take the scheduler a while to discover that stealing is worthwhile. Some groups have had success setting default task durations in the config: https://github.com/dask/distributed/blob/08d334e2e18bd977752eeab87e2c09272a2ac829/distributed/distributed.yaml#L27-L30
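For reference, a minimal sketch of that config, assuming the `distributed.scheduler.default-task-durations` key from the linked distributed.yaml; `my-long-task` is a hypothetical task prefix standing in for whatever your task names are:

```yaml
# ~/.config/dask/distributed.yaml (sketch, not a drop-in file)
distributed:
  scheduler:
    default-task-durations:
      my-long-task: 1h   # hypothetical prefix; hint that these tasks run ~1 hour
```

Telling the scheduler up front that these tasks are long-running should make both stealing and adaptive scaling decisions kick in sooner, since it no longer has to wait to observe the first completions.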
Thanks for the tip on the task duration settings.
I have not disabled work stealing. It might be an interplay with the autoscaling: the workers that would do the stealing are never created. I will look into specifying long task durations; perhaps that will entice the autoscaler to add more workers.
I have a graph that contains 300 tasks, each taking ~1 hour. If I start up a cluster with >300 nodes before submitting the graph, the work gets distributed nicely and the tasks run in parallel. If I cut it too close to 300 (especially if the nodes are preemptible), a few nodes end up with multiple tasks. This isn't too big a deal - I give it a few extra nodes, and sometimes it runs for 2 or 3 hours.
On the other hand, if I use a fresh (initial size 0) autoscaling dask cluster and set it to scale from 0 to 350, a lot of my long tasks pile up on individual workers. This is because K8s can take time to add nodes, and the graph starts processing as soon as 1 or 2 nodes are up. The scheduler does not yet know about the coming workers, so a few workers get assigned many tasks. The autoscaling in this case stops around ~30 nodes.
Is my best solution to avoid autoscaling with fresh clusters and graphs like this?
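One possible workaround, sketched below under the assumption that the dask-gateway `GatewayCluster.scale()` / `get_client()` and distributed `Client.wait_for_workers()` APIs are available in these versions: request the full worker count up front and block until most workers have registered, rather than adapting from 0, so the scheduler knows about the workers before it assigns the long tasks.

```python
# Sketch: avoid the "tasks pile up on the first 1-2 workers" problem by
# scaling to a fixed size and waiting before submitting the graph.
# Requires a running dask-gateway deployment; not runnable standalone.
from dask_gateway import Gateway

gateway = Gateway()
cluster = gateway.new_cluster()
cluster.scale(350)  # request the full worker count immediately

client = cluster.get_client()
client.wait_for_workers(n_workers=300)  # block until enough workers join

# ... now submit the 300-task graph ...
```

The trade-off is that you pay for idle workers during the ramp-up, but the long tasks start spread across the fleet instead of queued on the first few workers.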
I'm using dask/distributed 2.19, and dask-gateway 0.7.1.