dask / dask-yarn

Deploy dask on YARN clusters
http://yarn.dask.org
BSD 3-Clause "New" or "Revised" License

wait_for_workers hangs when the cluster is created but the application has failed on YARN #146

Open FANLONGFANLONG opened 3 years ago

FANLONGFANLONG commented 3 years ago

I accidentally requested 500 GB per worker when starting a YARN cluster. This is what happens:

  1. The cluster (application) is created on YARN: I get a cluster.app_id and a cluster.scheduler_address.

  2. I call client.wait_for_workers to wait for the workers to be ready.

  3. Dask hangs at that call. (screenshot attached)

  4. When I check the application on YARN, it has already failed. (screenshot attached)

The issue is reproducible.

What can I do to avoid this problem?
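
One possible workaround, sketched here under assumptions rather than taken from dask-yarn itself, is to poll with a deadline instead of blocking indefinitely, and to check on each iteration whether the YARN application has already failed. The helper name `wait_for_workers_or_fail`, the `ApplicationFailed` exception, and both callables are hypothetical. How you query the application state depends on your setup (for example, by parsing the output of `yarn application -status <app_id>`).

```python
import time


class ApplicationFailed(RuntimeError):
    """Raised when the cluster application dies before the workers arrive."""


def wait_for_workers_or_fail(count_workers, app_has_failed, n_workers,
                             timeout=300.0, poll_interval=1.0):
    """Block until n_workers are connected, the application fails, or time runs out.

    count_workers  -- callable returning the number of connected workers.
                      With a dask client this could be, for example:
                      lambda: len(client.scheduler_info()["workers"])
    app_has_failed -- callable returning True once the YARN application is in a
                      failed or killed state (hypothetical; supply your own check).
    """
    deadline = time.monotonic() + timeout
    while True:
        if count_workers() >= n_workers:
            return  # enough workers have joined
        if app_has_failed():
            raise ApplicationFailed("application failed before workers were ready")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"fewer than {n_workers} workers after {timeout} seconds"
            )
        time.sleep(poll_interval)
```

With this approach, a request that YARN can never satisfy surfaces as an exception instead of an indefinite hang. Recent versions of distributed also appear to accept a `timeout` argument on `Client.wait_for_workers`, which covers the deadline part but not the failed-application check.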

fjetter commented 3 years ago

This sounds like a YARN-specific problem, and I suggest reposting it on the issue tracker of https://github.com/dask/dask-yarn

jrbourbeau commented 3 years ago

Just transferred this issue over to the dask-yarn repo.

FANLONGFANLONG commented 3 years ago

@jrbourbeau thanks