dask / community

For general discussion and community planning. Discussion issues welcome.

How worker memory is distributed in dask-yarn while using EMR #262

Closed nilanjanroy1 closed 2 years ago

nilanjanroy1 commented 2 years ago

I am trying to use Dask on an EMR cluster. I was able to spin up a Dask cluster and perform some Dask operations following https://yarn.dask.org/en/latest/aws-emr.html. My question is how dask-yarn allocates the physical memory of the EMR nodes.

Let's say I have a cluster with 1 scheduler and 2 workers (8 cores, 32 GB RAM each). How do I determine how much memory my Dask workers are getting?

For example, when I use cluster.scale(2), does each worker get 32 GB of RAM? Similarly, if I use cluster.scale(4), does each worker get 16 GB? (Ignoring overhead memory.)
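For what it's worth, in dask-yarn the per-worker memory is not derived by splitting a node's RAM across however many workers you scale to. Each worker container gets the fixed amount you pass as `worker_memory` to `YarnCluster`, and `cluster.scale(n)` just asks YARN for n containers of that size, packed onto the nodes as capacity allows. A minimal sketch (the environment archive name is a placeholder, and this needs a live YARN cluster to actually run):

```python
from dask_yarn import YarnCluster

# Each worker container gets exactly worker_memory, regardless of how
# many workers cluster.scale() later requests; YARN schedules as many
# 8 GiB containers onto the 32 GB nodes as will fit.
cluster = YarnCluster(
    environment="environment.tar.gz",  # hypothetical packaged conda env
    worker_vcores=4,
    worker_memory="8GiB",
)
cluster.scale(4)  # four 8 GiB workers, not 32 GB split four ways
```

So with the defaults unchanged, scaling from 2 to 4 workers does not halve each worker's memory; it doubles the total memory requested from YARN.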

Is there any piece of code to check the worker memory? I am using EMR steps to submit my .py file. Any leads would be helpful, thanks.
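One way to check this, assuming you have a connected `Client`, is to ask the scheduler for its view of each worker; the same call works against a `YarnCluster`, a `LocalCluster` here only makes the sketch self-contained and runnable:

```python
from dask.distributed import Client, LocalCluster

# LocalCluster stands in for YarnCluster purely so this runs anywhere;
# with dask-yarn you would pass your YarnCluster object to Client instead.
cluster = LocalCluster(n_workers=2, memory_limit="1GiB")
client = Client(cluster)

# scheduler_info() reports each worker's configured memory_limit in bytes
info = client.scheduler_info()
for addr, worker in info["workers"].items():
    print(addr, worker["memory_limit"])

client.close()
cluster.close()
```

Each printed line shows a worker address alongside its memory limit, so you can confirm directly what each worker was actually allocated.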

jacobtomlinson commented 2 years ago

This repo isn't really the appropriate place for questions like this. This is for discussion of community topics rather than usage help.

Would you mind opening this on the forum instead? https://dask.discourse.group

nilanjanroy1 commented 2 years ago

Sure @jacobtomlinson . Thanks