dask / dask-blog

Dask development blog
https://blog.dask.org/

Blogpost idea: how to choose good settings for Dask on HPC #118

GenevieveBuckley opened this issue 2 years ago (status: open)

GenevieveBuckley commented 2 years ago

It'd be good to have a blogpost about how to choose good settings for Dask on HPC. Users are often confused about this.

I think one reason this is particularly confusing is that settings often need to be defined in multiple locations, and people are confused about how they interact. For example, someone might submit a job to SLURM with sbatch, which then runs a python program involving Dask, and want to know how that fits together.
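One way the pieces can fit together is to run the whole Dask cluster inside a single sbatch allocation. Dask does not parse the `#SBATCH` lines, so one option is to pass the requested resources to the cluster explicitly rather than relying on what it detects. The sketch below assumes the job was submitted with `--cpus-per-task` and `--mem`, so that Slurm exports `SLURM_CPUS_PER_TASK` and `SLURM_MEM_PER_NODE` (in megabytes) into the job's environment. `local_cluster_kwargs` is a hypothetical helper, not part of Dask.

```python
import os
from typing import Mapping


def local_cluster_kwargs(n_workers: int = 1, env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: split one Slurm allocation into LocalCluster kwargs.

    Assumes the job was submitted with --cpus-per-task and --mem, so Slurm
    exports SLURM_CPUS_PER_TASK and SLURM_MEM_PER_NODE (megabytes).
    Raises KeyError when run outside such a job.
    """
    cpus = int(env["SLURM_CPUS_PER_TASK"])
    # Slurm's "M" unit is treated here as 2**20 bytes.
    mem_bytes = int(env["SLURM_MEM_PER_NODE"]) * 1024 * 1024
    if n_workers < 1 or cpus < n_workers:
        raise ValueError("need between 1 and SLURM_CPUS_PER_TASK worker processes")
    return {
        "n_workers": n_workers,                   # worker processes
        "threads_per_worker": cpus // n_workers,  # CPUs shared out evenly
        "memory_limit": mem_bytes // n_workers,   # per worker, not per job
    }


if __name__ == "__main__":
    # Stand-in for the environment of a job submitted with
    #   sbatch --cpus-per-task=8 --mem=32000 job.sh   (job.sh is hypothetical)
    fake_env = {"SLURM_CPUS_PER_TASK": "8", "SLURM_MEM_PER_NODE": "32000"}
    print(local_cluster_kwargs(n_workers=2, env=fake_env))
```

Inside the job, the result could be passed to `dask.distributed.LocalCluster(**local_cluster_kwargs(n_workers=2))`. The main point is that `memory_limit` is per worker, so the job's `--mem` is divided between workers, not copied to each one.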

https://github.com/dask/dask-blog/issues/116#issuecomment-947370655

...you know what would ALSO be a good blogpost? How to choose good cluster settings. E.g., how your SLURM/PBS/whatever batch submission settings relate to the settings you need to put in your dask-jobqueue cluster object.

To be honest I'm still a bit confused by this, and it is something other people ask me too.

If either @jacobtomlinson or @ian-r-rose would like to help make this, that would be very useful to refer people to (hint, hint) :smile:
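A rough way to see how batch submission settings relate to the cluster object: with dask-jobqueue, the Python program does not run inside the worker jobs. The cluster object writes and submits its own batch scripts, one per job, and its kwargs describe a single such job. The helper below is hypothetical; it only mimics the general shape of the header that `SLURMCluster` generates by default, as we understand it, and it takes memory as a whole number of GB rather than the library's memory string. The real script for a given configuration can be printed with `cluster.job_script()`.

```python
from typing import Optional


def sbatch_header(
    cores: int,
    memory_gb: int,
    walltime: str,
    queue: Optional[str] = None,
    account: Optional[str] = None,
    job_name: str = "dask-worker",
) -> str:
    """Hypothetical helper: render a simplified #SBATCH header for ONE worker job.

    Each argument mirrors a JobQueueCluster-style kwarg; every value is per
    job, never a total for the whole Dask cluster.
    """
    lines = ["#!/usr/bin/env bash", f"#SBATCH -J {job_name}"]
    if queue:
        lines.append(f"#SBATCH -p {queue}")           # queue=    -> partition
    if account:
        lines.append(f"#SBATCH -A {account}")         # account=  -> account to charge
    lines.append("#SBATCH -n 1")                      # one task per job
    lines.append(f"#SBATCH --cpus-per-task={cores}")  # cores=    -> CPUs per job
    lines.append(f"#SBATCH --mem={memory_gb}G")       # memory=   -> memory per job
    lines.append(f"#SBATCH -t {walltime}")            # walltime= -> time limit per job
    return "\n".join(lines)


if __name__ == "__main__":
    # "normal" is a hypothetical partition name. Two jobs like this would give
    # the Dask cluster 48 cores and 192 GB in total.
    print(sbatch_header(cores=24, memory_gb=96, walltime="01:00:00", queue="normal"))
```

Because the kwargs are per job, the size of the whole cluster is set separately, for example with `cluster.scale(jobs=2)`. An outer sbatch job that runs the Python program itself typically only needs enough resources for the client, the scheduler, and whatever the script computes locally.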

@guillaumeeb has kindly agreed to help put this together https://github.com/dask/dask-blog/issues/116#issuecomment-947955079

Hi all, I saw this issue, and I agree that both ideas would make great articles. Those are questions we see a lot as HPC admins and experts.

I can try to help with the second one, on batch submission settings! Everyone is confused about it.

GenevieveBuckley commented 2 years ago

These resources don't necessarily answer the question about how to choose good settings, but might be good to link to:

It'd be good to collect other, non-SLURM links too

guillaumeeb commented 2 years ago

Thanks @GenevieveBuckley for starting the discussion.

In my experience, what users find hardest to understand is how to configure the JobQueueCluster (be it PBS, Slurm or whatever) correctly, and what the kwargs mean. More specifically:

I think one reason this is particularly confusing is that settings often need to be defined in multiple locations, and people are confused about how they interact

Related to this, there is also the question of the Dask config YAML file vs. the kwargs: which should be used, and when?
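On the YAML-vs-kwargs question: dask-jobqueue reads its defaults from Dask's configuration system (for example a `jobqueue.yaml` file in `~/.config/dask/`), and, as we understand it, a keyword argument passed to the cluster constructor takes precedence, with the configuration used for anything left unset. The sketch below models only that precedence rule. `CONFIG` is a hand-written stand-in for a parsed YAML file and `resolve` is a hypothetical function, not dask-jobqueue's code. The values actually in effect can be checked with `dask.config.get("jobqueue.slurm")`.

```python
from typing import Any, Mapping

# Hand-written stand-in for a parsed config file such as:
#   jobqueue:
#     slurm:
#       cores: 8
#       memory: 32GB
#       walltime: "00:30:00"
CONFIG = {"jobqueue": {"slurm": {"cores": 8, "memory": "32GB", "walltime": "00:30:00"}}}


def resolve(scheduler: str, config: Mapping[str, Any], **kwargs: Any) -> dict:
    """Hypothetical precedence rule: explicit kwargs win, None falls back to config."""
    settings = dict(config["jobqueue"][scheduler])  # copy so the config is untouched
    for key, value in kwargs.items():
        if value is not None:
            settings[key] = value
    return settings


if __name__ == "__main__":
    # cores comes from the kwarg; memory and walltime come from the "YAML".
    print(resolve("slurm", CONFIG, cores=16, memory=None))
```

One possible convention that follows from this rule: keep site-wide settings that rarely change (queue, account) in the YAML file, and pass per-analysis sizing as kwargs.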

For example, someone might submit a job to SLURM with sbatch, which then runs a python program involving Dask, and want to know how that fits together.

I agree; we also need to describe the different possibilities and the "big picture":

We could also list improvements still to be made, or point to https://blog.dask.org/2019/06/12/dask-on-hpc, which presents a lot of things that are still true, and maybe try to develop point 7 at the end of that post.

GenevieveBuckley commented 2 years ago

That is an excellent and thorough summary @guillaumeeb!

We might also add:

how to configure the JobQueueCluster (be it PBS, Slurm or whatever) correctly, and what do the kwargs mean.

Building on "what do the kwargs mean", it would be good if we could not only explain each concept, but also map it to the names used for the same concept in other places. I suggest this because it's the type of question I get: someone has read all the beginner documentation and asks, "Is $foo the same as $bar? Does that mean I should set these values to the same thing?"
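A sketch of such a mapping, written as data: each row names one quantity and how it is spelled as a JobQueueCluster kwarg, as an `sbatch` option, and as a `dask-worker` command-line option (recent versions spell the process count `--nworkers`). The rows reflect our understanding of `SLURMCluster`'s default behaviour and should be checked against the docs; `same_quantity` is a hypothetical helper. It illustrates the answer behind "should I set these to the same thing?": `cores` and `--nthreads`, or `memory` and `--memory-limit`, look alike, but one is per job and the other per worker process.

```python
from typing import List, Optional, Tuple

# (quantity, JobQueueCluster kwarg, sbatch option, dask-worker option)
# None means that tool has no direct spelling for the quantity.
Row = Tuple[str, Optional[str], Optional[str], Optional[str]]

GLOSSARY: List[Row] = [
    ("CPUs per batch job", "cores", "--cpus-per-task", None),
    ("memory per batch job", "memory", "--mem", None),
    ("time limit per batch job", "walltime", "--time", None),
    ("queue / partition", "queue", "--partition", None),
    ("worker processes per batch job", "processes", None, "--nworkers"),
    # Derived per-worker values: roughly cores // processes and memory / processes.
    ("threads per worker process", None, None, "--nthreads"),
    ("memory per worker process", None, None, "--memory-limit"),
]


def same_quantity(a: str, b: str) -> bool:
    """True if names a and b appear in the same GLOSSARY row."""
    for _quantity, *names in GLOSSARY:
        if a in names and b in names:
            return True
    return False


if __name__ == "__main__":
    print(same_quantity("cores", "--cpus-per-task"))  # same quantity, per job
    print(same_quantity("memory", "--memory-limit"))  # per job vs. per worker
```

For example, with `processes=4`, the `--memory-limit` each worker receives would be about a quarter of `memory`, which is why the two should not be set to the same value.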