dask / dask-blog

Dask development blog
https://blog.dask.org/

Dask on HPC, what works and what doesn't #5

Closed (mrocklin closed this issue 5 years ago)

mrocklin commented 5 years ago

Hi All,

I'd like a group of us to write a blog post about using Dask on supercomputers, covering why we like it today and highlighting improvements that could be made in the near future to improve usability. My goal is to show this post to various HPC groups, and to my employer to motivate work in this area. I think that now is a good time for this community to have some impact by sharing its recent experience.

cc'ing some notable users today @guillaumeeb @jhamman @kmpaul @lesteve @dharhas @josephhardinee @jakirkham

To start the conversation: if we were to structure the post as five reasons we use Dask on HPC and five things that could be better, what would those items be? I think it'd be good to get a five-item list from a few of the people cc'ed above; then maybe we talk about those lists and I (or anyone else, if interested) compose an initial draft that we can all iterate on?

kmpaul commented 5 years ago

@jhamman can probably amend or append to this...

5 Reasons We Use Dask at NCAR

In my mind, these are the 5 best things that we get from Dask, here at NCAR:

3 Things That Could Be Better

I probably need some help developing this list, as it is short and very self-centered. In fact, some of these things could already have solutions... which I would be happy to hear about.

...@jhamman, is there anything you would add?

mrocklin commented 5 years ago

Thanks for getting this started @kmpaul !

kmpaul commented 5 years ago

Oh, and I just was reminded of another "Thing That Could Be Better"...

guillaumeeb commented 5 years ago

My contribution, just some quick thoughts:

Reasons we use Dask at CNES

Things that could be better

Also 👍 to Scheduler Profiling from @kmpaul, but 👎 on Batch Launching, which is already covered by Dask IMO.

guillaumeeb commented 5 years ago

And I'm also very interested in contributing to the blog post.

There may also be ideas to take from @jhamman's post that you already relayed on dask-blog: https://blog.dask.org/2018/10/08/Dask-Jobqueue

kmpaul commented 5 years ago

@guillaumeeb I'd be interested to hear how you solve the Batch Launching problem. And, if you feel it is a solved problem, I'm obviously happy to take it off the list (and learn something myself!).

guillaumeeb commented 5 years ago

Maybe I'm misunderstanding your point, but isn't dask-mpi there precisely for batch launching Dask applications? Happy to discuss this on gitter so as not to pollute this issue.

kmpaul commented 5 years ago

Sure. Let's move to gitter.

guillaumeeb commented 5 years ago

@kmpaul just convinced me that dask-mpi is not yet sufficient to implement batch launching correctly, so in the end 👍 to improved batch launching too!

mrocklin commented 5 years ago

I'm happy to start writing up a skeleton draft of this if people are interested.

Alternatively, it would still be good to get thoughts from others. I wonder if, now that AGU is over, folks like @rabernat or @jhamman have time. I'm also interested in thoughts from non-earth-scientists like @lesteve and @ogrisel, if they have time to list some general thoughts.

guillaumeeb commented 5 years ago

ping also @willirath and @apatlpo.

dharhas commented 5 years ago

Apologies for the delayed response. @guillaumeeb's list is actually pretty spot on. Very interested in the heterogeneous resource launching and improved batch launching.

willirath commented 5 years ago

Reasons we use Dask at GEOMAR

(In addition to virtually all of the above)

Easy to teach: When training people in using Dask, it's very easy to expose them to exactly the fraction of the API that is necessary for the task at hand.

Things that could be better

Heterogeneous clusters: Both making them easier to launch and having a simple way of associating built-in methods of Dask arrays with resources would be great.

mrocklin commented 5 years ago

I've added a quick draft here: https://github.com/dask/dask-blog/pull/6

Help filling things in there would be welcome.