dask / community

For general discussion and community planning. Discussion issues welcome.

Moar Plugins #205

Open mrocklin opened 2 years ago

mrocklin commented 2 years ago

There are a few really fun plugins for Dask.

These are easy to write and really impactful. They're also an easy way for people to get involved without touching the beating heart of Dask internals. This might be a fun way to get peripheral folks engaged. I was speaking with a colleague and found that he had a few other fun ones, like hooking up logging or print statements to the Dask dashboard. We've listed a few in the docs. I think that @crusaderky mentioned something in the AMM docs around adding better GC. We could add the malloc trim trick there as well.
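To give a sense of how little code such a plugin needs, here is a minimal sketch of a worker plugin that counts task state transitions. The class name `TransitionCounter` and its `counts` attribute are hypothetical and not part of Dask; only the `WorkerPlugin` base class and its `setup`, `transition`, and `teardown` hooks come from `distributed`. The import fallback is there only so the sketch can be read and exercised without `distributed` installed.

```python
from collections import Counter

try:
    from distributed.diagnostics.plugin import WorkerPlugin
except ImportError:
    # Assumption: fall back to a plain base class so the sketch
    # still runs where distributed is not installed.
    WorkerPlugin = object


class TransitionCounter(WorkerPlugin):
    """Hypothetical example plugin: count task state transitions on a worker."""

    name = "transition-counter"

    def setup(self, worker):
        # Called once when the plugin is attached to a worker.
        self.worker = worker
        self.counts = Counter()

    def transition(self, key, start, finish, **kwargs):
        # Called on every task state change, e.g. "executing" -> "memory".
        self.counts[(start, finish)] += 1

    def teardown(self, worker):
        # Called when the worker shuts down.
        self.counts.clear()
```

With a running cluster, a plugin like this would be attached through the client, for example with `client.register_worker_plugin(TransitionCounter())`; newer releases of `distributed` also offer `client.register_plugin`.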

There is a lot of potential here, and it's a good way to showcase how pluggable and hackable Dask is. These could also be featured in docs. So I'll suggest a few steps:

  1. Bring what we have in docs into the codebase (distributed maybe, or should these be in a dask-contrib package?)
  2. Add a docs section that covers each plugin
  3. Host a small one-day sprint where we
    • First seed a set of ideas (smaller group)
    • Then bring on some folks who are not as familiar with the scheduler/workers to help implement them (or come up with their own)

crusaderky commented 2 years ago

I think that @crusaderky mentioned something in the AMM docs around adding better GC. We could add the malloc trim trick there as well.

That was an early idea about periodically calling the malloc_trim C function on the workers. However, @fjetter had empirical experience of it causing hard-to-debug segfaults. The idea was discarded in favour of setting an environment variable for the C library (glibc) to read before starting the worker.
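For readers unfamiliar with the trick being discussed, the explicit call can be sketched as below. This is an illustration under assumptions, not the approach the maintainers settled on: it only works on Linux with glibc, the function name `trim_memory` and the `-1` sentinel for "unavailable" are choices made for this example, and the segfault reports mentioned above apply to exactly this kind of call.

```python
import ctypes
import sys


def trim_memory() -> int:
    """Ask glibc to return free heap memory to the operating system.

    Returns the result of malloc_trim (1 if memory was released, 0 if not),
    or -1 when malloc_trim is unavailable (assumption: any non-glibc platform).
    """
    if not sys.platform.startswith("linux"):
        return -1
    try:
        libc = ctypes.CDLL("libc.so.6")
        return libc.malloc_trim(0)
    except (OSError, AttributeError):
        # libc.so.6 is missing or does not export malloc_trim (e.g. musl).
        return -1
```

On a live cluster a function like this could be run once on every worker with `client.run(trim_memory)`; wrapping it in a plugin that calls it periodically would be the opt-in variant discussed in this thread.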

mrocklin commented 2 years ago

This repository of plugins would not be on by default. People might choose to use them optionally. I think that, starting out, we would want to be more permissive than usual in order to get a good set in there.

fjetter commented 2 years ago

I'm not too concerned about the malloc trim anymore. We've had this discussion several times and I don't want to forbid a potentially game-changing feature based on some anecdotal evidence. If we're hit by it, we'll keep this in mind, but I think we should move on. We are now setting the MALLOC_TRIM_THRESHOLD_ environment variable, which should do the same. However, I think I've heard users complain that this isn't having the same impact as an explicit trim. If a plugin helps and people can opt in, that would be nice.
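The environment-variable approach only takes effect if the variable is present before the process starts, because glibc reads it at startup. A minimal sketch of that idea follows. The helper names are hypothetical and the value `65536` is illustrative; how Dask itself injects the variable into worker processes is not shown here. Note that the trailing underscore is part of the variable name.

```python
import os
import subprocess
import sys


def env_with_trim_threshold(threshold_bytes=65536):
    """Return a copy of the current environment with the glibc trim threshold set."""
    env = dict(os.environ)
    env["MALLOC_TRIM_THRESHOLD_"] = str(threshold_bytes)
    return env


def threshold_seen_by_child(threshold_bytes=65536):
    """Spawn a fresh Python process and report the value it sees."""
    result = subprocess.run(
        [
            sys.executable,
            "-c",
            "import os; print(os.environ.get('MALLOC_TRIM_THRESHOLD_'))",
        ],
        env=env_with_trim_threshold(threshold_bytes),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Setting the variable with `os.environ` inside an already running worker would be too late for the allocator to pick it up, which is why the sketch passes it to a newly spawned process instead.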

mrocklin commented 2 years ago

I'm mostly suggesting that we get more comfortable including optional plugins, and then showcasing these plugins in documentation. The optional, default-off nature of these plugins should lower our standards for inclusion into the codebase. I also think that it's good to include lots of things, mostly so that people can see a gallery of what is possible.

jsignell commented 2 years ago

This is a cool idea, and it sounds like it should be a new dask-contrib repo to me.

martindurant commented 2 years ago

Plus, plugins make for great, easy contrib packages. We could consider making a skeleton for it.

martindurant commented 2 years ago

(sorry @jsignell, repeating you!)

mrocklin commented 2 years ago

We've added several plugins to the distributed codebase (UploadFile, UploadDirectory, PipInstall, ...) and these don't seem to have caused any issues. Having plugins like these be automatically available if someone installs Dask is nice. Splitting them into lots of little libraries is easier on developers but probably harder on users, especially more novice users, who then have to go find them.