dask / community

For general discussion and community planning. Discussion issues welcome.

Developer documentation #190

Open fjetter opened 2 years ago

fjetter commented 2 years ago

In an offline discussion about technical debt and code complexity, the valid concern was raised that many of our internal systems are not properly documented.

One example that came up is the current/new state machine (https://github.com/dask/distributed/issues/4413, https://github.com/dask/distributed/pull/5046), which is documented to some extent (https://distributed.dask.org/en/stable/scheduling-state.html and https://distributed.dask.org/en/stable/worker.html#internal-scheduling) but likely not sufficiently for another developer to make educated judgment calls about code changes.

I would like to collect topics, mostly for dask/dask and dask/distributed, where more extensive developer documentation would help either to onboard new developers or to let existing developers familiarize themselves with other areas of the code.

cc @jcrist @jrbourbeau @gjoseph92 @ncclementi

jcrist commented 2 years ago

Thanks for opening this @fjetter!

A few topics that come to mind:

jacobtomlinson commented 2 years ago

I would add implementing Cluster classes to that list. Maybe custom adaptive classes too.

GenevieveBuckley commented 2 years ago

High level graphs are another area that has been mentioned as needing better developer docs. There is a tracking issue here: https://github.com/dask/dask/issues/7755

fjetter commented 2 years ago

Disk spilling/memory management. When does data move on the worker, and how is this configured?

https://distributed.dask.org/en/stable/worker.html#memory-management

Is this sufficient? Should I create a ticket to restructure/move this?
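
To make the question concrete, here is a rough sketch of the kind of threshold logic a developer would want spelled out. It is a simplified model, not the worker's actual implementation: the `MemoryThresholds` class and `worker_action` helper are hypothetical names, the default values are assumptions based on my reading of the linked page, and the real worker distinguishes managed memory from total process memory, which this sketch collapses into a single fraction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryThresholds:
    """Fractions of the worker memory limit (assumed defaults, see lead-in)."""
    target: float = 0.60     # begin spilling managed data to disk
    spill: float = 0.70      # spill based on overall process memory
    pause: float = 0.80      # stop starting new tasks
    terminate: float = 0.95  # worker gets restarted


def worker_action(memory_fraction: float,
                  thresholds: MemoryThresholds = MemoryThresholds()) -> str:
    """Return the most severe action triggered at this memory fraction.

    Hypothetical helper for illustration only; it treats managed and
    process memory as one number, which the real worker does not.
    """
    if memory_fraction >= thresholds.terminate:
        return "terminate"
    if memory_fraction >= thresholds.pause:
        return "pause"
    if memory_fraction >= thresholds.spill:
        return "spill"
    if memory_fraction >= thresholds.target:
        return "spill-managed"
    return "ok"
```

In an actual deployment these fractions would be set through the `distributed.worker.memory.*` configuration keys rather than in code like this. Developer docs would need to explain exactly what each threshold measures and what happens when it is crossed.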

fjetter commented 2 years ago

I created dedicated issues for the topics you mentioned. We can move the discussion about the individual items to the respective tickets.

Apart from collecting further topics, I would be curious about how we want to structure these new or already existing sections. While researching the topic in our current docs, I realized that some of the information asked for here is already partially documented under "Developer Documentation", while other parts are in "Build understanding". This might be a judgement call for individual topics, but if there are general best practices to follow, they can be discussed here as well.