Open DamienIrving opened 3 years ago
On the debugging side of things, it would be worth adding the progress bar to the lesson:
import dask.diagnostics
dask.diagnostics.ProgressBar().register()
To do this, we'd need to explain the difference between the local (single-machine, default) scheduler and the distributed scheduler, because the profiling tools are different for each. I think this distinction is well worth explaining.
https://docs.dask.org/en/stable/diagnostics-local.html
https://docs.dask.org/en/stable/scheduling.html
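As a rough sketch of what the local-scheduler side could look like in the lesson: the snippet below uses the context-manager form of `ProgressBar` together with `Profiler`, both from `dask.diagnostics`. The `square` function and the toy workload are made up for illustration, and `scheduler="threads"` is passed explicitly only to make it obvious that this is a local scheduler. These diagnostics don't apply to the distributed scheduler, which has its own tools (the dashboard and `distributed.progress`).

```python
import dask
from dask.diagnostics import ProgressBar, Profiler


# Hypothetical toy task, only here to give the scheduler something to run.
@dask.delayed
def square(x):
    return x * x


# Build a small lazy computation: the sum of squares of 0..9.
total = dask.delayed(sum)([square(i) for i in range(10)])

# The context-manager form limits the diagnostics to this one computation,
# whereas ProgressBar().register() turns the bar on globally.
with ProgressBar(), Profiler() as prof:
    # Explicitly pick a local (single-machine) scheduler.
    result = total.compute(scheduler="threads")

print(result)

# Each profiler record holds the task key plus its start and end times.
slowest = max(prof.results, key=lambda r: r.end_time - r.start_time)
print("tasks profiled:", len(prof.results))
print("slowest task:", slowest.key)
```

Using the `with` block rather than `register()` might be the easier thing to teach first, since learners can see exactly which computation the progress bar belongs to.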
This script also shows how to use the resource profiler: https://github.com/climate-resilient-enterprise/workflows/blob/master/cmdline_programs/return_period.py
At my 2021 Dask Summit presentation about teaching Dask to atmosphere and ocean scientists, it was suggested that the lesson could cover the Dask task graph, along with debugging and best practices for finding pain points.
It was suggested that this PyData talk might be useful: https://www.youtube.com/watch?v=JoK8V2eWFPE
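For the task graph content, something along these lines might be a starting point. It's a minimal sketch that builds a small `dask.delayed` computation and inspects its graph before running it. The `square` function is invented for the example, and the `visualize` call is left commented out because it needs the optional graphviz dependency.

```python
import dask


# Hypothetical toy task used only to build a small graph.
@dask.delayed
def square(x):
    return x * x


# Nothing runs yet: this only records the tasks and their dependencies.
total = dask.delayed(sum)([square(i) for i in range(10)])

# Every Dask collection exposes its task graph as a mapping of key -> task.
graph = total.__dask_graph__()
n_tasks = len(graph)
print("number of tasks:", n_tasks)
for key in sorted(str(k) for k in graph):
    print(key)

# With graphviz installed, the graph can also be drawn to a file:
# total.visualize(filename="graph.png")

# Running the graph gives the sum of squares of 0..9.
result = total.compute()
print(result)
```

Counting the tasks before calling `compute()` is a cheap way to spot a graph that has grown much larger than expected, which is often the first pain point people hit.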