Open scharlottej13 opened 2 years ago
Thank you @scharlottej13 for opening this issue and starting the conversation.
Currently, the tutorial is a bit long: it takes between 1.5 and 2 hours to complete, and certain topics can probably be removed to make the material easier to digest.
Since this tutorial is meant to be introductory, I think we can remove:
cc: @pavithraes @rrpelgrim, as you have taught similar tutorials, what's your experience with these topics? Is there anything else that might be confusing or too advanced for beginners?
Thanks for starting this conversation @scharlottej13!
I agree with @ncclementi. I think the tutorial atm tries to be too exhaustive and complete (lots of task graphs, starting off with Delayed, etc.), which can be intimidating for novice users. I think we should move towards 'wow-ing' people with the power of Dask first, and only then explain how it works.
The analogy we're using in evangelism atm is that we want to show people a shiny race car, get them to step in and take it for a test drive (no mechanic skills or understanding of the inner workings of the engine needed here), and then be super impressed by the results. If at that point people are like - "Hey, how does this actually work?" or "Hey, can I take out the engine and build my own car/hovercraft/spaceship?"...then we can dive into that.
With that in mind, what I've been doing is:
-- move to notebooks --
Start with a quick, flashy 'showing off' of the various Dask race cars: dask.dataframe to scale pandas, dask.array to scale NumPy, Dask-ML to scale scikit-learn, and a very quick sneak peek into the engine with a simple dask.delayed example (to tease any intermediate/expert users in the room) -- ~10 minutes
Then jump into dask.dataframe and take it for a test drive. Show them how to move from pandas to Dask and how to control that car (API, etc.) -- ~20 minutes
Then jump into the Dask-ML car and take it for a drive -- ~15 minutes
Then say - "Cool stuff, right? Do you want to know how this works?" and talk a little more about delayed -- ~10 minutes
Skip Schedulers and Futures
Q&A
My alternative layout of the notebooks lives here for now, but I would like to synthesise efforts and end up with a set of 'master' notebooks and slides in this coiled/dask-mini-tutorial repo that we can then fork whenever we give a presentation.
https://github.com/coiled/coiled-resources/tree/main/dask-tutorial/notebooks
My non-code slides live here. These need iteration; I'm not totally convinced by my own narrative line on this one: https://docs.google.com/presentation/d/1BMhxuTuOg1jRYFANDvbb-GNszpyH-JKPKnbGEsO5GtQ/edit?usp=sharing
We should also refresh the longer data-science-at-scale tutorial with some of these messaging strategies in mind.
Curious what @MrPowers thinks based on his meetup experiences.
Opening up this issue to start a conversation on things that can be cut and/or improved (per @ncclementi's suggestion!)