coiled / dask-mini-tutorial

This repo contains a short version of a dask tutorial.
BSD 3-Clause "New" or "Revised" License

Ideas for improvement #6

Open scharlottej13 opened 2 years ago

scharlottej13 commented 2 years ago

Opening up this issue to start a conversation on things that can be cut and/or improved (per @ncclementi's suggestion!)

ncclementi commented 2 years ago

Thank you @scharlottej13 for opening this issue and starting the conversation.

Currently the tutorial is a bit long: it takes between 1.5 and 2 hours to complete, and there are certain topics that can probably be removed to make the material easier to digest.

Since this tutorial is meant to be introductory, I think we can remove:

cc: @pavithraes @rrpelgrim, as you have taught similar tutorials: what's your experience with these topics? Is there anything else that might be confusing or too advanced for beginners?

avriiil commented 2 years ago

Thanks for starting this conversation @scharlottej13!

I agree with @ncclementi. I think the tutorial atm tries to be too exhaustive and complete (lots of task graphs, starting off with Delayed, etc.), which can be intimidating for novice users. I think we should move towards 'wow-ing' people with the power of Dask first, and only then explain how it works.

The analogy we're using in evangelism atm is that we want to show people a shiny race car, get them to step in and take it for a test drive (no mechanic skills or understanding of the inner workings of the engine needed here), and then be super impressed by the results. If at that point people are like - "Hey, how does this actually work?" or "Hey, can I take out the engine and build my own car/hovercraft/spaceship?"...then we can dive into that.

With that in mind, what I've been doing is:

  1. Start with a no-code slide deck to build intuition and excitement around what Dask is and the problems it can solve for you -- ~10 minutes

-- move to notebooks --

  1. Start with a quick, flashy 'showing off' of the various Dask race cars: dask.dataframe to scale pandas, dask.array to scale numpy, Dask-ML to scale sklearn, and a very quick sneak peek into the engine with a simple dask.delayed example (to tease any intermediate/expert users in the room) -- ~10 minutes

  2. Then jump into dask.dataframe and take it for a test drive. Show them how to move from pandas to Dask and how to control that car (API, etc.) -- ~20 minutes

  3. Then jump into the Dask-ML car and take it for a drive -- ~15 minutes

  4. Then say - "Cool stuff, right? Do you want to know how this works?" and talk a little more about delayed -- ~10 minutes

  5. Skip Schedulers and Futures

  6. Q&A

My alternative layout of the notebooks lives here for now, but I would like to combine efforts and end up with a set of 'master' notebooks and slides in this coiled/dask-mini-tutorial repo that we can then fork whenever we give a presentation: https://github.com/coiled/coiled-resources/tree/main/dask-tutorial/notebooks

My non-code slides live here. These need iteration; I'm not totally convinced by my own narrative line on this one: https://docs.google.com/presentation/d/1BMhxuTuOg1jRYFANDvbb-GNszpyH-JKPKnbGEsO5GtQ/edit?usp=sharing

We should also refresh the longer data-science-at-scale tutorial with some of these messaging strategies in mind.

Curious what @MrPowers thinks, based on his meetup experiences.