Make NBs production (training!) ready

hugobowne commented 4 years ago

I've got the 1st NB for this tutorial at a place I'm happy with (final Coiled error aside): https://github.com/coiled/data-science-at-scale/blob/master/01-data-analysis-at-scale.ipynb

As discussed, @davidventuri, if you could use this as inspiration for filling out text, code comments (and images, if you see fit), in the other NBs, that would be great.

The remaining NBs I would like you to prioritize in the following order:

[x] 02a-scalable-dataframes-lab.ipynb
[x] 04b-scalable-machine-learning-advanced.ipynb (feel free to add stuff from the great posts you've written with us)
[x] 03-parallelization-basics.ipynb
[x] 02b-scalable-dataframes-lab.ipynb
[ ] 03a-parallelization-basics.ipynb
[ ] 04a-scalable-machine-learning.ipynb (not necessary at the moment; only do this after everything else, time permitting)

A few things I've done here that will be needed in the other NBs:

Listing what we plan to do
Recap
Mentioning Coiled &/or Beta but not in a salesy way, merely to provide context
Enough code comments to give context but nothing over the top

Don't edit any code but do raise issues if you think there's something funky.

NBs 3a, 3b, and 4a are from here and need to be credited as such. I think you'll likely edit much of their text, so feel free to add something like "This material riffs off ..." to them.

Two more things:

[x] Could you add Coiled and Dask logos to the NBs, something like here? You can find Coiled logos here.
[x] In this NB, it would be great if you could add reminders about features and target variables in ML, training and test sets, train-test split, and cross-validation. We won't need too much about each, just a refresher.

Feel free to use anything I've written here for this.

It may be clear, but NB4b will likely be the only ML NB & I may drop 4a.

davidventuri commented 4 years ago

[x] Figure out/fix Coiled and Dask logos so they display side-by-side and in GitHub viewer
[x] Sprinkle recap sections throughout NB (02a and 04b)
[x] PUT NB RIFF WARNINGS AT TOP OF NB

davidventuri commented 4 years ago

@hugobowne I timeboxed about 20 minutes to solve the Coiled and Dask logos side-by-side problem and this is the solution I've come up with.

<p float="center">
  <img src="images/horizontal.png" alt="Coiled logo" width="415" hspace="10"/>
  <img src="images/dask_horizontal_no_pad.svg" alt="Dask logo" width="200" hspace="10" />
</p>

In JupyterLab, this code displays the images side by side but not centered:

And in GitHub preview, this code displays the images on top of each other centered:

I'm not fully certain why things are showing up differently. I'm going to move on to finishing the notebooks now -- let me know if you'd like me to revisit this issue.
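
A possible follow-up, untested in either viewer: `float` is a CSS property rather than an HTML attribute, so `float="center"` on the `<p>` is likely ignored. A minimal sketch that swaps in the legacy `align` attribute instead, on the assumption that both JupyterLab and GitHub's renderer honor it:

```html
<!-- Hypothetical variant: align="center" replaces float="center"; image paths and widths are unchanged -->
<p align="center">
  <img src="images/horizontal.png" alt="Coiled logo" width="415" hspace="10"/>
  <img src="images/dask_horizontal_no_pad.svg" alt="Dask logo" width="200" hspace="10"/>
</p>
```

If the images still stack in GitHub preview, the combined widths may simply exceed the rendered column, in which case smaller `width` values would be the next thing to try.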

davidventuri commented 4 years ago

@hugobowne Is this wording accurate?

davidventuri commented 4 years ago

@hugobowne Done the first four highest-priority notebooks as discussed yesterday. I believe I've checked everything requested in this issue, but let me know if I missed anything big.

PR: https://github.com/coiled/data-science-at-scale/pull/20

hugobowne commented 4 years ago

thanks for all of this, Dave!

wrt

Is this wording accurate?

I'll confirm with @jrbourbeau tomorrow when I check in with him, but we're looking good here so I'll close the issue.

Make NBs production (training!) ready #16