coiled / data-science-at-scale

A Pythonic introduction to methods for scaling your data science and machine learning work to larger datasets and larger models, using the tools and APIs you know and love from the PyData stack (such as numpy, pandas, and scikit-learn).
MIT License
112 stars, 38 forks

Make NBs production (training!) ready #16

Closed: hugobowne closed this issue 4 years ago

hugobowne commented 4 years ago

I've got the 1st NB for this tutorial at a place I'm happy with (final Coiled error aside): https://github.com/coiled/data-science-at-scale/blob/master/01-data-analysis-at-scale.ipynb

As discussed, @davidventuri, if you could use this as inspiration for filling out text, code comments (and images, if you see fit) in the other NBs, that would be great.

The remaining NBs I would like you to prioritize in the following order:

A few things I've done here that will be needed in the other NBs:

Don't edit any code but do raise issues if you think there's something funky.

NBs 3a, 3b, and 4a are adapted from here and need to be credited as such. You'll likely edit much of their text, so feel free to add something like "This material riffs on ..."

Two more things:

Feel free to use anything I've written here for this.

It may already be clear, but NB4b will likely be the only ML NB, and I may drop 4a.

davidventuri commented 4 years ago

@hugobowne I timeboxed about 20 minutes to solve the problem of placing the Coiled and Dask logos side by side, and this is the solution I've come up with:

```html
<p float="center">
  <img src="images/horizontal.png" alt="Coiled logo" width="415" hspace="10"/>
  <img src="images/dask_horizontal_no_pad.svg" alt="Dask logo" width="200" hspace="10" />
</p>
```

In JupyterLab, this code displays the images side by side but not centered:

[Screenshot: JupyterLab rendering, 2020-09-13, 1:28:37 PM]

In the GitHub preview, the same code displays the images stacked vertically and centered:

[Screenshot: GitHub preview rendering, 2020-09-13, 1:29:23 PM]

I'm not sure why the two renderings differ. I'm going to move on to finishing the notebooks now; let me know if you'd like me to revisit this issue.
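One possible contributor, offered as an untested hunch: `float` is a CSS property rather than an HTML attribute, and `center` is not a valid `float` value anyway (the usual values are `left`, `right`, and `none`), so `float="center"` is probably ignored by both renderers. A minimal variant worth trying swaps it for the legacy `align="center"` attribute, which GitHub-rendered READMEs commonly use for centering. Whether JupyterLab's HTML sanitizer keeps `align`, and whether this also stops the vertical stacking in the GitHub preview, are assumptions not verified here.

```html
<!-- Assumption: renderers honor the legacy align attribute on <p>; float="center" is dropped -->
<p align="center">
  <img src="images/horizontal.png" alt="Coiled logo" width="415" hspace="10"/>
  <img src="images/dask_horizontal_no_pad.svg" alt="Dask logo" width="200" hspace="10"/>
</p>
```

If the images still stack on GitHub, the combined widths may exceed the preview's content area; shrinking the `width` values is another knob to try, again untested.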

davidventuri commented 4 years ago

@hugobowne Is this wording accurate?

[Screenshot: wording in question, 2020-09-13, 3:19:12 PM]
davidventuri commented 4 years ago

@hugobowne I've finished the four highest-priority notebooks, as discussed yesterday. I believe I've covered everything requested in this issue, but let me know if I missed anything big.

PR: https://github.com/coiled/data-science-at-scale/pull/20

hugobowne commented 4 years ago

Thanks for all of this, Dave!

Regarding:

Is this wording accurate?

I'll confirm with @jrbourbeau tomorrow when I check in with him, but we're looking good here, so I'll close the issue.