ProjectPythia / pythia-foundations

Jupyterbook source for the Foundations collection
http://foundations.projectpythia.org
Apache License 2.0
59 stars 42 forks source link

Add xarray dask content #321

Closed mgrover1 closed 1 year ago

mgrover1 commented 1 year ago

Closes #108

This adds content related to xarray and Dask. Instead of digging deep into a dedicated dask section, we add a bit on parallel and distributed computing with Dask + Xarray here.

review-notebook-app[bot] commented 1 year ago

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

github-actions[bot] commented 1 year ago

This pull request is being automatically built with GitHub Actions and Netlify. To see the status of your deployment, click below.

🔍 Git commit SHA: 1123c4afc1f2d0fffb9249108d644b42ac5ad0c3 ✅ Deployment Preview URL: https://637d28bad6baf6241278f38a--pythia-foundations.netlify.app

jukent commented 1 year ago

Hey @mgrover1 Thanks for putting this together. Can you add this file to the toc so it appears in the navbar?

mgrover1 commented 1 year ago

Hey @mgrover1 Thanks for putting this together. Can you add this file to the toc so it appears in the navbar?

Added that in! Thanks for the reminder!

jmunroe commented 1 year ago

I really like that this description of Dask is tightly scoped to just introducing Dask Arrays. A lot of other dask tutorials lead someone down the path of "futures" and other much more advanced topics. A couple of brief thoughts:

mgrover1 commented 1 year ago

Thanks @jmunroe for the feedback!! I appreciate it - I will be sure to incorporate those suggestions.

jukent commented 1 year ago

@mgrover1 Could you ping me to take a look after you've addressed @jmunroe 's comments?

jnmorley commented 1 year ago

@mgrover1 Thanks for putting this together. I think the content fits well in the context of the foundations book as it covers a lot of use case without overloading the reader with a bunch of details. Here are some things I notices as I read through the notebook.

For example, consider taking the sum of a billion numbers. We might instead break up the array into 1,000 chunks, each of size 1,000,000, take the sum of each chunk, and then take the sum of the intermediate sums.

We also saw that we can use Xarray to access dask.arrays, which required passing a chunks argument to upon opening the dataset. Once the data were loaded into Xarray, we can could interact with the xarray.Datasets and xarray.DataArrays as we would if we [were] working with dask.arrays. This can be a powerful tool when working with these datasets!

I hope these suggestions help and are not to nit picky.

mgrover1 commented 1 year ago

@jukent I responded to @jmunroe and @jnmorley 's comments here, it is ready for another review

mgrover1 commented 1 year ago

@brian-rose - would you be willing to take a look at this content? We need one more review :)

mgrover1 commented 1 year ago

@brian-rose - this is fixed in the latest build https://636ab705260c5c09049be5d4--pythia-foundations.netlify.app/core/xarray/dask-arrays-xarray.html

mgrover1 commented 1 year ago

Thanks for the feedback @brian-rose !! I appreciate it! I think we can cover open_mfdataset in a followup PR

brian-rose commented 1 year ago

@mgrover1 just checking in this. We identified this PR as "low hanging fruit" for the content push prior to the AGU meeting (https://discourse.pangeo.io/t/project-pythia-content-priorities-before-agu-ams/2914).

Do you expect to be able to incorporate the last round of edits sometime in the next two weeks?

mgrover1 commented 1 year ago

@brian-rose sorry for the delay - it is ready for another review. I made those changes you requested.

brian-rose commented 1 year ago

@mgrover1 almost there! I spotted just a couple of formatting fixes still unresolved. Probably easier if I just fix them now and push onto this branch. Then I think we're done with this.

mgrover1 commented 1 year ago

@mgrover1 almost there! I spotted just a couple of formatting fixes still unresolved. Probably easier if I just fix them now and push onto this branch. Then I think we're done with this.

Sounds like a plan! Thank you for your feedback here!

brian-rose commented 1 year ago

Link checker failure is unrelated, seems to be a service outage of some kind. I think this is good to go.

mgrover1 commented 1 year ago

@brian-rose feel free to merge when ready :)