carpentries-lab / python-aos-lesson

Python for Atmosphere and Ocean Scientists
https://carpentries-lab.github.io/python-aos-lesson/

Update PyAOS lessons to use CMIP6 data instead of CMIP5 #32

Closed by hot007 3 years ago

hot007 commented 3 years ago

Updated lessons to use CMIP6 data instead of CMIP5. Included CMIP6 data files in /data, and removed the CMIP5 data. I have not updated any screenshots, only the output plots. Added Episode 10 to work with dask, but it needs a lot of reviewing. Please test all changes: I have tested locally, but I may have made copy-paste mistakes. Thanks! -Claire

hot007 commented 3 years ago

New notebook looks much better, thanks! I'm a bit worried that the parallel version came out with all NaNs; that shouldn't have happened! Does .max sometimes suffer from the same NaN problem as .mean, and need a NaN-aware variant the way .mean needs something like np.nanmean? I'll try to revisit this week but have a few other things to do first. I assume the initial serial calculation (~1hr) would NOT be run by students in class?! I like the idea of adding a bit on dask-aware functions, though I'm not sure it'll fit in time-wise. Worth at least mentioning, though, I think.
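For reference on the NaN question, here is a minimal sketch of how plain NumPy reductions treat missing values. The array values are made up for illustration and are not taken from the CMIP6 files. Note that xarray's own .max() skips NaNs by default for float data, so this shows the NumPy behaviour only, not necessarily what happened in the notebook.

```python
import warnings

import numpy as np

# Hypothetical sea surface temperatures with shape (time, cell).
# NaN marks a missing value, e.g. a land cell in an ocean field.
tos = np.array([
    [271.5, np.nan, 300.2],
    [272.0, np.nan, 301.1],
    [np.nan, np.nan, 299.8],
])

# np.max propagates NaN: any cell with a NaN at some time step gives NaN.
plain_max = np.max(tos, axis=0)

# np.nanmax ignores NaNs, but a cell that is NaN at every time step
# still returns NaN (NumPy emits a RuntimeWarning for all-NaN slices).
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    nan_max = np.nanmax(tos, axis=0)

print(plain_max)
print(nan_max)
```

So an all-NaN result from a NaN-skipping maximum would suggest the input itself was all NaN in those cells, rather than NaNs leaking through the reduction.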

DamienIrving commented 3 years ago

Yeah, the students wouldn't be running the serial calculation. In fact, for all of these lessons (besides the version control ones) the idea is that the students just watch as the instructor live codes what's in the notes, then the students do whatever is in the exercises. In this case the instructor wouldn't actually execute the serial command, they'd just say it takes an hour and move on (something else for the instructor notes document).

I agree that at the very least we'd have an information box introducing the idea of creating your own dask-aware functions, with a link to a good tutorial on the topic. We can probably then just create an "enhancement" issue in the repo flagging that we could replace that information box with an actual dask-aware function that we build from scratch sometime in the future.
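As a rough illustration of the chunk-then-combine pattern that a chunked maximum relies on, here is a sketch that uses only NumPy and the standard library. It does not use dask or xarray; the function name chunked_time_max, the chunk size, and the random data are all hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def chunked_time_max(data, chunk_size, workers=2):
    """Maximum over axis 0, computed one chunk of time steps at a time."""
    # Split the time axis into blocks of at most chunk_size steps.
    chunks = [data[i:i + chunk_size]
              for i in range(0, data.shape[0], chunk_size)]
    # Reduce each block independently; the blocks can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(lambda chunk: chunk.max(axis=0), chunks))
    # Combine step: the maximum of the per-block maxima.
    return np.stack(partial).max(axis=0)


# Hypothetical (time, lat, lon) data standing in for a real variable.
rng = np.random.default_rng(0)
data = rng.random((120, 4, 5))
result = chunked_time_max(data, chunk_size=12)
```

This works for a maximum because the maximum of the per-chunk maxima equals the overall maximum. Reductions without that property, such as a mean over unequal chunks, need a different combine step.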

It would be good to have this PR merged by 9am Monday at the latest, because that's when ticket "sales" for the workshop open (we are sending an email to the 130+ conference attendees who expressed an interest in attending the workshop this week, telling them that 35 places are available, "so please make sure you really need to come"). Once people register they'll get the data download instructions, and this PR has the new data files in it. https://www.eventbrite.com.au/e/python-for-atmosphere-and-ocean-science-tickets-137505758425

So I guess for now we just need to:

Exercise ideas:

hot007 commented 3 years ago

I've just added a dask dashboard screenshot to the 'fig' directory, but haven't changed the lesson to incorporate it. Just letting you know it's there if you want to put it in your notebook when you're preparing it, @DamienIrving.

hot007 commented 3 years ago

Oh, it’s completely fine: my plot didn’t work because we didn’t remap to a Cartesian grid, so it’s still tripolar. Just doing imshow indicates the data is fine. So if you want to deal with remapping, we can add that in and make a nice plot, or we can just do imshow to demonstrate we did the thing.

So I guess I’ve at least partially solved the NaN problem, and maybe having the data on a tripolar grid leads to a good exercise: the students map it back to Cartesian and plot it as per 02-visualisation.

DamienIrving commented 3 years ago

I like the idea of having the students do the remapping as an exercise. We can have a "hint" in the exercise pointing to some online documentation about the relevant remapping function. (They can read about the remapping while they wait the 20 minutes for the parallel tos max calculation to happen!)

hot007 commented 3 years ago

I'll admit I ran it on the VDI to check (but still using THREDDS) and it ran in under 4 min!! I assume I slowed things down for anyone else on the node at that time, but it was so much quicker than my laptop!