Pangeo training material for Big data geosciences

annefou commented 2 years ago

We have:

pangeo: Pangeo ecosystem 101 for everyone - Introduction to Xarray Galaxy Tools
pangeo-notebook: Pangeo Notebook in Galaxy - Introduction to Xarray

The first one (pangeo 101) is meant for anyone and requires no programming skills (it uses Galaxy Tools). It shows what Pangeo and its community are, and how to use the Xarray tools in Galaxy.

The second one (pangeo notebook) makes use of the Pangeo JupyterLab interactive tool and is an introduction to Xarray for those who have basic Python programming skills.

yvanlebras commented 2 years ago

Amazing! Thank you Anne! Really top! I did a quick first review and opened a PR to the nordicESMhub repo, but I think I did something wrong: one PR went to the main branch and another to the correct pangeo one... Don't hesitate to ask if you have doubts about how to handle it ;)

hexylena commented 2 years ago

FYI @annefou, there is a new format you can opt in to that generates the ipynb files automatically. You can see it in action here: https://training.galaxyproject.org/training-material/topics/data-science/ Anything tagged jupyter-notebook or rmarkdown-notebook has these files automatically generated from its GTN content, if that's interesting to you.
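
The idea behind automatically generated notebooks can be illustrated with a toy converter that turns tutorial prose into markdown cells and fenced Python blocks into code cells of an nbformat 4 notebook. This is a rough sketch under stated assumptions, not the GTN's actual generator (which handles much more, such as boxes and solutions); markdown_to_notebook and its helpers are hypothetical names, and only fences tagged python are recognised.

```python
import json
import re

# Three backticks built at runtime so this example can sit inside a fenced block.
TICKS = "`" * 3
# Matches a fenced Python block and captures its body (non-greedy, spans newlines).
CODE_FENCE = re.compile(TICKS + r"python\n(.*?)" + TICKS, re.DOTALL)


def markdown_cell(text):
    """Build an nbformat 4 markdown cell."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(text):
    """Build an nbformat 4 code cell that has not been executed yet."""
    return {"cell_type": "code", "metadata": {}, "execution_count": None,
            "outputs": [], "source": text}


def markdown_to_notebook(markdown):
    """Split tutorial Markdown into alternating markdown/code cells (nbformat 4.4)."""
    cells = []
    pos = 0
    for match in CODE_FENCE.finditer(markdown):
        prose = markdown[pos:match.start()].strip()
        if prose:
            cells.append(markdown_cell(prose))
        cells.append(code_cell(match.group(1).rstrip("\n")))
        pos = match.end()
    tail = markdown[pos:].strip()
    if tail:
        cells.append(markdown_cell(tail))
    return {
        "nbformat": 4,
        "nbformat_minor": 4,  # 4.4 does not require per-cell "id" fields
        "metadata": {"kernelspec": {"name": "python3",
                                    "display_name": "Python 3",
                                    "language": "python"}},
        "cells": cells,
    }


if __name__ == "__main__":
    sample = f"Open the data:\n\n{TICKS}python\nimport xarray as xr\n{TICKS}\n"
    print(json.dumps(markdown_to_notebook(sample), indent=1))
```

Writing the returned dict with json.dump to a .ipynb file gives a notebook JupyterLab can open; a real converter would also need to handle non-Python blocks and nested Markdown structures.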

annefou commented 2 years ago

Wow!!! This is so cool!!! I was looking for something like that!!! I definitely want it. Thank you so much!

hexylena commented 2 years ago

Oh, I even wrote documentation! https://training.galaxyproject.org/training-material/topics/contributing/tutorials/create-new-tutorial-content/tutorial.html#automatic-jupyter-notebooks

"Do not use the built-in citation system" is also outdated; citations work now.

annefou commented 2 years ago

Presenter notes have been added for both Pangeo tutorials.

hexylena commented 2 years ago

@annefou I can't seem to push to your branch, can you please make a commit like:

From 318ef39adac638acf0980ce350c0a0e7ecd36c03 Mon Sep 17 00:00:00 2001
From: Helena Rasche <hxr@hx42.org>
Date: Tue, 8 Feb 2022 10:55:57 +0100
Subject: [PATCH] fix missing citations section

---
 _layouts/slides-plain.html | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/_layouts/slides-plain.html b/_layouts/slides-plain.html
index 52015e3b01..8411f2dca8 100644
--- a/_layouts/slides-plain.html
+++ b/_layouts/slides-plain.html
@@ -86,3 +86,8 @@ This material is the result of a collaborative work. Thanks to the <a href="http

 This material is licensed under the <a rel="license" href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.
 </section>
+
+{% if page.cited %}
+<h2 id="bibliography">{{locale['references']| default: "References" }}</h2>
+{% bibliography --cited %}
+{% endif %}
-- 
2.25.1

i.e. add those lines to the bottom of _layouts/slides-plain.html. I suspect that fixes the build errors.

hexylena commented 2 years ago

I've written https://github.com/galaxyproject/training-material/pull/3161 in support of this; it will make citations work properly in the automatic videos. Otherwise, fantastic! From the technical side it looks OK.

Maintainers: please merge #3161 before this.

annefou commented 2 years ago

Awesome! So to summarize: do I need to add what you suggested earlier or should I wait for your PR to be merged and then update my branch?

Many thanks!

hexylena commented 2 years ago

You need to add what I suggested in https://github.com/galaxyproject/training-material/pull/3147#issuecomment-1032422137, that will let this PR pass.

After that, one of the other maintainers will need to merge my PR, which will let your videos be built correctly when they do get built.

Finally, we'll merge your PR.

hexylena commented 2 years ago

Normally I'd make all these commits directly to your branch but for whatever reason I couldn't push directly.

annefou commented 2 years ago

I am not sure why you can't. I usually allow edits by maintainers, but I don't see the checkbox to tick/untick now.

hexylena commented 2 years ago

Ok, since you've fixed the auth, I'll just add all of those commits into this PR and it'll be much simpler! Thanks @annefou :)

annefou commented 2 years ago

Thank you so much @hexylena !!!!

hexylena commented 2 years ago

Failing now due to a missing dependency in the video dry run; I'll look into it.

annefou commented 2 years ago

Wow! This is green!!! Thank you!

hexylena commented 2 years ago

I think we'll want to make changes to the intro section. Right now it still says "best viewed in JupyterLab", but that's not appropriate here, and the intro section has other potential issues, such as the tool link box currently breaking the notebook.

Currently there is a snippet that is auto-included if it's a notebook, defaulting to jupyter.

I think the best approach is for me to move the preamble into a snippet just for this tutorial, and then update the auto-included Jupyter tip box to show your setup instructions instead.

Then they'll see at the top something like:

Tip: best viewed in Pangeo JupyterLab
  create new history
  upload data
  starting pangeo jupyter

but that'll take a bit of work. I think it's useful in the long run to support the other Jupyter flavours, though.

In the meantime Climate folks can start reviewing :)

hexylena commented 2 years ago

OK, I got that working. I think from a technical standpoint it's ready to go. The setup instructions (create a history, run this tool, etc.) are moved into a snippet (preamble.md), and I've included final instructions to download the GTN notebook into Pangeo and start it. (We cannot just load the notebook into Galaxy; sadly, the HTML in the GTN notebooks is not acceptable there. So we work around it by starting the notebook, then downloading the ipynb and switching to it.)

annefou commented 2 years ago

Fine from a technical standpoint. If anyone is up to reviewing the content/language, that would be good :)

Is there anything we can do on our side (pangeo & Climate) to help?

yvanlebras commented 2 years ago

I (not me personally ;) ) can try something!

hexylena commented 2 years ago

Yeah, if anyone just wants to follow the instructions and say if it works / if it's appropriate for climate scientists/etc :)

gallardoalba commented 2 years ago

The comments in code blocks are hard to read; would it be possible to modify the color palette, @hexylena?

[Screenshot from 2022-02-16 22-21-05]

hexylena commented 2 years ago

It's a known issue, @gallardoalba, and we should definitely fix it; there are some other colour schemes we can try (but in this tutorial most people will not be reading the GTN material and will instead be using the notebook).

hexylena commented 2 years ago

What do you think, @gallardoalba?

And for the admin training, where we use diffs very heavily.

gallardoalba commented 2 years ago

Perfect @hexylena!

annefou commented 2 years ago

@gallardoalba Thanks a lot for your very careful review. This is really awesome! I have committed most of your changes and I will make the remaining requested changes/improvements.

annefou commented 2 years ago

Thank you @yvanlebras and Solenne! I'll address the few pending comments (from the previous review too). Many thanks for reviewing this material!

annefou commented 2 years ago

OK. So I think I took all your comments into account. Thanks a lot for your review!

yvanlebras commented 2 years ago

Amazing! I will try to test this final version now and validate it! If you have time, @annefou, you could test mine ;) https://github.com/galaxyproject/training-material/pull/3152, that would be amazing!!!! Have a nice weekend!

annefou commented 2 years ago

Cool. Yes I can review your training material! Thanks.

yvanlebras commented 2 years ago

Really sorry... now that I have the dataset, I get an error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/data/jwd/main/041/778/41778869/tmp/ipykernel_354/4122857281.py in <module>
----> 1 dset = xr.open_dataset("CAMS-PM2_5-20211222.netcdf")

/srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    477 
    478     if engine is None:
--> 479         engine = plugins.guess_engine(filename_or_obj)
    480 
    481     backend = plugins.get_backend(engine)

/srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/backends/plugins.py in guess_engine(store_spec)
    150         )
    151 
--> 152     raise ValueError(error_msg)
    153 
    154 

ValueError: did not find a match in any of xarray's currently installed IO backends ['netcdf4', 'h5netcdf', 'scipy', 'cfgrib', 'pydap', 'rasterio', 'zarr']. Consider explicitly selecting one of the installed engines via the ``engine`` parameter, or installing additional IO dependencies, see:
http://xarray.pydata.org/en/stable/getting-started-guide/installing.html
http://xarray.pydata.org/en/stable/user-guide/io.html

when running dset = xr.open_dataset("CAMS-PM2_5-20211222.netcdf").

yvanlebras commented 2 years ago

my jupyter notebook FYI https://3525516ba8d111f5-3742e8a717c04bddb0a50d763b550537.interactivetoolentrypoint.interactivetool.ecology.usegalaxy.eu/ipython/lab/tree/Untitled.ipynb

annefou commented 2 years ago

I am not sure why. It usually happens when the file type is set to h5 instead of netcdf. Actually, here it did not find the file; the error is a bit misleading... Your file is in the data folder:

dset = xr.open_dataset("data/CAMS-PM2_5-20211222.netcdf")
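
Since the real problem was a wrong path surfacing as a backend-guessing ValueError, one option is to check the path before opening. A minimal sketch, assuming the notebook's working directory contains the data/ folder; resolve_dataset_path is a hypothetical helper, not part of xarray or of the tutorial:

```python
from pathlib import Path
from typing import Iterable


def resolve_dataset_path(filename: str, search_dirs: Iterable[str] = (".", "data")) -> Path:
    """Return the first existing file named `filename` in `search_dirs`.

    Raises FileNotFoundError listing every location tried, so a wrong path is
    reported as such instead of surfacing as an engine-guessing error.
    """
    tried = []
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
        tried.append(str(candidate))
    raise FileNotFoundError(f"{filename!r} not found; tried: {', '.join(tried)}")
```

In the notebook the result can then be passed to xarray, e.g. dset = xr.open_dataset(resolve_dataset_path("CAMS-PM2_5-20211222.netcdf"), engine="netcdf4"). Passing engine explicitly also skips the backend guessing; it assumes the netcdf4 backend listed in the error message is installed.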

yvanlebras commented 2 years ago

OK, I retested from the start and it works now! I can go further! THANK YOU!

yvanlebras commented 2 years ago

hey hey! Done!

yvanlebras commented 2 years ago

Amazing tuto! Thank you Anne!!!!!

bgruening commented 2 years ago

What a cool tutorial!

shiltemann commented 2 years ago

whoo!! so awesome! :tada:

(currently there seems to be a problem with rendering the slides video, but we are working on it!)

annefou commented 2 years ago

Let me know if there is anything to do on my side.

shiltemann commented 2 years ago

@annefou nah, there was a small bug in the video generation, but the videos are up now :) The pronunciation of "Pangeo" is a bit off though, so we will look into teaching it how to pronounce it.

annefou commented 2 years ago

Cool! That's awesome!

I found a few "dots" that cut sentences, and sometimes the result is very odd. I guess the sentences were far too long. I have started to note precisely where it happens in the first video. Let me know how I can fix these issues.

In the first pangeo video:

1:55 there is a dot at the end that should be removed (probably my fault)! must be scalable ... current and future challenges of big data ..., i.e. no dot between future and challenges.
2:08 we should also remove the dot after use cases, i.e. use cases as well as ...
2:20 remove the dot after be, i.e. cannot be tackled separately.
2:30 remove the dot after define, i.e. developers can define priorities for future...
3:22 remove the dot after interface, i.e. user interface with many functions...
3:54 remove the dot after Galaxy, i.e. from Galaxy Tools can be useful.

We have similar issues in the second pangeo video (for pangeo-notebook). Also, netCDF is not pronounced correctly. I think I should have written net CDF or net-CDF (I forgot about it).

Thanks!

How can I fix these small issues?

hexylena commented 2 years ago

You can add these to bin/ari-map.yml. Keep writing netCDF in your slides (better for screen readers, etc.), and the ari-map will map those terms to the way they should be pronounced.
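
What such a term map does can be sketched as a whole-word substitution applied to the subtitle text before speech synthesis. This is a minimal sketch assuming the map can be modelled as a plain term-to-spoken-form dictionary; the real bin/ari-map.yml format and the code that consumes it may differ, and apply_pronunciations and SPOKEN are hypothetical names.

```python
import re
from typing import Mapping


def apply_pronunciations(text: str, mapping: Mapping[str, str]) -> str:
    """Replace whole-word occurrences of each term with its spoken form."""
    if not mapping:
        return text
    # Longest terms first so a multi-word term like "Galaxy Tools" wins over
    # "Galaxy" when both are mapped.
    terms = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    # One pass with a callback, so a replacement is never substituted again.
    return pattern.sub(lambda m: mapping[m.group(0)], text)


# Hypothetical entry mirroring the fix discussed in this thread.
SPOKEN = {"netCDF": "net CDF"}

if __name__ == "__main__":
    print(apply_pronunciations("Open the netCDF file with Xarray.", SPOKEN))
```

Keeping the mapping separate from the slides is what lets the written text stay "netCDF" for screen readers while only the narration changes.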

hexylena commented 2 years ago

Ahh, I see what happened: you didn't use bullet points, so each line was treated as a separate sentence. Until now most people have used bullet points, or at least put a full sentence on a single line, rather than wrapping, which is what's causing the error.

If you rearrange the subtitles so an entire line of text is a single line in the file, this will fix it.
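
The rearrangement can also be done mechanically by joining hard-wrapped lines. This is a hypothetical helper, not part of the GTN tooling, assuming the notes are plain text in which blank lines separate paragraphs and bullets start with "- " or "* ":

```python
def unwrap_notes(text):
    """Join hard-wrapped lines so each paragraph or bullet ends up on one line.

    Assumes blank lines separate paragraphs and bullets start with "- " or "* ".
    """
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            out.append("")                      # keep paragraph breaks
        elif stripped.startswith(("- ", "* ")) or not out or out[-1] == "":
            out.append(stripped)                # start a new paragraph or bullet
        else:
            out[-1] = out[-1] + " " + stripped  # continuation of a wrapped line
    return "\n".join(out)
```

Running the presenter notes through a helper like this before committing would leave one sentence or bullet per line, which is the layout described above.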

galaxyproject / training-material

Pangeo training material for Big data geosciences #3147