Add intake description

Ovewh commented 5 months ago

Added notebook example on how to use intake esm and intake catalog to browse available data.

I used the one called CMIP.json, which only contains local CMIP6 data.

mvdebolskiy commented 5 months ago

Thanks for doing this. I updated it with descriptions of the other data and removed the old file. Do you think we should add an overview to index.rst for the data section?

sarambl commented 5 months ago

This is awesome! I wonder though if it could be simplified? Like, do they need to pre-process? And do they need the dask?

sarambl commented 5 months ago

Secondly, I'm getting a lot of errors when I use this catalog that I didn't get with the pangeo catalog. Any ideas why? Right now it's giving me a lot of

but I also e.g. got this:

sarambl commented 5 months ago

I can obviously upload the example if helpful.

mvdebolskiy commented 5 months ago

Are you opening the whole catalog at once without .search()?

Ovewh commented 5 months ago

> Like, do they need to pre-process?

I think it is nice to see that there is an option for preprocessing the data. That can save you some work that you would otherwise have to do yourself, with loops etc.

> And do they need the dask?

Dask is also optional, but can make the calculation faster.

sarambl commented 5 months ago

I agree it's very nice, but most of the students might want to just copy-paste that code into their own notebook and tweak the search. That's why I would just do a super easy "read and plot" first and then add complexity after. Does that make sense?
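A minimal sketch of what such a "read and plot" first step could look like, without preprocessing or dask. The catalog path, the query values, and the helper name `read_and_plot` are placeholders chosen for illustration, not taken from the notebook; it assumes intake-esm, xarray, and matplotlib are installed and that the search matches a single ensemble member.

```python
# Hypothetical search terms; students would tweak these for their own case.
QUERY = {
    "variable_id": "tas",
    "experiment_id": "historical",
    "table_id": "Amon",
    "source_id": "NorESM2-LM",
}


def read_and_plot(catalog_json, query):
    """Open the catalog, narrow it with .search(), load, and plot one field."""
    # Imported here so the query above can be edited and inspected
    # even on a machine without intake-esm.
    import intake

    cat = intake.open_esm_datastore(catalog_json)
    subset = cat.search(**query)          # search first, never load everything
    datasets = subset.to_dataset_dict()   # dict: dataset key -> xarray.Dataset
    key, ds = next(iter(datasets.items()))
    # First time step of the requested variable; squeeze() drops length-1
    # dimensions such as a single ensemble member.
    ds[query["variable_id"]].isel(time=0).squeeze().plot()
    return key, ds
```

In a notebook this would be called as `read_and_plot("path/to/CMIP.json", QUERY)`, with the path replaced by wherever the catalog actually lives. Preprocessing and dask could then be introduced in a later cell.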

sarambl commented 5 months ago

> Are you opening the whole catalog at once without .search()?

I do this: https://github.com/MetOs-UiO/eScience2024/blob/edit_text/docs/learning/notebooks/some-xarray-pandas-presentation_Sara.ipynb

Ovewh commented 5 months ago

@mvdebolskiy BTW could you make a separate folder for catalogs? Under /mnt/craas1-ns9989k-geo4992/data/catalogs

Could you also copy over some other catalogs? I have built one for the CESM-PPE and also one that merges the pangeo and the local CMIP6 catalogs.

mvdebolskiy commented 5 months ago

@Ovewh Sure, I can make one. Can you put all of them in your $HOME/catalogs and ping me?

sarambl commented 5 months ago

Ok, so my error came because it is trying to merge data from AERmon and Amon (or possibly because of the difference in vertical coordinate).

We tested a bit with @mvdebolskiy and if we specify:

cat.esmcat.aggregation_control.groupby_attrs = ['activity_id', 'experiment_id','source_id','table_id']

it works fine. We might need more attributes to separate on as well? I am not sure. Also, the example you give, Ove, seems to separate on only activity_id and institution_id, which is maybe not ideal (e.g. in case you open different experiments in one go).
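To illustrate why the choice of groupby attributes matters, here is a small pure-Python sketch that imitates the grouping step. It is not intake-esm's own code: the rows are made up for illustration, and `group_keys` is a hypothetical helper. The only assumption about intake-esm is that dataset keys are built by joining the groupby attribute values with a dot.

```python
from collections import defaultdict

# Made-up catalog rows; a real catalog has one row per file on disk.
ROWS = [
    {"activity_id": "CMIP", "institution_id": "NCC", "source_id": "NorESM2-LM",
     "experiment_id": "historical", "table_id": "Amon", "variable_id": "tas"},
    {"activity_id": "CMIP", "institution_id": "NCC", "source_id": "NorESM2-LM",
     "experiment_id": "historical", "table_id": "AERmon", "variable_id": "mmrso4"},
    {"activity_id": "CMIP", "institution_id": "NCC", "source_id": "NorESM2-LM",
     "experiment_id": "piControl", "table_id": "Amon", "variable_id": "tas"},
]


def group_keys(rows, groupby_attrs):
    """Map each dataset key to the (table_id, variable_id) pairs grouped under it."""
    groups = defaultdict(list)
    for row in rows:
        key = ".".join(row[attr] for attr in groupby_attrs)
        groups[key].append((row["table_id"], row["variable_id"]))
    return dict(groups)


# Grouping on only two attributes lumps everything into one dataset.
coarse = group_keys(ROWS, ["activity_id", "institution_id"])
# Adding experiment, model, and table keeps Amon and AERmon apart.
fine = group_keys(ROWS, ["activity_id", "experiment_id", "source_id", "table_id"])

for key, members in coarse.items():
    print("coarse:", key, members)
for key, members in fine.items():
    print("fine:  ", key, members)
```

With the coarse grouping, all three rows share the key `CMIP.NCC`, so Amon and AERmon data from two different experiments would have to be merged into one dataset. With the finer grouping, each row gets its own key.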

Ovewh commented 5 months ago

@sarambl Ok, I'll add the same groupby attrs as pangeo uses

Ovewh commented 5 months ago

> @Ovewh Sure, I can make one. Can you put all of them in your $HOME/catalogs and ping me?

I have put all the catalogs under fc-3auid-3a9fdc0c87-2d7836-2d4bdc-2db802-2d9a250c322e3b/catalogs @mvdebolskiy

mvdebolskiy commented 5 months ago

@Ovewh, awesome. Will put them in data. Btw, change the dask tooltip, since it's easier to just click on the left sidebar in JupyterLab.

Ovewh commented 5 months ago

I think it looks good for now, so I'll merge it into main

MetOs-UiO / eScience2024

Add intake description #22