WG5 UseCase ERA5Land and xesmf error

Cri-dei commented 1 year ago

Hi @malmans2,

I am Cristina Deidda, I am part of WG5 and I am developing Use Case code for ERA5-Land. I managed to log in to the virtual machine and install the packages.

If I just run:

import warnings
import xarray as xr
from c3s_eqc_automatic_quality_control import diagnostics, download
warnings.filterwarnings("ignore")

I get this error: ModuleNotFoundError: No module named 'xesmf'

I tried to install it, but there seems to be a dependency issue with the esmpy package.

Could you help me solve this issue? I have also started writing the code for a Use Case for Thier3b and attached the Jupyter notebook here. I would like some feedback on the code: is this approach acceptable for the WG5 delivery?

Usecase_Summer Temperature trend in East Europe_v1.zip

It is still a draft, but I would like to know whether I am going in the right direction.

Thank you, Cristina

malmans2 commented 1 year ago

Hi @Cri-dei ,

> I tried to install it but seems there are dependency issue with the package esmpy.

I think you installed an old environment. The instructions for installing the latest environment are in the README of this repo. I suggest re-creating your environment from scratch. Could you please try this:

git clone https://github.com/bopen/c3s-eqc-toolbox-template.git
cd c3s-eqc-toolbox-template
conda create -n deidda -c conda-forge python=3.10 ipython
conda activate deidda
make conda-env-update
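After re-creating the environment, one quick way to confirm that the packages discussed in this thread can be found, without actually importing them, is `importlib.util.find_spec` from the standard library. A minimal sketch; the module list is an assumption based on the imports mentioned in this issue, so adjust it to your notebook.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in the active environment."""
    # find_spec returns None for a top-level module that is not installed;
    # it locates the module without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed list, based on the imports mentioned in this thread.
required = ["xarray", "xesmf", "rioxarray", "geopandas", "shapely", "pymannkendall"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```

If `xesmf` shows up as missing, the environment was most likely not updated with `make conda-env-update`.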

Let me know if that works OK.

> I would like to have some feedback about the code, if in this way can be okay for the WG5 delivery.

I'll take a look in the next couple of days!

Cri-dei commented 1 year ago

@malmans2 thank you!

malmans2 commented 1 year ago

FYI: I edited my previous comment because I had copy-pasted the wrong command. The last command, which updates the environment, is: make conda-env-update

malmans2 commented 1 year ago

Hi @Cri-dei, it looks like you don't use c3s_eqc_automatic_quality_control, and you are analysing data that was downloaded beforehand.

To begin with, instead of loading pre-existing files, you should use c3s_eqc_automatic_quality_control.download at the top of your notebook. The software we are developing caches all downloaded files.

All the template notebooks use download.download_and_transform, so you should be able to figure out how to do the same for your dataset. You can also check the documentation with help(download.download_and_transform). If you have any questions, feel free to ask.
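To illustrate why re-running the same request is fast once files are cached, here is a hypothetical sketch of request-keyed caching: identical (collection_id, request) pairs map to the same key regardless of dictionary order. This only shows the general idea; it is not how c3s_eqc_automatic_quality_control actually implements its cache, and `request_cache_key` is an invented name.

```python
import hashlib
import json


def request_cache_key(collection_id, request):
    """Build a deterministic key for a (collection_id, request) pair.

    Hypothetical illustration of request-based caching.
    """
    # sort_keys makes the serialization independent of dict key order
    # (lists keep their order, so ["1998", "1999"] and ["1999", "1998"] differ).
    payload = json.dumps({"collection_id": collection_id, "request": request}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


a = request_cache_key("satellite-lake-water-level", {"lake": "victoria", "variable": "all"})
b = request_cache_key("satellite-lake-water-level", {"variable": "all", "lake": "victoria"})
print(a == b)
```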

Cri-dei commented 1 year ago

Hi @malmans2, yes, I know; it is also written in the first comment of the code. I did not include that command because of the bug I reported here, which prevents me from downloading the data directly. The final code will download the data automatically at the start. For now, I would like general feedback on whether the structure of the code is okay for the Use Case delivery. I will update the code anyway and then share it again. Thanks.

malmans2 commented 1 year ago

Got it. Once you revise the code to use data downloaded directly from the CDS, I can help improve it. It already looks in good shape to me; we can probably just make use of other diagnostics and plotting functions we have already implemented. We can also add new diagnostics specific to your case, and we will check whether the code works OK with large datasets.

I can't really comment on the content, as we are just providing technical support. If you haven't done so yet, you should check with Chunxue whether the output is in line with the Use Case delivery.

Cri-dei commented 1 year ago

Hi @malmans2 ,

Thanks for the work that you and Vincenzo did, I am now able to run Jupyter and the main packages without errors. I was doing some trials with the code that you provide in the notebook. I can build and run the request with no errors. Once I compute:

ds_mean = download.download_and_transform(
    collection_id,
    requests,
    transform_func=diagnostics.spatial_weighted_mean,
    chunks={"year": 1, "month": 1},
)

I get several errors, including "Missing/incomplete configuration file: /data/wp5/.cdsapirc". I attached the screenshots here.

Can you help me?

Thank you, Cristina

malmans2 commented 1 year ago

Hi @Cri-dei, Each WP should add a .cdsapirc which contains the credentials to access CDS data. In your case, the file should go here: /data/wp5/.cdsapirc

If you are not familiar with .cdsapirc, check out this page: https://cds.climate.copernicus.eu/api-how-to
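The `.cdsapirc` file is a plain-text file with a `url:` line and a `key:` line (the exact values for your account are on the CDS API how-to page above). Since the error reports a "Missing/incomplete configuration file", a small check that both fields are present can help narrow things down. A minimal sketch; `parse_cdsapirc` and `check_cdsapirc` are hypothetical helpers, and the URL and key below are placeholders, not real credentials.

```python
def parse_cdsapirc(text):
    """Parse the 'name: value' lines of a .cdsapirc file into a dict."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only: URLs and 'UID:KEY' values contain colons.
        name, sep, value = line.partition(":")
        if sep:
            config[name.strip()] = value.strip()
    return config


def check_cdsapirc(text):
    """Return a list of problems; an empty list means both url and key are present."""
    config = parse_cdsapirc(text)
    return [f"missing '{field}'" for field in ("url", "key") if not config.get(field)]


# Placeholder values for illustration only.
sample = "url: https://cds.example.invalid/api\nkey: 12345:abcdef\n"
print(parse_cdsapirc(sample))
print(check_cdsapirc(sample))
```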

@vincenzodetoma wp5 is the only WP missing the cdsapirc, can you add yours?

vincenzodetoma commented 1 year ago

Hi @malmans2, @Cri-dei, OK, I will try. However, this way I will have to accept the license agreements, where necessary, before the data can be accessed. I think I have already done this for ERA5, but in the future we will probably have to find a better solution, for example defining the URL and key for each user directly within the Jupyter notebook.

vincenzodetoma commented 1 year ago

@Cri-dei Try now, it should work fine

malmans2 commented 1 year ago

If anyone needs to use their own cdsapirc, they can do this at the top of their notebooks:

import os
os.environ["CDSAPI_RC"] = os.path.expanduser("~/lastname_firstname/.cdsapirc")

I will add it in the overview notebook.
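A slightly more defensive variant of the snippet above fails early with a clear message when the file does not exist, instead of surfacing later as a configuration error during the download. A sketch; `use_personal_cdsapirc` is a hypothetical helper, and `~/lastname_firstname/.cdsapirc` is the placeholder path from the comment.

```python
import os


def use_personal_cdsapirc(path):
    """Point CDSAPI_RC at a personal .cdsapirc, failing early if the file is missing."""
    full_path = os.path.expanduser(path)
    if not os.path.isfile(full_path):
        raise FileNotFoundError(f"No .cdsapirc found at {full_path}")
    # cdsapi looks up CDSAPI_RC when a client is created,
    # so set it before running any download cell.
    os.environ["CDSAPI_RC"] = full_path
    return full_path
```

Usage, with the placeholder path: `use_personal_cdsapirc("~/lastname_firstname/.cdsapirc")`.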

Cri-dei commented 1 year ago

Good morning, now I have a new error/warning: the cell ran for hours without finishing.

malmans2 commented 1 year ago

These are connection issues with the VM and/or the CDS (see also: https://github.com/bopen/c3s-eqc-toolbox-template/issues/14).

Try restarting the kernel and re-running the notebook! Most of the files should already be in the cache, so hopefully you'll be able to compute ds_mean very quickly.

Cri-dei commented 1 year ago

Hi @malmans2 ,

I want to download the satellite water level data for Lake Victoria (https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-lake-water-level?tab=form). Is this code correct?

And once I have written the request, what is the command to just download the data, without computing the mean or any other processing?

Thanks

malmans2 commented 1 year ago

Hi, update_request_date is only needed for datasets that are updated very frequently. Your dataset doesn't even have a time parameter.

You can just do this:

ds = download.download_and_transform(collection_id, request)

Cri-dei commented 1 year ago

I attached the code; can you please look at it and give me your comments? I would like to download the entire data series, thanks.

Lake water level(1).zip

malmans2 commented 1 year ago

The variable parameter must be a string, whereas you are passing Python's built-in function all. Try this request:

request = {
    "lake": "victoria",
    "variable": "all",
}
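To catch this kind of mistake before a request is sent, a small pre-flight check can reject values that are not strings or numbers. A sketch; `validate_request` is hypothetical (not part of c3s_eqc_automatic_quality_control or cdsapi), and it assumes request values should be strings, numbers (as in `area`), or lists of those.

```python
def validate_request(request):
    """Check that every request value is a string/number or a list/tuple of them.

    Catches mistakes such as passing Python's built-in `all` instead of "all".
    """
    for name, value in request.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        bad = [v for v in values if not isinstance(v, (str, int, float))]
        if bad:
            raise TypeError(f"request[{name!r}] has unsupported values: {bad!r}")
    return request


validate_request({"lake": "victoria", "variable": "all"})  # passes
```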

Cri-dei commented 1 year ago

Okay now it works, but:

malmans2 commented 1 year ago

You are hitting the issue @vincenzodetoma raised the other day.

> If anyone needs to use their own cdsapirc, they can do this on top of their notebooks:
> import os
> os.environ["CDSAPI_RC"] = os.path.expanduser("~/lastname_firstname/.cdsapirc")
> I will add it in the overview notebook.

Either ask Vincenzo to accept the terms and conditions, or use your own cdsapirc as explained above.

Cri-dei commented 1 year ago

Hi @malmans2 and @vincenzodetoma ,

I managed to fix the cdsapirc issue, and now it is working. I am now plotting a global map from the lake water temperature dataset, but I get a memory error:

This is the request I made, which seems to work:

malmans2 commented 1 year ago

Hi!

Can you please send me the whole code so I can reproduce the error? Also, it would be better to share code rather than screenshots, so I can copy and paste it.

If you wrap your code in triple backticks and specify python, GitHub will automatically apply syntax highlighting. See: https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks#syntax-highlighting

Thanks!

Cri-dei commented 1 year ago

Okay, I'd rather send you the entire code so you can see all of it. Thanks

Lake water temperature.zip

malmans2 commented 1 year ago

Hi @Cri-dei,

I see, there's a bug. I need to find a better workaround for handling satellite data. The problem is that in some cases we need to infer the time dimension from the filenames, and in other cases (like yours) we don't. I'll let you know when it's fixed!
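For readers wondering what "inferring the time dimension from the filenames" involves: when each file holds one time step but carries no time coordinate, the date has to be parsed from the file name. A minimal sketch assuming a hypothetical naming scheme with a standalone YYYYMMDD token; real CDS filenames differ between datasets, and this is not the toolbox's actual code.

```python
import datetime
import re

# Hypothetical pattern: an 8-digit YYYYMMDD token not embedded in a longer number.
DATE_PATTERN = re.compile(r"(?<!\d)(\d{4})(\d{2})(\d{2})(?!\d)")


def time_from_filename(filename):
    """Return the date from the first standalone YYYYMMDD token, or None if absent."""
    match = DATE_PATTERN.search(filename)
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    return datetime.date(year, month, day)


print(time_from_filename("lake_temp_20000115_v1.nc"))
```

Datasets whose files already contain a time coordinate (as in this case) do not need this step.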

malmans2 commented 1 year ago

OK, should be fixed now. I already updated the wp5 environment.

Try again and let me know. Keep in mind that the source dimension no longer exists for your dataset: adding it was a bug in your case, as it's only needed for other datasets.

Cri-dei commented 1 year ago

Hi @malmans2 ,

Now it is working, thank you!

I now need to install some packages:

      9 import geopandas
---> 10 import rioxarray
     11 import xarray
     12 from shapely.geometry import mapping

ModuleNotFoundError: No module named 'rioxarray'

These are the ones I need; I don't know whether only rioxarray is missing or some of the others too. Thanks

malmans2 commented 1 year ago

Great!

All those packages are now available for wp5 on the VM. Restart the kernel and try again.

Cri-dei commented 1 year ago

It works, thanks!

It is taking a very long time to download the data for all the available years, and it is not even showing the progress bar. Is my request correct?

Thank you so much.

Lake water temperature (2).zip

malmans2 commented 1 year ago

Probably the request is slow because it's quite big. You should chunk your requests. For example, try:

ds = download.download_and_transform(collection_id, request, chunks={"year": 1})
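Conceptually, `chunks={"year": 1}` turns one large request into one request per year, each of which can be retrieved and cached separately. A standard-library sketch of that idea; `split_request` is hypothetical and not necessarily how `download_and_transform` implements chunking. Adding more keys, e.g. `{"year": 1, "month": 1}`, gives smaller pieces.

```python
import itertools


def split_request(request, chunks):
    """Split a request into smaller sub-requests.

    `chunks` maps a request key (e.g. "year") to how many values
    each sub-request should contain along that key.
    """
    keys = list(chunks)
    groups_per_key = []
    for key in keys:
        values = request[key]
        values = [values] if isinstance(values, str) else list(values)
        size = chunks[key]
        groups_per_key.append([values[i:i + size] for i in range(0, len(values), size)])
    sub_requests = []
    # One sub-request per combination of groups across the chunked keys.
    for combination in itertools.product(*groups_per_key):
        sub_request = dict(request)
        sub_request.update(zip(keys, combination))
        sub_requests.append(sub_request)
    return sub_requests


example = {"year": ["1998", "1999", "2000"], "month": ["01", "02", "03", "12"], "variable": "all"}
print(len(split_request(example, {"year": 1})))
print(len(split_request(example, {"year": 1, "month": 1})))
```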

Cri-dei commented 1 year ago

I am stuck. I reduced the number of years and selected a smaller area, but I am still not able to download the data.

```python
request = {
    'version': '4.0',
    'year': [
        # '1995', '1996', '1997',
        '1998', '1999', '2000',
        # '2001', '2002', '2003',
        # '2004', '2005', '2006',
        # '2007', '2008', '2009',
        # '2010', '2011', '2012',
        # '2013', '2014', '2015',
        # '2016', '2017', '2018',
        # '2019',
    ],
    'month': [
        '01', '02', '03',
        '12',
    ],
    'day': [
        '01', '02', '03',
        '04', '05', '06',
        '07', '08', '09',
        '10', '11', '12',
        '13', '14', '15',
        '16', '17', '18',
        '19', '20', '21',
        '22', '23', '24',
        '25', '26', '27',
        '28', '29', '30',
        '31',
    ],
    'variable': 'all',
    'area': [28, 41, -16, 4],
}

start = "2022-01"
stop = None

requests = download.update_request_date(
    request, start=start, stop=stop)

ds = download.download_and_transform(collection_id, request, chunks={"year": 1})
```

The error is:

```
WARNING  Recovering from connection error [HTTPSConnectionPool(host='download-0008-clone.copernicus-climate.eu',
         port=443): Max retries exceeded with url:
         /cache-compute-0008/cache/data3/dataset-satellite-lake-water-temperature-38ac32be-a63f-4e0e-a758-c6bf4d073
         e33.zip (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fe5afb6aef0>,
         'Connection to download-0008-clone.copernicus-climate.eu timed out. (connect timeout=60)'))], attemps 0 of
         500

WARNING  Retrying in 120 seconds

WARNING  Recovering from connection error [HTTPSConnectionPool(host='download-0008-clone.copernicus-climate.eu',
         port=443): Max retries exceeded with url:
         /cache-compute-0008/cache/data3/dataset-satellite-lake-water-temperature-38ac32be-a63f-4e0e-a758-c6bf4d073
         e33.zip (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fe5afb6a230>,
         'Connection to download-0008-clone.copernicus-climate.eu timed out. (connect timeout=60)'))], attemps 1 of
         500

WARNING  Retrying in 120 seconds
```

Capture222
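When a request is unexpectedly large or slow, it can be worth double-checking the `area` bounds. Assuming the usual CDS ordering [North, West, South, East], the request above has West = 41 and East = 4, and the check below flags that so you can confirm whether it was intended. A minimal sketch; `check_area` is a hypothetical helper, and it does not diagnose the connection timeouts shown above.

```python
def check_area(area):
    """Sanity-check an area list, assumed to be [North, West, South, East]."""
    north, west, south, east = area
    problems = []
    if not (-90 <= south <= north <= 90):
        problems.append(f"expected -90 <= South ({south}) <= North ({north}) <= 90")
    if west > east:
        problems.append(f"West ({west}) is greater than East ({east}); check whether they are swapped")
    return problems


print(check_area([28, 41, -16, 4]))
```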

malmans2 commented 1 year ago

I think you are experiencing this: https://github.com/bopen/c3s-eqc-toolbox-template/issues/14. B-Open maintains neither cdsapi nor the VM you are using, so unfortunately there's not much I can do about these connection errors.

Try installing the environment on your local machine and running the notebook locally.

Cri-dei commented 1 year ago

Would it be faster if I used the function to download the monthly means directly? What is the command to download the monthly mean of the variable directly?

requests = download.update_request_date(
    request, start=start, stop=stop)

ds = download.download_and_transform(collection_id, request, chunks={"year": 1})

malmans2 commented 1 year ago

What is the collection_id of your dataset?

Cri-dei commented 1 year ago

"satellite-lake-water-temperature"

malmans2 commented 1 year ago

This is the cds form: https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-lake-water-temperature?tab=form

There's no monthly option, so you cannot download monthly data directly. If your chunks are big (I don't know the size of your dataset), you could try chunking further (by month, day, ...).

Cri-dei commented 1 year ago

Sorry, I was not clear. I meant that I would like to use the package to download the data already transformed into monthly means. If I want this command to transform the data directly into monthly means:

ds = download.download_and_transform(collection_id, request, chunks={"year": 1})

do I have to create a function? I saw the spatial mean example in the documentation:


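```python
from c3s_eqc_automatic_quality_control import diagnostics


def spatial_daily_mean(ds):  # illustrative function name
    ds = diagnostics.spatial_weighted_mean(ds)
    return ds.resample(forecast_reference_time="1D").mean("forecast_reference_time")
```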

I would like to compute just the monthly mean.
Could you share the code for that?

Thank you so much.

malmans2 commented 1 year ago

I think you can just replace "1D" with "1M" and use the dimension name of your dataset. But at the moment it looks like you are having problems downloading the data, and the transform function is only applied after downloading.
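For reference, this is what a monthly resample computes. In xarray the transform would look like `ds.resample(time="1M").mean("time")`, assuming the time dimension is called `time` (newer pandas versions prefer the alias "ME" over "M"). The standard-library sketch below spells out the same calculation, grouping by year and month and averaging the non-missing values, so the result can be checked by hand. Unlike xarray, it omits months with no valid data instead of returning NaN for them.

```python
import math
from collections import defaultdict
from datetime import date


def monthly_means(times, values):
    """Average values per calendar month, skipping NaNs."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, v in zip(times, values):
        if math.isnan(v):
            continue  # like xarray's default skipna=True for float data
        key = (t.year, t.month)
        sums[key] += v
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


times = [date(2000, 1, 1), date(2000, 1, 15), date(2000, 2, 1), date(2000, 2, 2)]
values = [1.0, 3.0, float("nan"), 4.0]
print(monthly_means(times, values))
```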

Cri-dei commented 1 year ago

Hello @malmans2, have there been some changes? I am working on the Jupyter notebook again today, and I get errors right from the start: with the packages, and with the download command that had been working without problems until now. (I have to say that it is very difficult to work like this.)

ModuleNotFoundError: No module named 'ESMF'

Lake water temperature-Fixed.zip

malmans2 commented 1 year ago

Hey @Cri-dei, I think I found the issue. I had to pin xesmf on the VM environments.

Please try again and let me know if anything is broken.

I understand the frustration, but unfortunately the workflow on the VM is not ideal for development, especially because at the moment all packages are in alpha stage. We are working with what we have, which is a VM with a single user. We need to update the environments often to stay in sync with our own changes and with the upstream changes by ECMWF.

If you don't really need a shared cache (e.g., other people are not processing the same data you are using), you can always develop on your local machine. Then you can move to the VM once you are done developing and you need to run large computations.

Cri-dei commented 1 year ago

Okay, now it is working again. I need this package: import pymannkendall as mk. Thanks.

malmans2 commented 1 year ago

Done!

malmans2 commented 1 year ago

Hi there, If it's of use, there's a new notebook template under the satellite folder.

Cri-dei commented 1 year ago

Hi @malmans2 ,

I attached the final version of the code for three datasets: lake water level, lake surface water temperature and soil moisture.

With the lake surface water temperature code I have some problems: I get a lot of warnings and it is not able to compute the trend. The water level code should be okay, while the soil moisture code could be optimized; there, I make the same plot for three different continents.
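Since the trend step is where the lake temperature notebook fails, here is a sketch of the Mann-Kendall statistic that pymannkendall's `original_test` is based on, with missing values dropped first. One plausible cause of failures on gridded data is pixels with few or no valid values (an assumption, not something diagnosed in this thread); skipping them explicitly avoids passing empty series to the test. Standard library only and without tie correction, so for real analyses pymannkendall remains the better choice; `mann_kendall` and the minimum of 3 valid values are illustrative choices.

```python
import math


def mann_kendall(values):
    """Mann-Kendall S statistic and Z score (no tie correction), ignoring NaNs.

    Returns (S, Z), or None when fewer than 3 valid values remain.
    """
    x = [v for v in values if not math.isnan(v)]
    n = len(x)
    if n < 3:
        return None
    # S = number of increasing pairs minus number of decreasing pairs
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity-corrected Z score
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z


print(mann_kendall([1.0, 2.0, float("nan"), 3.0, 4.0, 5.0]))
```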

These are meant to be the final versions for the March deadline. I still have two more datasets to do, which I hope to send you tomorrow.

Can you please start checking these? Thanks

WG5_Jupyter Notebook.zip

malmans2 commented 1 year ago

OK, I'll take a look! I've seen that error recently in an xarray issue; it looks like it's actually a bug in the latest netcdf-c. We might have to pin libnetcdf, but I need to look into it.

Cri-dei commented 1 year ago

These are the files in Jupyter notebook format; better to check these. The ones in the other comment were .py files!

Thanks!

0_Jupyter Notebook.zip

malmans2 commented 1 year ago

Hi @Cri-dei , I looked at the first one and added a template here: https://github.com/bopen/c3s-eqc-toolbox-template/tree/main/notebooks/wp5 I'll work on the other notebooks tomorrow.

malmans2 commented 1 year ago

Hi @Cri-dei, Are the shapefiles that you are using in your notebook publicly available? Could you please send me the link, so I can add a retrieve function and anyone can run the notebook? Thanks.

Cri-dei commented 1 year ago

Hi! I retrieved them from here: https://figshare.com/articles/dataset/Continent_Polygons/12555170/3

They are in my folder on the VM, Soil moisture/Continents: http://localhost:5678/tree/Soil%20moisture/Continents
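A retrieve function along these lines would let anyone run the notebook without copying files from the VM: download the archive once, then reuse the local copy. A sketch; `retrieve_zip` is hypothetical, and since the figshare link above points to a landing page, the direct download URL of the archive has to be taken from that page (it is not filled in here).

```python
import os
import urllib.request
import zipfile


def retrieve_zip(url, dest_dir, filename):
    """Download a zip archive once and extract it, reusing the local copy afterwards."""
    os.makedirs(dest_dir, exist_ok=True)
    zip_path = os.path.join(dest_dir, filename)
    if not os.path.exists(zip_path):
        # Only hit the network when the archive is not already on disk.
        urllib.request.urlretrieve(url, zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    return zip_path
```

Usage would be something like `retrieve_zip(DIRECT_URL, "Continents", "continent-poly.zip")`, where `DIRECT_URL` is a placeholder for the archive link.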

continent-poly.zip

Cri-dei commented 1 year ago

Hi @malmans2 ,

I am now trying to download the data for my last dataset, "satellite-lai-fapar", but I get errors. Am I doing something wrong in the request? Thanks

collection_id = "satellite-lai-fapar"
request = {
    'variable': [
        'fapar', 'lai',
    ],
    'satellite': 'proba',
    'sensor': 'vgt',
    'horizontal_resolution': '1km',
    'product_version': 'V2',
    'year': '2014',
    'month': [
        '01', '02', '03',
        '04', '05', '06',
        '07', '08', '09',
        '10', '11', '12',
    ],
    'nominal_day': [
        '03', '13', '21',
        '23', '24',
    ],
    'format': 'zip',
    'area': [90, -180, -90, 180],
}

start = "2022-01"
stop = None

malmans2 commented 1 year ago

Weird, looks like your env does not know that you are a wp5 user.

Maybe you just have to restart the server. Can you try stopping the Jupyter server (follow the instructions printed by jupyter_server; the command is basically jupyter notebook stop $YOUR_PORT_NUMBER) and opening a new session (rerun jupyter_server and follow the instructions)?

malmans2 commented 1 year ago

Hi @Cri-dei , I've added the soil moisture notebook as well: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp5/soil_moisture_dataquality.ipynb

bopen / c3s-eqc-toolbox-template

WG5 UseCase ERA5Land and xesmf error #19