Hi @Cri-dei ,
> I tried to install it, but it seems there is a dependency issue with the package esmpy.

I think you installed an old environment. The instructions for installing the latest environment are in the README of this repo. I suggest re-creating your environment from scratch. Could you please try this:
```shell
git clone https://github.com/bopen/c3s-eqc-toolbox-template.git
cd c3s-eqc-toolbox-template
conda create -n deidda -c conda-forge python=3.10 ipython
conda activate deidda
make conda-env-update
```
Let me know if that works OK.
> I would like to have some feedback about the code, to know whether it is okay in this form for the WG5 delivery.
I'll take a look in the next couple of days!
@malmans2 thank you!
FYI: I edited my previous comment because I copy-pasted the wrong command.
The last command to update the environment is: `make conda-env-update`
Hi @Cri-dei,
Looks like you don't make use of `c3s_eqc_automatic_quality_control`, and you are analysing data that has been previously downloaded.
To begin with, instead of loading pre-existing files, you should use `c3s_eqc_automatic_quality_control.download` at the top of your notebook. You'll see that the software we are developing caches all downloaded files.
All the template notebooks make use of `download.download_and_transform`, so you should be able to figure out how to do the same for your dataset. You can also check the documentation with `help(download.download_and_transform)`. But if you have any questions, feel free to ask.
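For readers following along, the download-then-cache behaviour described above can be sketched in plain Python. This is an illustrative stand-in, not the actual `c3s_eqc_automatic_quality_control` internals: `fetch` and `transform` are hypothetical callables playing the role of the CDS download and a diagnostic.

```python
import hashlib
import json
from pathlib import Path


def cached_download_and_transform(request, fetch, transform, cache_dir):
    """Fetch data for `request` once, cache it on disk, then apply `transform`.

    Illustrative sketch only: `fetch` and `transform` are plain callables
    standing in for the real download and diagnostics steps.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the cache on the request contents, so identical requests hit the cache.
    key = hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        data = json.loads(path.read_text())
    else:
        data = fetch(request)
        path.write_text(json.dumps(data))
    return transform(data)
```

The point of the design is that re-running a notebook with an identical request never re-downloads: the second call is served from the on-disk cache.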
Hi @malmans2, yes, I know; it is also written in the first comment of the code. I did not put that line in because of the bug I reported here, which prevents me from directly downloading the data. So the definitive code will have that part at the start to automatically download the data. I would like general feedback on the structure of the code, to know whether it can be okay for the UseCase delivery. I will update the code with that anyway and then share it again. Thanks.
Got it. Once you revise the code using data downloaded directly from the CDS, I can help improve it. It already looks in good shape to me; we can probably just make use of other diagnostics and plotting functions we have already implemented. We can also add new diagnostics specific to your case, and we will check whether the code works OK with large datasets.
I can't really comment on the content, as we are just providing technical support. If you haven't done it yet, you should check with Chunxue whether the output is in line with the UseCase delivery.
Hi @malmans2 ,
Thanks for the work that you and Vincenzo did; I am now able to run Jupyter and the main packages without errors. I was just doing some trials with the code that you provide in the notebook. I can build and run the request with no errors. But once I compute:

```python
ds_mean = download.download_and_transform(
    collection_id,
    requests,
    transform_func=diagnostics.spatial_weighted_mean,
    chunks={"year": 1, "month": 1},
)
```

I get several errors: `Missing/incomplete configuration file: /data/wp5/.cdsapirc`. I attached the figures here.
Can you help me?
Thank you, Cristina
Hi @Cri-dei,
Each WP should add a `.cdsapirc` file which contains the credentials to access CDS data.
In your case, the file should go here: `/data/wp5/.cdsapirc`
If you are not familiar with `.cdsapirc`, check out this page: https://cds.climate.copernicus.eu/api-how-to
@vincenzodetoma wp5 is the only WP missing the `.cdsapirc`, can you add yours?
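For reference, a `.cdsapirc` is a plain-text file with two lines, in the format described on the CDS API page linked above (`UID` and `API-KEY` below are placeholders for the values shown on your CDS profile page):

```
url: https://cds.climate.copernicus.eu/api/v2
key: UID:API-KEY
```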
Hi @malmans2, @Cri-dei, OK, I will try. However, this way I will have to accept the licence agreements, if necessary, before accessing the data. I think I've already done this for ERA5, but in the future we'll probably have to find a better solution, for example defining the URL and key for each user directly within the Jupyter notebook.
@Cri-dei Try now, it should work fine
If anyone needs to use their own `.cdsapirc`, they can do this at the top of their notebooks:

```python
import os

os.environ["CDSAPI_RC"] = os.path.expanduser("~/lastname_firstname/.cdsapirc")
```

I will add it to the overview notebook.
Good morning. Now I have a new error/warning: it was running for hours without finishing.
These are connection issues with the VM and/or the CDS (see also: https://github.com/bopen/c3s-eqc-toolbox-template/issues/14).
Try to restart the kernel and re-run the notebook! Most of the files should be in the cache, so hopefully you'll be able to compute `ds_mean` very quickly.
Hi @malmans2 ,
I want to download satellite water level for Lake Victoria: https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-lake-water-level?tab=form Is this code correct?
And after I have written the request, what is the command to just download the data, without also computing the mean or doing other processing?
Thanks
Hi,
The `update_request_date` function is only needed for datasets that are updated very frequently.
In your case the dataset doesn't even have a time parameter.
You can just do this:

```python
ds = download.download_and_transform(collection_id, request)
```
I attached the code; can you please look at it and leave me comments? I would like to download the entire data series, thanks.
The parameter `variable` must be a string, whereas you are using Python's builtin function `all`.
Try using this request:

```python
request = {
    "lake": "victoria",
    "variable": "all",
}
```
Okay now it works, but:
You are hitting the issue @vincenzodetoma raised the other day:

> If anyone needs to use their own `.cdsapirc`, they can do this at the top of their notebooks:
>
> ```python
> import os
>
> os.environ["CDSAPI_RC"] = os.path.expanduser("~/lastname_firstname/.cdsapirc")
> ```
>
> I will add it to the overview notebook.

Either ask Vincenzo to accept the terms and conditions, or use your own `.cdsapirc` as explained above.
Hi @malmans2 and @vincenzodetoma ,
I managed to fix the `.cdsapirc` issue; it is working now. I am now plotting a global map of the lake water temperature dataset, but I get a memory error.
This is the request I did, which seems to work:
Hi!
Can you please send me the whole code to reproduce the error? Also, it would be better to share the code rather than screenshots, so I can copy and paste.
If you wrap your code in three backquotes and specify `python`, GH will automatically do syntax highlighting.
See: https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks#syntax-highlighting
Thanks!
Okay, I prefer to send you the entire code so you can see it whole. Thanks
Hi @Cri-dei,
I see, there's a bug. I need to find a better workaround to handle satellites. The problem is that in some cases we need to infer the time dimension from the filenames, and in other cases (like yours) we don't. I'll let you know when it's fixed!
OK, should be fixed now. I already updated the wp5 environment.
Try again and let me know. Keep in mind that the `source` dimension does not exist anymore for your dataset. It was a bug in your case; it's only needed for other datasets.
Hi @malmans2 ,
Now it is working, thank you!
I now need to install some packages:

```
      9 import geopandas
---> 10 import rioxarray
     11 import xarray
     12 from shapely.geometry import mapping
ModuleNotFoundError: No module named 'rioxarray'
```

These are the ones I need; I don't know whether only rioxarray is unavailable or some of the others too. Thanks
Great!
All those packages are now available for wp5 on the VM. Restart the kernel and try again.
Works thanks!
It is taking a very long time to download the data for all the available years, and it is not even showing the progress bar. Can I ask you whether the request is correct?
Thank you so much.
Probably the request is slow because it's quite big. You should chunk your requests. For example, try:

```python
ds = download.download_and_transform(collection_id, request, chunks={"year": 1})
```
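Conceptually, `chunks={"year": 1}` turns one large request into a series of per-year requests, each small enough for the CDS to serve. A plain-Python sketch of the idea (not the toolbox's actual implementation; `split_request_by_year` is a hypothetical helper):

```python
def split_request_by_year(request):
    """Split a CDS-style request holding a list of years into per-year requests."""
    years = request["year"]
    if not isinstance(years, list):
        years = [years]
    # Copy the request once per year, overriding only the "year" entry.
    return [{**request, "year": year} for year in years]


request = {"variable": "all", "year": ["1998", "1999", "2000"], "month": ["01", "02"]}
sub_requests = split_request_by_year(request)
# sub_requests[0] == {"variable": "all", "year": "1998", "month": ["01", "02"]}
```

Each sub-request is then downloaded (and cached) separately, so one failed transfer only costs you a single year rather than the whole series.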
I am stuck. I reduced the number of years and selected a smaller area, but I am still not able to download the data.
```python
request = {
    'version': '4.0',
    'year': [
        #'1995', '1996', '1997',
        '1998', '1999', '2000',
        #'2001', '2002', '2003',
        #'2004', '2005', '2006',
        #'2007', '2008', '2009',
        #'2010', '2011', '2012',
        #'2013', '2014', '2015',
        #'2016', '2017', '2018',
        #'2019',
    ],
    'month': [
        '01', '02', '03',
        '12',
    ],
    'day': [
        '01', '02', '03',
        '04', '05', '06',
        '07', '08', '09',
        '10', '11', '12',
        '13', '14', '15',
        '16', '17', '18',
        '19', '20', '21',
        '22', '23', '24',
        '25', '26', '27',
        '28', '29', '30',
        '31',
    ],
    'variable': 'all',
    'area': [28, 41, -16, 4],
}

start = "2022-01"
stop = None

requests = download.update_request_date(request, start=start, stop=stop)
ds = download.download_and_transform(collection_id, request, chunks={"year": 1})
```
The error is:

```
WARNING Recovering from connection error [HTTPSConnectionPool(host='download-0008-clone.copernicus-climate.eu', port=443): Max retries exceeded with url: /cache-compute-0008/cache/data3/dataset-satellite-lake-water-temperature-38ac32be-a63f-4e0e-a758-c6bf4d073e33.zip (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fe5afb6aef0>, 'Connection to download-0008-clone.copernicus-climate.eu timed out. (connect timeout=60)'))], attemps 0 of 500
WARNING Retrying in 120 seconds
WARNING Recovering from connection error [HTTPSConnectionPool(host='download-0008-clone.copernicus-climate.eu', port=443): Max retries exceeded with url: /cache-compute-0008/cache/data3/dataset-satellite-lake-water-temperature-38ac32be-a63f-4e0e-a758-c6bf4d073e33.zip (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fe5afb6a230>, 'Connection to download-0008-clone.copernicus-climate.eu timed out. (connect timeout=60)'))], attemps 1 of 500
WARNING Retrying in 120 seconds
```
I think you are experiencing this: https://github.com/bopen/c3s-eqc-toolbox-template/issues/14 B-Open does not maintain cdsapi nor the VM you are using, so unfortunately there's not much I can do about these connection errors.
Try installing the environment on your local machine and running locally.
Would it be faster if I used the function to directly download the monthly mean? Can I ask you for the command to directly download the monthly mean of the variable?
```python
requests = download.update_request_date(request, start=start, stop=stop)
ds = download.download_and_transform(collection_id, request, chunks={"year": 1})
```
What is the collection_id of your dataset?
"satellite-lake-water-temperature"
This is the cds form: https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-lake-water-temperature?tab=form
There's no monthly option, so you cannot download monthly data directly. If your chunks are big (I don't know the size of your dataset), you could try to chunk even more (month, day, ...).
Sorry, I was not clear. I meant that I would like to use the package to download the data directly transformed into a monthly mean. If in this command I want to transform the data directly into a monthly mean:

```python
ds = download.download_and_transform(collection_id, request, chunks={"year": 1})
```

do I have to create a function? I saw in the documentation there is the spatial mean example:

```python
ds = diagnostics.spatial_weighted_mean(ds)
return ds.resample(forecast_reference_time="1D").mean("forecast_reference_time")
```

I would like to do just the monthly mean.
Can I ask you for the code to do it?
Thank you so much.
I think you can just replace `"1D"` with `"1M"` and use the dimension name for your dataset.
But at the moment it looks like you are having problems downloading the data. The transform function is applied after downloading.
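For intuition, here is what a monthly mean does, sketched in plain Python on hypothetical daily samples. With xarray you would instead call something like `ds.resample(time="1M").mean()`, substituting your dataset's actual time dimension name for `time`:

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_mean(samples):
    """Average (date, value) samples per calendar month.

    Returns a dict mapping (year, month) -> mean value, mimicking what a
    monthly resample-and-mean does along a time dimension.
    """
    buckets = defaultdict(list)
    for day, value in samples:
        buckets[(day.year, day.month)].append(value)
    return {month: sum(vals) / len(vals) for month, vals in buckets.items()}


# Hypothetical daily series: 10.0 every day in January 2014, 20.0 in February.
samples = []
day = date(2014, 1, 1)
while day < date(2014, 3, 1):
    samples.append((day, 10.0 if day.month == 1 else 20.0))
    day += timedelta(days=1)

means = monthly_mean(samples)
# means[(2014, 1)] == 10.0 and means[(2014, 2)] == 20.0
```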
Hello @malmans2, have there been some changes? I am working on the Jupyter Notebook again today and I get errors right from the start, with the packages and on the download command that was working without problems until today. (I have to say that it is very difficult to work like this.)

```
ModuleNotFoundError: No module named 'ESMF'
```
Hey @Cri-dei, I think I found the issue. I had to pin xesmf on the VM environments.
Please try again and let me know if anything is broken.
I understand the frustration, but unfortunately the workflow on the VM is not ideal for developing, especially because at the moment all packages are in alpha stage. We are working with what we have, which is a VM with a single user. We need to update the environments often so we stay in sync with our changes and the upstream changes by ECMWF.
If you don't really need a shared cache (e.g., other people are not processing the same data you are using), you can always develop on your local machine. Then you can move to the VM once you are done developing and you need to run large computations.
Okay, now it is working again. I need this package: `import pymannkendall as mk`. Thanks.
Done!
Hi there, If it's of use, there's a new notebook template under the satellite folder.
Hi @malmans2 ,
I attached here the final version of the code for three datasets: lake water level, lake surface water temperature, and soil moisture.
With the lake surface water temperature code I have some problems, since I get a lot of warnings and it is not able to compute the trend. The water level code should be okay; the soil moisture one can be optimized. I do the same plot for three different continents.
These are supposed to be the final codes for the March deadline; I still have 2 other datasets to do, which I hope I can send you tomorrow.
Can you please start checking these ones? Thanks
OK, I'll take a look!
I've seen that error recently in an `xarray` issue; it looks like it's actually a bug in the latest netcdf-c.
We might have to pin `libnetcdf`, but I need to explore.
These are the files in Jupyter notebook format; better if you check these. In the other comment they were .py!
Thanks!
Hi @Cri-dei , I looked at the first one and added a template here: https://github.com/bopen/c3s-eqc-toolbox-template/tree/main/notebooks/wp5 I'll work on the other notebooks tomorrow.
Hi @Cri-dei, Are the shapefiles that you are using in your notebook publicly available? Could you please send me the link, so I can add a retrieve function and anyone can run the notebook? Thanks.
Hi! I retrieved them from here: https://figshare.com/articles/dataset/Continent_Polygons/12555170/3
And they are in my folder on the VM, under Soil_moisture/Continents: http://localhost:5678/tree/Soil%20moisture/Continents
Hi @malmans2 ,
I am now trying to download the data of my last dataset, "satellite-lai-fapar", but there are errors. Am I doing something wrong in the request? Thanks
```python
collection_id = "satellite-lai-fapar"
request = {
    'variable': [
        'fapar', 'lai',
    ],
    'satellite': 'proba',
    'sensor': 'vgt',
    'horizontal_resolution': '1km',
    'product_version': 'V2',
    'year': '2014',
    'month': [
        '01', '02', '03',
        '04', '05', '06',
        '07', '08', '09',
        '10', '11', '12',
    ],
    'nominal_day': [
        '03', '13', '21',
        '23', '24',
    ],
    'format': 'zip',
    'area': [90, -180, -90, 180],
}

start = "2022-01"
stop = None
```
Weird, it looks like your env does not know that you are a wp5 user.
Maybe you just have to restart the server.
Can you try to stop the Jupyter server (follow the instructions printed by `jupyter_server`; the command is basically `jupyter notebook stop $YOUR_PORT_NUMBER`), and open a new session (rerun `jupyter_server` and follow the instructions)?
Hi @Cri-dei , I've added the soil moisture notebook as well: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp5/soil_moisture_dataquality.ipynb
Hi @malmans2,
I am Cristina Deidda; I am part of WG5 and am developing a Use Case code for ERA5-Land. I managed to enter the Virtual Machine and install the packages.
If I just run:
```python
import warnings

import xarray as xr
from c3s_eqc_automatic_quality_control import diagnostics, download

warnings.filterwarnings("ignore")
```

I get this error: `ModuleNotFoundError: No module named 'xesmf'`
I tried to install it, but it seems there is a dependency issue with the package esmpy.
Can I have some help solving this issue? Moreover, I started writing the code for a UseCase for Tier3b. I attached the Jupyter notebook script here. I would like to have some feedback on the code, to know whether it is okay in this form for the WG5 delivery.
Usecase_Summer Temperature trend in East Europe_v1.zip
It is still a draft version, but I would like to know if I am going in the right direction.
Thank you, Cristina