bopen / c3s-eqc-toolbox-template

CADS Toolbox template application
Apache License 2.0
5 stars · 4 forks

Jupyter notebook for glacier ECVs #112

Closed yoniv1 closed 7 months ago

yoniv1 commented 11 months ago

Notebook description

Notebook 1 (C3S_glaciers_MC_UC1):

Quality assessment: Glacier mass change gridded data from 1976 to present derived from the Fluctuations of Glaciers Database
Use Case: A study on the global cumulative mass balance of all glaciers and its uncertainty in the dataset.
User Question: How accurate are the glacier mass change products on the CDS when used to derive the global cumulative mass balance of all glaciers in the dataset?

Notebook 2 (C3S_glaciers_MC_UC2):

Quality assessment: Glacier mass change gridded data from 1976 to present derived from the Fluctuations of Glaciers Database
Use Case: A study on the annual variability of the glacier mass change product and its trend over time.
User Question: How well is the glacier mass change product on the CDS suited to derive mean values, temporal variability and trends?

Notebook 3 (C3S_glaciers_RGI_UC1):

Quality assessment: Glaciers distribution data from the Randolph Glacier Inventory for year 2000
Use Case: Assessing how close in time the glacier characteristics in the dataset are to the year 2000 AD.
User Question: How close in time are the different digitized glacier data situated to the year 2000 AD?

Notebook link or upload

C3S_glaciers_MC_UC1.zip C3S_glaciers_MC_UC2.zip C3S_glaciers_RGI_UC1.zip

Anything else we need to know?

Environment

malmans2 commented 11 months ago

Which WP is this notebook for?

malmans2 commented 11 months ago

@yoniv1 Looks like the notebooks you shared are corrupted. After unzipping them, I get this error when I try to open them:

NotJSONError("Notebook does not appear to be JSON: '<!DOCTYPE HTML>\\n<html>\\n\\n<head>\\n ...")
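The message means the unzipped files are HTML pages (likely saved from a browser preview) rather than notebook JSON. A quick way to sanity-check a file before sharing (a sketch, not part of the EQC tooling):

```python
import json


def looks_like_notebook(path):
    """Return True if the file parses as JSON and has an 'nbformat' key,
    i.e. it is plausibly a real .ipynb rather than an HTML export."""
    try:
        with open(path, encoding="utf-8") as f:
            nb = json.load(f)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return False
    return isinstance(nb, dict) and "nbformat" in nb
```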
yoniv1 commented 11 months ago

> Which WP is this notebook for?

This is WP5!

yoniv1 commented 11 months ago

> @yoniv1 Looks like the notebooks you shared are corrupted. After unzipping them, I get this error when I try to open them:
>
> NotJSONError("Notebook does not appear to be JSON: '<!DOCTYPE HTML>\\n<html>\\n\\n<head>\\n ...")

Can you try with this? C3S_glaciers_JN.zip

yoniv1 commented 11 months ago

Or if not this: C3S_glaciers_JN2.zip

malmans2 commented 11 months ago

The last one works. Could you please edit your very first comment and replace the corrupted zip files? Same for the other issue you opened.

I'll take a look either tomorrow or next week.

yoniv1 commented 11 months ago

> The last one works. Could you please edit your very first comment and replace the corrupted zip files? Same for the other issue you opened.
>
> I'll take a look either tomorrow or next week.

Done!

malmans2 commented 11 months ago

Hi @yoniv1,

It looks like none of the notebooks uses the EQC software (mainly designed to handle the raw data and the cache on the VM), and they have not been executed on the VM. Did you encounter problems on the VM or using c3s_eqc_automatic_quality_control?

malmans2 commented 11 months ago

Hi @yoniv1,

I'm looking at the first notebook. For the timeseries, it looks like you are just summing values, but then the units become Gt/yr. Just checking that this is correct.

malmans2 commented 11 months ago

Hi again,

The first template is ready.

As you submitted a large number of notebooks that are not in line with the other templates, it would be helpful if you start revising some of them. You should be able to do so by looking at the code structure in this template. Please start from the templates in #113 so we don't duplicate efforts.

Let me know if this notebook is OK!

yoniv1 commented 11 months ago

> Hi again,
>
> The first template is ready.
>
> As you submitted a large number of notebooks that are not in line with the other templates, it would be helpful if you start revising some of them. You should be able to do so by looking at the code structure in this template. Please start from the templates in #113 so we don't duplicate efforts.
>
> Let me know if this notebook is OK!

Hi! Thank you for the first template, it looks good! What exactly do you expect me to do now? Should I integrate c3s_eqc_automatic_quality_control in all of the notebooks?

yoniv1 commented 11 months ago

> Hi again,
>
> The first template is ready.
>
> As you submitted a large number of notebooks that are not in line with the other templates, it would be helpful if you start revising some of them. You should be able to do so by looking at the code structure in this template. Please start from the templates in #113 so we don't duplicate efforts.
>
> Let me know if this notebook is OK!

Hi, There seems to be an error in the formulation of the request:

collection_id = "derived-gridded-glacier-mass-change"
request = {
    "variable": "glacier_mass_change",
    "product_version": "wgms_fog_2022_09",
    "format": "zip",
    "hydrological_year": [
        f"{year}_{str(year + 1)[-2:] if year + 1 != 2000 else str(year + 1)[:2]}"
        for year in range(year_start, year_stop + 1)
    ],
}
print(request)

{'variable': 'glacier_mass_change', 'product_version': 'wgms_fog_2022_09', 'format': 'zip', 'hydrological_year': ['1975_76', '1976_77', '1977_78', '1978_79', '1979_80', '1980_81', '1981_82', '1982_83', '1983_84', '1984_85', '1985_86', '1986_87', '1987_88', '1988_89', '1989_90', '1990_91', '1991_92', '1992_93', '1993_94', '1994_95', '1995_96', '1996_97', '1997_98', '1998_99', '1999_20', '2000_01', '2001_02', '2002_03', '2003_04', '2004_05', '2005_06', '2006_07', '2007_08', '2008_09', '2009_10', '2010_11', '2011_12', '2012_13', '2013_14', '2014_15', '2015_16', '2016_17', '2017_18', '2018_19', '2019_20', '2020_21']}

If you print the request, one of the hydrological_year values is '1999_20', which I think should be '1999_00'. The error is:

 43%|██████████████████▋                        | 20/46 [00:00<00:00, 64.59it/s]2023-11-12 18:14:51,177 INFO Welcome to the CDS
2023-11-12 18:14:51,178 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/derived-gridded-glacier-mass-change
2023-11-12 18:14:51,247 INFO Request is queued
2023-11-12 18:14:52,296 INFO Request is failed
2023-11-12 18:14:52,297 ERROR Message: an internal error occurred processing your request
2023-11-12 18:14:52,297 ERROR Reason:  No URLs produced from request parameters.
2023-11-12 18:14:52,298 ERROR   Traceback (most recent call last):
2023-11-12 18:14:52,298 ERROR     File "/opt/cds/adaptor/cdshandlers/adaptorlib/adaptorrequesthandler.py", line 68, in handle_request
2023-11-12 18:14:52,299 ERROR       return super().handle_request(cdsinf, data_request, self.config)
2023-11-12 18:14:52,300 ERROR     File "/opt/cds/cdsinf/python/lib/cdsinf/runner/requesthandler.py", line 170, in handle_request
2023-11-12 18:14:52,301 ERROR       return handler(cdsinf, request, config)
2023-11-12 18:14:52,302 ERROR     File "/opt/cds/adaptor/cdshandlers/url/handler.py", line 39, in handle_retrieve
2023-11-12 18:14:52,303 ERROR       dis = self._fetch_list(data_request["specific"])
2023-11-12 18:14:52,304 ERROR     File "/opt/cds/adaptor/cdshandlers/url/handler.py", line 167, in _fetch_list
2023-11-12 18:14:52,305 ERROR       "http://copernicus-climate.eu/exc/"
2023-11-12 18:14:52,306 ERROR   cdsclient.exceptions.InternalError: No URLs produced from request parameters.
 52%|██████████████████████▍                    | 24/46 [00:01<00:01, 15.46it/s]
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[17], line 1
----> 1 ds = download.download_and_transform(
      2     collection_id, request, chunks={"hydrological_year": 1}
      3 )
      4 # Customize some attributes
      5 ds["time"].attrs |= {"long_name": "Time", "units": "yr"}
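For reference, zero-padding the second year avoids producing '1999_20' in the first place (a sketch; it assumes the CDS catalogue uses the corrected '1999_00' form):

```python
year_start, year_stop = 1975, 2020

# "YYYY_YY" labels with the second year zero-padded, so 1999 -> "1999_00"
hydrological_years = [
    f"{year}_{(year + 1) % 100:02d}" for year in range(year_start, year_stop + 1)
]
```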
malmans2 commented 11 months ago

Good news! It looks like ECMWF fixed it. It was still 1999_20 until Friday, when I reported it to ECMWF. I'll update the template tomorrow.

Re how to update the notebooks: Yes, I'd use c3s_eqc_automatic_quality_control and xarray as much as possible. For example, I replaced most of your loops with ds.sum, which is more efficient.

malmans2 commented 11 months ago

Good morning @yoniv1,

I've updated the template with the latest format (1999_00). I'll work on the other 2 templates later today.

In case you missed my previous message about it, could you please confirm that this is intended?

# Timeseries
ds_timeseries = ds.sum(("latitude", "longitude"), keep_attrs=True).compute()
for da in ds_timeseries.values():
    da.attrs["units"] += " yr$^{-1}$"

So just a sum over lat/lon but change units from Gt to Gt/yr?

malmans2 commented 11 months ago

Hi @yoniv1,

I realised that there's a lot of duplicated code in notebook 1 and 2, so I refactored the templates. There's now one template for producing maps, and one template for producing timeseries. All figures you made should be available in the templates. It should be easy for you to tune details (colors, labels, ...) and to decide how to organize the final notebooks (copy/paste code from the templates to produce both maps and timeseries in a single notebook).

Here are the templates:

Here are the notebooks executed:

Let me know if they're OK!

malmans2 commented 11 months ago

Hi,

The third notebook is also ready. If you are not using the VM, make sure you are using the latest version of the eqc software (I made an update to handle shapefiles as well).

Templates:

Notebooks executed:

Please let me know if we need to change anything or we can close the issue.

yoniv1 commented 11 months ago

> Good morning @yoniv1,
>
> I've updated the template with the latest format (1999_00). I'll work on the other 2 templates later today.
>
> In case you missed my previous message about it, could you please confirm that this is intended?
>
> # Timeseries
> ds_timeseries = ds.sum(("latitude", "longitude"), keep_attrs=True).compute()
> for da in ds_timeseries.values():
>     da.attrs["units"] += " yr$^{-1}$"
>
> So just a sum over lat/lon but change units from Gt to Gt/yr?

Hi Mattia @malmans2,

Thanks for the remark. I think it is correct, given that all mass changes are annual values, i.e. they account for each hydrological year. In that sense, I think it is correct to assign units of Gt yr^-1 if you sum all mass changes over the globe for one specific year, as this then accounts for the total mass change of the glaciers (Gt) over that specific hydrological year (yr^-1), especially when you express it as a time series with the years on the x-axis. If you would express it as a cumulative value (a summation over the years), it would be Gt instead of Gt yr^-1.
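The distinction can be sketched with a toy xarray series (values made up for illustration):

```python
import numpy as np
import xarray as xr

# Annual global mass changes, one value per hydrological year (Gt yr-1)
annual = xr.DataArray(
    np.array([-10.0, -12.0, -8.0]),
    dims="time",
    attrs={"units": "Gt yr$^{-1}$"},
)

# Cumulative mass change: a running sum over years, so the rate unit drops
cumulative = annual.cumsum("time")
cumulative.attrs["units"] = "Gt"
```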

yoniv1 commented 11 months ago

> Hi,
>
> The third notebook is also ready. If you are not using the VM, make sure you are using the latest version of the eqc software (I made an update to handle shapefiles as well).
>
> Templates:
>
> Notebooks executed:
>
> Please let me know if we need to change anything or we can close the issue.

Hi Mattia,

Thank you for the updates! You really did a good job cleaning up the code and making it more efficient, nice! I will try to run the notebooks later today or on Monday and let you know if it works for me.

malmans2 commented 11 months ago

OK, I just wanted to make sure that it was done on purpose.

There's probably a couple of figures in the template with wrong units then (you had a mix of Gt and Gt/yr, I wasn't exactly sure about the criteria).

When you run the notebooks, could you please double check all units, and let me know which figures have wrong units?

malmans2 commented 11 months ago

I've updated the units in the timeseries notebook. Please have a look and let me know if this issue can be closed.

yoniv1 commented 11 months ago

Hi @malmans2

Just 1 remark:

malmans2 commented 11 months ago

Done: https://github.com/bopen/c3s-eqc-toolbox-template/commit/718f9a1997c1e9ba1f5126c80054b3a0a8e497a9

yoniv1 commented 7 months ago

Hi Mattia @malmans2, sorry to re-open a closed issue. I wonder if it is possible to change the colorbar of this map from a continuous one to a discrete one (i.e. with blocks instead of a color gradient), in blocks of 10 years from 1940 to 2020.

https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp5/glacier_distribution.ipynb

Plot glaciers distribution over years: Map

ax = plot_map(
    gdf,
    var_name="year",
    cmap="turbo",
    label="year",
    title="Glacier distribution around the year 2000 according to the RGI v6.0",
)

Thanks!

malmans2 commented 7 months ago

Hi @yoniv1,

Done: image

You can find the updated notebook at the same url. Note that I had to change the naturalearthdata url as it was broken, hopefully the new one is more stable.

Let me know if the new plot looks OK!

malmans2 commented 7 months ago

Closing. Feel free to re-open though if anything is not right.

yoniv1 commented 7 months ago

Hi Mattia, @malmans2

I have a new request to be included in the Jupyter Notebook for the glacier distribution data (wp5/glacier_distribution.ipynb).

The .shp file that is downloaded from the CDS (rgi60_global.shp) in that notebook contains an attribute table (216429x17) with information for each digitized glacier. The important columns are "BGNDATE", "ENDDATE", "RGIID" and "AREA". What I want to do is to calculate the area-weighted average of the year of digitization ((BGNDATE + ENDDATE) / 2) for each RGI region (numbered from 1 to 19).
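In symbols, with y_i the digitization year and A_i the area of glacier i, the area-weighted mean for region r is:

```math
\bar{y}_r = \frac{\sum_{i \in r} A_i \, y_i}{\sum_{i \in r} A_i}
```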

I include below the Matlab code (run in version R2022a) that I made. It may be a bit inefficient, but I tried to explain each step with comments.

clear all

T = readtable('rgi_table.csv');   % Read in the attribute table from the shape file
date_year=T(:,4);      % Extract "BGNDATE" given as YYYYMMDD
% Here I try to convert the dates given in the table to fractional years (rounded year + month/12)
% Some months in the dates are invalid (e.g. 19789999 and I discard them)
date_year = table2array(date_year);   % Convert to array
date_year_dec = (mod(date_year,10000));    % Delete first 4 digits
date_year_dec = floor(date_year_dec/100)/12;    % Remove last 2 digits so you have the number of the month left and divide it by 12 to get fractional/decimal year
date_year_dec(date_year_dec>1)=0;          % Remove fraction higher than 1 (dates with '9999') to get final fractional year
date_year = (date_year - (mod(date_year,10000))) / 10000;   % Get first 4 digits (rounded year)
date_year = date_year+date_year_dec;          % Add decimal/fractional year to rounded year

date_year_end = T(:,5);        % Extract "ENDDATE" and apply same procedure
date_year_end = table2array(date_year_end);
date_year_end_dec = (mod(date_year_end,10000));
date_year_end_dec = floor(date_year_end_dec/100)/12;
date_year_end_dec(date_year_end_dec>1)=0;
date_year_end = (date_year_end - (mod(date_year_end,10000))) / 10000;   % Get first 4 digits (rounded year)
date_year_end = date_year_end+date_year_end_dec;

for i = 1:size(date_year_end,1)    % Average BGNDATE+ENDDATE, only for ENDDATE that is not -9999999
    if date_year_end(i) > 0
        date_year(i) = (date_year(i)+date_year_end(i)) / 2;
    end
end

date_year(date_year<0)=NaN;      % Set remaining invalid data to NaN
glacier_area=T(:,10);       % Get glacier area
glacier_area=table2array(glacier_area);
rgi_number=T(:,2);        % Get RGI number
rgi_number=table2array(rgi_number);
newStr = extractAfter(rgi_number,"RGI60-");
newStr = extractBefore(newStr,".");
rgi_number=str2double(newStr);    % Extract RGI region number
idx=find(isnan(date_year));   % Remove remaining invalid data that were set to NaN
date_year(idx)=[];
glacier_area(idx)=[];
rgi_number(idx)=[];

clearvars newStr i date_year_end date_year_dec date_year_end_dec idx

%% Globally

% Regular mean

mean(date_year(:))
std(date_year(:));

% Weighted mean

weights = glacier_area/sum(glacier_area(:));
weighted_mean = sum(weights.*date_year)

%% For each region

rgi_region = 19;
weighted_mean = zeros(1,rgi_region);
normal_mean = zeros(1,rgi_region);
weighted_std = zeros(1,rgi_region);
normal_std = zeros(1,rgi_region);
area_region = zeros(1,rgi_region);

for i = 1:rgi_region
    date_year_region = date_year(rgi_number(:,1)==i);
    glacier_area_region = glacier_area(rgi_number(:,1)==i);
    rgi_number_region = rgi_number(rgi_number(:,1)==i);
    weights_region = glacier_area_region/sum(glacier_area_region(:));
    weighted_mean(i) = sum(weights_region.*date_year_region,'omitnan');
    weighted_std(i) = std(date_year_region,weights_region);
    normal_mean(i) = mean(date_year_region);
    normal_std(i) = std(date_year_region);
    area_region(i) = sum(glacier_area_region(:));
end

clearvars i rgi_region ans

plot(normal_mean,'b','LineWidth',2), hold on, plot(weighted_mean,'r','LineWidth',2), hold on
yline(2000,'k--','LineWidth',2)
xlabel('RGI region number');
ylabel('(Area-weighted) mean of year of digitization (AD)')
legend('Arithmetic mean','Weighted mean','Location','SouthWest')
xlim([1 19])
set(gca,'FontSize',14)
set(gcf,'color','w')
grid on, box on
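A rough Python equivalent of the date conversion above (a sketch, not the EQC implementation; invalid months such as 9999 are zeroed exactly as in the Matlab code, and missing dates like -9999999 become NaN):

```python
import numpy as np


def to_fractional_year(yyyymmdd):
    """Convert YYYYMMDD integers to fractional years (year + month/12).
    Invalid months (e.g. 9999) contribute 0; non-positive dates give NaN."""
    yyyymmdd = np.asarray(yyyymmdd)
    year = yyyymmdd // 10000            # first 4 digits: the rounded year
    month = (yyyymmdd % 10000) // 100   # middle 2 digits: the month
    frac = np.where(month <= 12, month / 12.0, 0.0)
    return np.where(yyyymmdd > 0, year + frac, np.nan)
```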

I get the following result:

rgi_year

Please let me know if anything is unclear.

Thank you!

Yoni

malmans2 commented 7 months ago

I'll take a look as soon as I can (I'm quite busy right now). It would be much better if you could share a python snippet though. The EQC project is python only.

yoniv1 commented 7 months ago

> I'll take a look as soon as I can (I'm quite busy right now). It would be much better if you could share a python snippet though. The EQC project is python only.

Yes, I know. I had been struggling with it for a few days in Python, so I decided to make it in Matlab, which is more familiar to me, and there it worked fine. I'll do my best to continue working on it in Python in the meantime.

yoniv1 commented 7 months ago

Hi Mattia @malmans2

Apart from my earlier request above, is it also possible to implement something in the code of the https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp5/glacier_mass_balance_timeseries.ipynb notebook to select a subset of the data to be plotted in the time series? For example with the spatial extent of RGI regions and time bounds. Let's say I want to plot the glacier mass change time series only for New Zealand (RGI region 18) in the period 1978-2000. How would I do that in your code? The shapefile with the RGI region bounds can be found here: https://www.gtn-g.ch/database/GlacReg_2017.zip

Another option may be with lat, lon bounds.
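A lat/lon-box subset could look like the sketch below, on a dummy dataset standing in for the CDS product (coordinate and variable names are assumptions, and the bounding box only roughly approximates New Zealand; clipping with the RGI regions shapefile, e.g. via geopandas, would give exact outlines):

```python
import numpy as np
import xarray as xr

# Dummy gridded dataset with the same layout as the mass-change product
ds = xr.Dataset(
    {
        "glacier_mass_change": (
            ("time", "latitude", "longitude"),
            np.zeros((46, 36, 72)),
        )
    },
    coords={
        "time": np.arange(1975, 2021),
        "latitude": np.linspace(-87.5, 87.5, 36),
        "longitude": np.linspace(-177.5, 177.5, 72),
    },
)

# Subset to 1978-2000 and an approximate New Zealand bounding box
subset = ds.sel(
    time=slice(1978, 2000),
    latitude=slice(-48.0, -40.0),
    longitude=slice(165.0, 179.0),
)
timeseries = subset["glacier_mass_change"].sum(("latitude", "longitude"))
```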

thanks Yoni

rgi_shapefile

yoniv1 commented 7 months ago


Hi Mattia,

I managed to do something in Python. It is probably quite inefficient, but it works:

glacier_distribution (1).ipynb.zip

malmans2 commented 7 months ago

Got it, thanks. I should be able to work on this either today in the afternoon or tomorrow morning.

malmans2 commented 7 months ago

Hi @yoniv1, What's the new code I need to review in your notebook? I don't see anything new.

malmans2 commented 7 months ago

I implemented the area weighted mean in the template:

def weighted_average(df, field_name, weights_name):
    df = df[df[field_name].notnull() & df[weights_name].notnull()]
    weights = df[weights_name]
    return (df[field_name] * weights).sum() / weights.sum()

gdf["region"] = gdf["RGIID"].str[6:8].astype(int)
grouped = gdf[["year", "AREA", "region"]].groupby("region")
arithmetic_year = grouped["year"].mean()
weighted_year = grouped.apply(weighted_average, "year", "AREA", include_groups=False)

ax = arithmetic_year.plot(label="arithmetic")
ax = weighted_year.plot(label="weighted")
ax.set_xlabel("RGI region number")
ax.set_ylabel("Year of digitization")
ax.grid()
_ = ax.legend()

image

I'll take a look at this later today.

(Your issues have many notebooks/requests/discussions unrelated to each other; it's very hard for us to keep track of the progress. From now on, please open separate issues for each notebook.)

malmans2 commented 7 months ago

@yoniv1, you can now select a subset of regions in this notebook. For example, if you set regions_abbrev = ["NZL"], it will only select New Zealand.
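The selection is a simple membership filter on the regions table; a sketch with a pandas stand-in for the shapefile attributes (the column names here are illustrative, not the actual shapefile schema):

```python
import pandas as pd

# Stand-in for the RGI regions attribute table
regions = pd.DataFrame(
    {"abbrev": ["ALA", "SAN", "NZL"], "region_id": [1, 17, 18]}
)

regions_abbrev = ["NZL"]  # as in the notebook's configuration cell
selected = regions[regions["abbrev"].isin(regions_abbrev)]
```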

I'm going to close this issue. Please open new issues if there's any problem with the modifications or you need further help.