Closed: sandrocalmanti closed this issue 4 months ago
Hi @sandrocalmanti,
Is this for WP3 or 5?
This is WP3
Hi @malmans2
just to let you know that we have scheduled a meeting with ECMWF to discuss this prototype on October 19th, next week. If you can provide first feedback this week, I will try to make the most of the updates and present some preliminary analysis.
Meanwhile, I have tried to preload some of the actual data that we need to use, and the prototype seems to work.
If you're busy with other notebooks, never mind: we will have to work on this again after the feedback from ECMWF anyway.
Best!
Sandro
Hi @sandrocalmanti, Sure, no problem. I'm finishing up a heavy notebook for WP4, but I should be able to take a look either today or tomorrow. I'll be in touch!
Hi @sandrocalmanti , The template is ready and everything is cached.
I'm not sure what the bias is supposed to look like, so please let me know if the results look OK.
A couple of comments:
Here is the template: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp3/bias_assessment.ipynb
Here is the notebook executed: https://gist.github.com/malmans2/3fc74a6f83aeaa334b0e3202284c9e7d
(Another comment, just to double check: meteo_france is not in the list of centres.)
Hi @sandrocalmanti,
Did you get a chance to test this template? Does it work OK?
Dear @malmans2,
Due to personal problems I was not able to move forward with this notebook, and I'm sorry for having rushed you a few weeks back. I'm now getting back to this work.
Hi @sandrocalmanti, no worries!
Hi @malmans2
I'm checking this notebook on bias, and it seems there's something odd in some areas (East Asia, Europe, North America). In general, the pattern of systematic bias as a function of starting month and valid month looks reasonable, as I'm expecting some regular seasonality, which is clearly shown in the figures.
However, for the three areas above the bias is too large and unrealistically different from that of the other areas.
I will do some more testing with my initial implementation, let's see if I can find the problem.
The compact facet looks good!
The next step I will have to implement is to show, for one of the areas (e.g. Europe) and one of the models (e.g. ECMWF), the actual maps of the bias for different starting dates and valid times.
Cheers
Sandro
Ciao @malmans2,
I made a hybrid version by merging the original implementation and your revision.
I think that in the revision a key aspect of the analysis is missing. In fact, the bias is computed as:
```python
with xr.set_options(keep_attrs=True):
    bias = (ds_seasonal - ds_reanalysis)["t2m"]
```
However, with this implementation xarray automatically aligns the common dimensions of the DataArrays for the computation, whereas we actually need to mix them up, so that the reference starting month and the associated lead time provide the appropriate valid time of the forecast, which is then compared to the corresponding valid time of the reanalysis.
This is why I define an empty DataArray bias, which is progressively filled using the indices i, n, k that run over starting dates, centres, and regions respectively.
There might be different ways to do it, but the following lines, in which all variables are aligned to the same valid_time, are important:
```python
# Compute ERA5 monthly climatologies
ds_reanalysis_mean = ds_reanalysis.groupby("forecast_reference_time.month").mean()

# Set the position of valid times in the ERA5 climatology
valid_time = (i + np.arange(leadtime_month.size)) % 12

# Extract the climatology at valid times
reanalysis = ds_reanalysis_mean.t2m[valid_time].values

# Compute seasonal forecast climatologies: ensemble average and time average over
# all forecast_reference_time (which is in fact the "valid" time of the forecast)
ds_seasonal_mean = ds_seasonal.mean(dim=["realization", "forecast_reference_time"])
seasonal = ds_seasonal_mean.t2m.values

# Compute the bias
bias_valid_time = seasonal - reanalysis

# Fill in the bias array at the right place (i.e. at valid times)
bias_full_year = np.full(12, np.nan)
bias_full_year[valid_time] = bias_valid_time
ds_bias.bias[i, :, n, k] = bias_full_year
```
I have changed the structure of the bias DataArray a bit, to make it easier to interpret and visually more effective. In the notebook you will also find a short paragraph on how to read the plot, so it should be easier to understand what I'm trying to do.
I have also prepared a similar analysis for precipitation instead of 2m_temperature (not included here), which is of course identical but requires a conversion factor, because cumulated variables are stored differently in the reanalysis and in the seasonal forecasts:
```python
# Compute bias; see https://confluence.ecmwf.int/pages/viewpage.action?pageId=197702790
# for the conversion table of cumulated variables and fluxes.
# Units here are mm/month.
reanalysis = (
    1000 * ds_reanalysis_mean.tp[valid_time].values
    * [days_per_month[i - 1] for i in valid_time]
)
seasonal = (
    1000 * ds_seasonal_mean.tprate.values * 86400
    * [days_per_month[i - 1] for i in valid_time]
)
bias_valid_time = seasonal - reanalysis
```
This is how the plot should look at the end of the process:
Here's the new notebook. We can talk if it requires more explanation. I understand it's not standard, but it's the best I could think of to summarize the bias of the multimodel ensemble. It's actually a visualization that I would like to propose for representing the multimodel skill as well.
Hi @sandrocalmanti,
Thanks for the detailed explanation! Could you please check whether the new template works OK? If yes, I'll cache precipitation as well.
Here is the template: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp3/bias_assessment.ipynb
Here is the notebook executed: https://gist.github.com/malmans2/3fc74a6f83aeaa334b0e3202284c9e7d
Hi @malmans2 it works and looks great. Amazing what one can do by using stuff properly.
Please go ahead with precipitation.
I just had a quick look at the notebooks I launched last night (there's now a cell to convert units). Both temperature and precipitation are cached (you can find the executed notebooks and the template at the same links above).
Let's touch base next week. Have a good weekend!
You too, thank you @malmans2
Ciao @malmans2, I have checked the notebooks and both temperature and precipitation look good.
What I'd like to do now is to generalize the notebook to show both variables at the same time, and ideally any other variable that we may choose to assess.
I was looking at the code and thought that we could use a dictionary to define different attributes for each variable, like the color maps and correction factors:
```python
variables = {
    "2m_temperature": {
        "cmap": "RdBu_r",
        "correction_factor_seasonal": 1,
        "correction_factor_reanalysis": 1,
    },
    "total_precipitation": {
        "cmap": "BrBG",
        "correction_factor_seasonal": m_to_mm * day_to_s,
        "correction_factor_reanalysis": m_to_mm,
    },
}
```
In particular, since we are computing the bias, we don't need to convert kelvin to celsius, while multiplicative factors need to be applied where necessary (in general they apply only to cumulated variables).
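For illustration, a minimal sketch of how such a dictionary could be consumed (the short names tprate/tp follow the earlier snippets; everything else is illustrative):

```python
# Sketch only: look up per-variable settings from the dictionary above.
cfg = variables["total_precipitation"]

# Apply the correction factors (short names tprate/tp as in the snippets above).
seasonal = ds_seasonal_mean["tprate"] * cfg["correction_factor_seasonal"]
reanalysis = ds_reanalysis_mean["tp"] * cfg["correction_factor_reanalysis"]

# Compute and plot the bias with the per-variable colormap.
bias = seasonal - reanalysis
bias.plot(cmap=cfg["cmap"])
```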
We're using this notebook for a use case on agriculture, so ideally a more complete list of variables that users in this sector would be interested in is:
```python
"2m_temperature",
"total_precipitation",
"surface_solar_radiation_downwards",
"10m_u_component_of_wind",
"10m_v_component_of_wind",
"2m_dewpoint_temperature",
```
Would it be possible for you to generalize it to at least temperature and precipitation? I can think of some basic ways to do it, but I wouldn't want to mess things up in your code.
And by the way, since the notebook is now written efficiently, I'm using it to see where the most significant improvements in seasonal forecasts from the same centre (e.g. ecmwf) occur when the system is updated.
Depending on the output, we will think about how best to use the results.
Hi @sandrocalmanti,
It's quite easy, because we can chunk the requests along the parameter "variable" as well.
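For reference, the pattern would presumably look something like this (a sketch only; the collection ID and request keys are illustrative, assuming download.download_and_transform and its chunks argument behave as in the other templates):

```python
from c3s_eqc_automatic_quality_control import download

# Sketch: chunking along "variable" (besides "year") caches each variable
# separately, so new variables can be added without re-downloading the rest.
ds = download.download_and_transform(
    "seasonal-monthly-single-levels",  # illustrative collection ID
    request,  # request dict including "variable": [...]
    chunks={"year": 1, "variable": 1},
)
```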
Let's keep it easy for now and use dedicated cells for plotting and converting units. I'm hoping that precipitation is an exception and the other variables use the same conventions (t2m is in good shape). For plotting, most of the time people need to play a lot with the settings, so I think it's easier to control everything from the plotting cell.
I removed the last cell because otherwise it would make a ton of plots. Let me know if you actually need it there.
You can find precipitation and temperature at the same links as before. I will cache the other variables overnight so I can take a look at them tomorrow.
Grazie @malmans2
looks great! Precipitation and surface radiation have a similar conversion treatment, but I guess the way it's written now allows me to change the method for each variable quite easily, and I agree it's better to have separate cells for plotting and converting.
We don't need the last cell anymore; your implementation of the visualization works much better. I'm thinking of a final probability density plot to summarize the results presented in the panels, but I first want to check with ECMWF whether it makes sense to have one.
Talk to you later
S.
Hi @sandrocalmanti,
Unfortunately, the notebook crashed last night while caching eccc. It looks like 2m_dewpoint_temperature is missing.
We designed the code in such a way that all models must have all data. We can refactor it to handle missing data, but it needs some work. How do you want to proceed? Can you exclude from the notebook models that don't have all data?
It would be extremely useful to handle missing data.
I haven't checked all variables for all models, but at times I had to introduce some unnecessary restrictions to the analysis because the alignment among models is not perfect.
Here are a few occurrences that might be useful to handle (not necessarily all of them):
2m_dewpoint_temperature for eccc -> In general, I would say that whenever the mean reported below can be computed from more than 20 elements, instead of the expected 24 (the regular length of a monthly time series when all data are available), it doesn't make sense to exclude the model from the assessment:
```python
for dim_in, dim_out in zip(["forecast_reference", "verifying"], ["starting", "valid"]):
    ds_seasonal = (
        ds_seasonal.groupby(f"{dim_in}_time.month")
        .mean(keep_attrs=True)
        .rename(month=f"{dim_out}_month")
    )
```
Anything you can do to handle missing data will be useful.
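One possible approach, just as a sketch (not the template's actual implementation; the 20-of-24 threshold follows the comment above):

```python
# Sketch: compute the monthly means, but blank out cells where fewer than
# 20 of the expected 24 monthly values were available.
grouped = ds_seasonal["t2m"].groupby("forecast_reference_time.month")
means = grouped.mean(keep_attrs=True).where(grouped.count() >= 20)
```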
Maybe I'm overlooking something, but right now I think the only cases we are not covering are when a whole year or a whole variable is missing (we chunk the request along year and variable). I haven't encountered the first case for this dataset so far, and it shouldn't be a problem to fix the second one.
I'll take a look later today when it's done downloading the rest of the data.
OK, great!
Hi @sandrocalmanti,
I revised the notebook. I didn't check the results, so let me know if something doesn't look right or if you need to convert some variables. It looks like eccc is the only centre that is missing some data.
You'll see that it takes a few minutes to retrieve all the cached data. It's the downside of having a flexible cache (everything is cached separately, so we can easily add/remove data, regions, periods, ...). If it's taking too long for your use case, we could add a second level of caching (I usually try to avoid that, because we'd need to expose the caching functions).
Here is the template: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp3/bias_assessment.ipynb
Here is the notebook executed: https://gist.github.com/malmans2/3fc74a6f83aeaa334b0e3202284c9e7d
Thank you so much @malmans2
This update is also incredibly timely. The meeting with ECMWF that was scheduled for October, and that I had to cancel, is now taking place this afternoon, and the new version of the notebook also answers some of the comments that were made on the preliminary version.
I don't mind waiting a few minutes for the retrieval, and it is good to keep the code as clean as possible.
As expected, surface radiation needs to be corrected similarly to precipitation. I will check this and try to follow the same coding style.
So far, so good!
Ciao @malmans2
just had a good round of discussion with Anca, Chris and Eduardo at ECMWF, and I'm sharing some key highlights which will guide the finalization of this notebook. In general, a very positive reaction, with a number of observations (referred to below as points 1-4).
Is it possible for you to start looking at how we can integrate the regionmask package (described in point 1.) into the notebook, so that we don't slice the areas by their coordinates but instead apply predefined masks?
See you soon
S.
Hi @sandrocalmanti,
Sure, it's easy to implement; we already use it in a couple of WP5 notebooks (e.g., https://gist.github.com/malmans2/d3ebd832f7f00c05628852f56a29e773). If you tell me the areas/regions you'd like to use, I will show you some examples.
In the meantime, I'll install regionmask on the WP3 VM.
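For reference, a minimal sketch of the regionmask approach (the dataset ds and the region abbreviation are illustrative):

```python
import regionmask

# Sketch, mirroring the pattern later used in the template: build a mask of
# SREX regions on the dataset's grid (regionmask is assumed to detect the
# lon/lat coordinates of `ds`), then keep the cells of a single region.
srex = regionmask.defined_regions.srex
mask = srex.mask(ds)
index = srex.map_keys("MED")  # region abbreviation -> integer key
ds_region = ds.where(mask == index)
```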
Hi @malmans2
I'm now moving forward with this notebook along the lines indicated in my message of Dec 14, 2023, after the discussion with ECMWF.
D520.3.2.3b.SEASONAL_multimodel-bias_v3.ipynb.zip
I have worked on point 4. (add sample bias maps): there is now a new section on the Global map of temperature bias, which is drafted by stealing pieces from other notebooks (e.g. from the WP4 notebook mapping the bias of CORDEX and CMIP6).
In particular, I have defined a new function compute_time_mean_and_regrid, after looking at compute_regridded_timeseries used in clima_and_bias_cordex_cmip6_regionalised.ipynb. Please have a look to check whether it's used correctly.
About point 1. (use smaller focus areas based on the regionmask package): we would focus the analysis on the following 8 SREX areas: 3, 5, 8, 9, 16, 13, 23, 24. With METNO we have agreed to use a similar regionalized approach for the forthcoming notebooks on skill assessment.
About point 3. (use wind speed instead of U,V): I think it will be sufficient to replace
"10m_u_component_of_wind",
"10m_v_component_of_wind",
with
"10m_wind_speed",`
in the list of variables, as wind speed is available for all originating centres.
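(Incidentally, if a centre were missing 10m_wind_speed, deriving it from the monthly-mean components would only be a rough fallback; a sketch, with hypothetical short names u10/v10:)

```python
import numpy as np

# Sketch only: speed from monthly-mean components. Caveat: the monthly mean
# of the wind speed is NOT the speed of the monthly-mean components, so this
# is not equivalent to the archived 10m_wind_speed product.
ds["ws10"] = np.hypot(ds["u10"], ds["v10"])
```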
Please let me know if all looks clear and feasible to you.
Best!
Dear @malmans2
just to add that for some originating centres (for example JMA) I again find a problem in indexing the valid time of the forecast in the function compute_time_mean_and_regrid. I think you have already handled this, but it's not entirely clear to me what is going on, for example here in the section Download and transform seasonal forecast:
```python
backend_kwargs={
    "time_dims": (
        "verifying_time",
        "indexing_time" if centre in ["ukmo", "jma", "ncep"] else "time",
    )
},
```
Find attached an example of the error I'm getting with data from JMA.
D520.3.2.3b.SEASONAL_multimodel-bias_v3_time-indexing.ipynb.zip
Hi @sandrocalmanti,
I'm working on this, but it will take some time.
The CDS is very busy right now, so it takes time to download the new variable we need. The global mean code looks OK; we can probably simplify it a little for this use case. I'll add it to the template as soon as I'm done downloading the new data. I've already implemented the regionmask part and I'm caching the new regions.
The backend kwargs in the template work OK, right? To be honest, I don't know why the grib files of ["ukmo", "jma", "ncep"] have a different time structure, but I'm no expert in forecast data. Maybe you can try to ask Anne-Mette Olsen? I think I've copied the backend kwargs from her code. See also this comment: https://github.com/bopen/c3s-eqc-toolbox-template/issues/67#issuecomment-1607147811
Hi @sandrocalmanti,
Quick question. I need to double check this, but I'm pretty sure we can download ERA5 data already interpolated on a 1°/1° grid (the same as the seasonal forecasts). That way we don't have to implement our own regridding step, and the map you want to show uses the same data as the rest of the notebook.
What do you think, would that be OK? Note that the main difference is that we would use 1°/1° ERA5 data to compute the reanalysis mean (so far we've been using 0.25°/0.25°).
Nevermind! I realized that you are interpolating from 1°/1° to 0.25°/0.25°. Forget about this comment.
Dear @malmans2,
the interpolation step would be questionable anyway. I prefer interpolating from 1° to 0.25° because it doesn't imply losing information on the reanalysis side and, essentially, the figure looks nicer.
However, in this notebook the interpolation is not critical in terms of assessment. Maps are only there to illustrate the framework for the following systematic analysis.
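For reference, with xarray this step can be as simple as the following sketch (dataset names are illustrative):

```python
# Sketch: interpolate the 1°x1° seasonal field onto the 0.25°x0.25° ERA5
# grid, so no information is lost on the reanalysis side.
ds_seasonal_hires = ds_seasonal_1deg.interp_like(ds_reanalysis_025deg, method="linear")
```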
I'll check with Anne-Mette what is the function of the backend kwargs there.
Hi @sandrocalmanti,
All new features should be implemented now.
I made a little refactor, so please make sure that everything works OK. Let me know!
Thank you @malmans2
looks great for now, I will check more carefully later this week.
S.
Hi @sandrocalmanti,
Can we close this issue as well? BTW, if you haven't done it already, let me know if you want me to cache the timeseries using spatial weighted means (I think we only cached unweighted diagnostics).
Hi @malmans2,
I haven't cached the data with spatial weighted means. If you can do that, this issue can be closed as well.
Done!
Dear @malmans2, I need to re-open this issue for a couple of updates.
Surface solar radiation needs a conversion similar to the one already applied to precipitation in postprocess_dataarray, but with a slightly different factor. We should add to postprocess_dataarray something like:

```python
if da.name == "surface_solar_radiation_downwards" and da.attrs["units"] != "W m**-2":
    if da.attrs["units"] == "J s**-1":
        factor = 1.0 / day_to_s
    else:
        raise ValueError(f"{da.attrs['units']=}")
    if "valid_time" in da.coords:
        units = "W m**-2"
    with xr.set_options(keep_attrs=True):
        da *= factor
        da.attrs["units"] = units
```
so that the total accumulated energy, expressed in J s-1 in ERA5, is converted to the average flux in W m-2, as in the forecast. I'm not sure, however, about the last part of the code on the substitution of the units.
After looking at the results, I realize that SREX regions may not be the most appropriate to assess the bias: there's a lot of compensation between positive and negative bias inside some of the regions. I would try a different set of subregions, namely regionmask.defined_regions.ar6.land instead of regionmask.defined_regions.srex.
This means that:
1. In the Definition of Parameters, the region names will change slightly and should be asserted against a different set:
```python
# Regions
regions = [
    "NEAF",
    "ENA",
    "MED",
    "NES",
    "SAS",
    "SEA",
    "WNA",
    "CAR",
]
assert set(regions) <= set(regionmask.defined_regions.ar6.land.abbrevs)
```
2. In the function regionalised_spatial_weighted_mean, the masking will use a different set:
```python
mask = regionmask.defined_regions.ar6.land.mask(ds)
index = regionmask.defined_regions.ar6.land.map_keys(region)
```
I don't know if the choice of the set of regions (e.g. srex, ar6.land, ar6.ocean, giorgi, etc.) can be passed as a parameter, but this is not strictly necessary.
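(For what it's worth, a sketch of one way to pass it as a parameter, resolving the dotted name against regionmask.defined_regions and using the regions list defined above; the parameter name is illustrative:)

```python
from operator import attrgetter

import regionmask

# Sketch: "srex", "ar6.land", "ar6.ocean", "giorgi", ... as a parameter.
defined_regions_module_name = "ar6.land"
defined_regions = attrgetter(defined_regions_module_name)(regionmask.defined_regions)
assert set(regions) <= set(defined_regions.abbrevs)
```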
I've tried to implement these changes myself, but I'm getting errors on the unit conversion (I'm sure I'm messing up the names of the units), and the notebook needs to be re-cached anyway for the new regions, right?
Is it possible for you to work on the update with the indications provided above?
Sure, no problem. I'll work on this tomorrow.
Hi @sandrocalmanti,
I'm on it.
```python
if da.name == "surface_solar_radiation_downwards" and da.attrs["units"] != "W m**-2":
    if da.attrs["units"] == "J s**-1":
        factor = 1.0 / day_to_s
    else:
        raise ValueError(f"{da.attrs['units']=}")
```
This raises an error because the units are J m**-2.
Is it a typo in your code, or do we need to use another factor when the units are J m**-2?
You're right, it's a typo.
This is when you're not confident with Python and you say "OMG, I'm crashing ECMWF with this error". Of course the unit is J s**-2.
ehehe, ok. I'm already caching the new data, as the cached part is not affected by the unit change. Everything should be ready tomorrow.
Let me double-check though, because it looks like you also have a typo in your last message.
We need to convert from J m**-2 to W m**-2, and the conversion is done like this:

```python
da_w_m2 = da_j_m2 / day_to_s
```
Correct?
It's correct.
Hi @sandrocalmanti,
The new template is ready. Same URLs as before.
Please let me know if everything is in good shape.
Beautiful!
Do you need me to adjust the other template as well?
Ciao @malmans2
yes, the hit-rate template needs to be aligned to the same regions. Please use this list (I did more testing and changed CAR to NWS, which is more interesting to show as an example):
```python
defined_regions_module_name = "ar6.land"
regions = [
    "NWS",
    "ENA",
    "MED",
    "NEAF",
    "NES",
    "SAS",
    "SEA",
    "WNA",
]
```
Done, I will close this and re-open the other one.
Ciao @malmans2
I'm trying to execute your code on the virtual machine, but it keeps crashing at the end of the download of the seasonal forecast data. I will keep trying to execute the full notebook. However, since ECMWF might want to see this urgently, is it possible for you to execute the attached full version of the notebook, with text and comments, so we can then use it to generate the new permalink?
Notebook description
This is a prototype notebook for the assessment of the bias of seasonal forecast systems. The objective is to systematically explore the model bias over different geographical areas of significant scale. More details are provided in the notebook.
In this prototype I'm using pieces of code that have already been verified for the uncertainty assessment. To simplify the test, I'm fudging the procedure to always use the same data over the same small area already in the cache from the uncertainty assessment, regardless of the focus area or modeling system that will be used for the actual computation.
As a result, all plots at the end of the notebook are identical.
I'm not 100% sure about which test areas we want to use for the bias assessment. Therefore, if there are ways to optimize the data retrieval and pre-processing, that would certainly be useful.
Notebook link or upload
bias-assessment_test.zip
Anything else we need to know?
Among others, there are two specific aspects to check; one is the dataset structure that was used in the initial code for the uncertainty assessment.