EcoExtreML / STEMMUS_SCOPE

Integrated code of SCOPE and STEMMUS
GNU General Public License v3.0

Model input data, era5 vs. plumber2 #130

Open BSchilperoort opened 1 year ago

BSchilperoort commented 1 year ago

To be able to run the model at any (land) location globally, we need the right input data. The ERA5 dataset provides hourly estimates of a large number of meteorological variables, but sadly does not contain all the variables we require to completely replace the PLUMBER2 input data.

The following variables are not available in ERA5:

Of course, ERA5 provides a 10 m wind speed and the 2 m air temperature, so it is questionable how this works with forests, as they do not model canopies.

@yijianzeng @bobzsu how would you like to proceed on this? Is the model sensitive to the wind speed and/or CO2 concentration?

It could be interesting to do a comparison for (some) sites, where we compare the results of a model run with ERA5 data to a model run with the Plumber2 data.

yijianzeng commented 1 year ago

Thank you so much, Bart. Please see below some pointers for your consideration:

- CO2: Carbon dioxide data from 2002 to present derived from satellite observations https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-carbon-dioxide?tab=overview

CAMS global greenhouse gas reanalysis (EGG4) https://ads.atmosphere.copernicus.eu/cdsapp#!/dataset/cams-global-ghg-reanalysis-egg4?tab=overview

- Canopy Height: Lang, N., Jetz, W., Schindler, K., & Wegner, J. D. (2022). A high-resolution canopy height model of the Earth. arXiv preprint arXiv:2204.08322. https://nlang.users.earthengine.app/view/global-canopy-height-2020

- LAI: MODIS (MOD15A2H v061): https://lpdaac.usgs.gov/products/mod15a2hv061/ and MODIS (MCD15A3H v061): https://lpdaac.usgs.gov/products/mcd15a3hv061/

GEE: https://developers.google.com/earth-engine/datasets/tags/lai

"Of course, ERA5 provides a 10 m wind speed and the 2 m air temperature, so it is questionable how this works with forests, as they do not model canopies."

Very good point. The forest is to some extent taken into account through the high vegetation in HTESSEL, but this is indeed a very coarse estimate (ERA5-Land). A potential dataset we can look at is the ERA5 hourly data on pressure levels (37 levels): https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-pressure-levels?tab=overview But then we may need to use the canopy height dataset to find the corresponding level of wind and air temperature data in this ERA5 reanalysis dataset (we need to see whether that is possible).

[I just checked these pressure levels; they are too coarse (the first level is at about 100 m, and the end level is at 3 km height). Let's stick with the 10 m wind and 2 m Tair.]
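As a rough check of the statement above, the approximate height of a few near-surface pressure levels can be estimated with the International Standard Atmosphere barometric formula. This is only a sketch: the listed levels are a subset of the ERA5 pressure levels, the function name is made up for illustration, and real geopotential heights vary with the weather and with surface elevation.

```python
# Standard-atmosphere constants (troposphere only).
T0 = 288.15        # sea-level temperature [K]
LAPSE = 0.0065     # temperature lapse rate [K/m]
P0 = 1013.25       # sea-level pressure [hPa]
R = 8.31446        # universal gas constant [J/(mol K)]
G = 9.80665        # gravitational acceleration [m/s^2]
M = 0.0289644      # molar mass of dry air [kg/mol]


def pressure_to_height_m(p_hpa):
    """Approximate height above sea level [m] of a pressure level [hPa]."""
    exponent = R * LAPSE / (G * M)  # about 0.19
    return (T0 / LAPSE) * (1.0 - (p_hpa / P0) ** exponent)


# Subset of the ERA5 pressure levels closest to the surface (assumption:
# only these are relevant for canopy-scale forcing).
levels_hpa = [1000, 975, 950, 925, 900, 850, 700]
for p in levels_hpa:
    print(f"{p:5d} hPa ~ {pressure_to_height_m(p):7.0f} m")
```

Under these assumptions the lowest level sits at roughly 100 m and the spacing between levels is a few hundred metres, which is far coarser than typical canopy heights.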

Please help, @bobzsu, if I missed something here.

bobzsu commented 1 year ago

Thanks a lot, Bart, Yijian. 1) I think we need to perform a station-based validation of the different datasets before deciding which one should be used for global simulation. We may need to do some CDF matching or copula matching, as Sarah did for the ERA reanalysis data. This applies to the meteorological data, CO2 data and radiation data. It seems the CAMS reanalysis contains all these data, but we need a first validation. An idea is to extract all the relevant data and compare them to the GEWEX PLUMBER2 in-situ data.

2) Canopy height data as indicated by Yijian is fine to use. I was thinking about filling the missing GEDI data beyond 52°N/S with a simple ML algorithm based on PFTs, but the cited paper has done just that. We can also extract location data and compare to those used in PLUMBER2. (The change in canopy height over time needs to be considered, but that requires more observations for different years.)

3) ERA5 LAI is only a climatology, but LAI is an essential variable for STEMMUS-SCOPE. We need observed values: currently we use those from MODIS, but we should migrate to Copernicus as Bart indicated. MODIS will be decommissioned (some time in the future) and we need to use European data. Again, we also need to compare them for the overlapping period. There should be no jumps due to different data sources.

4) For the differences in 10 m wind speed and 2 m air temperature, we need to follow some micro-meteorological principles. We need to use the meteorological variables at 3 times the roughness height or 2 times the vegetation height. This of course applies only if one uses in-situ observation data. ERA5 data is gridded data, so most people just use it as is. I think ERA5-Land is slightly more reasonable because it does some terrain correction using the adiabatic lapse rate (which impacts temperature, dew point temperature and precipitation data). If we want to do it correctly, we need to consider the roughness elements of each grid cell and scale the data to the relevant and same height. I had a paper years ago dealing with this problem: https://hess.copernicus.org/articles/3/549/1999/ Some implementation was made in the SEBS algorithm, but students tended to forget those issues.

5) Yijian: we may see if we should use the PLUMBER2 metrics or the C3S ones (or even what we have proposed in CORE-CLIMAX). Bart/Sarah: could one of the existing eScience validation tools be used for this purpose, so that we can do automatic quality control?
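To make the CDF matching mentioned in point 1 concrete, here is a minimal sketch of empirical quantile mapping with NumPy. It is not the procedure Sarah used (which is not described in this thread); the function name, the number of quantiles and the synthetic data are assumptions chosen for illustration.

```python
import numpy as np


def cdf_match(source, reference, values=None, n_quantiles=101):
    """Map values from the source distribution onto the reference distribution.

    source:    samples of the dataset to correct (e.g. reanalysis at a site)
    reference: samples of the trusted dataset (e.g. in-situ observations)
    values:    data to transform; defaults to `source` itself
    """
    source = np.asarray(source, dtype=float)
    reference = np.asarray(reference, dtype=float)
    values = source if values is None else np.asarray(values, dtype=float)

    probs = np.linspace(0.0, 1.0, n_quantiles)
    src_q = np.quantile(source, probs)     # empirical quantiles of the source
    ref_q = np.quantile(reference, probs)  # empirical quantiles of the reference

    # Piecewise-linear transfer function: source quantile -> reference quantile.
    # Values outside the source range are clamped to the reference extremes.
    return np.interp(values, src_q, ref_q)


# Synthetic example: a biased, too-variable "reanalysis" against "observations".
rng = np.random.default_rng(0)
reanalysis = rng.normal(loc=2.0, scale=3.0, size=5000)
observations = rng.normal(loc=0.0, scale=1.0, size=4000)
matched = cdf_match(reanalysis, observations)
```

The two series do not need to be paired in time, only to cover a comparable period. The mapping preserves the rank order of the source data, so the timing of events is kept while the distribution is adjusted.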
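For the overlap comparison in point 3, a simple first check is the mean offset between two LAI products on their common time steps. The sketch below assumes both series are already on the same time axis (for example the same 8-day composites); the function name and the example numbers are hypothetical.

```python
import numpy as np


def overlap_stats(t_a, lai_a, t_b, lai_b):
    """Bias and RMSE of product B relative to product A on shared time steps."""
    t_a = np.asarray(t_a)
    t_b = np.asarray(t_b)
    # Indices of the time steps present in both series.
    common, idx_a, idx_b = np.intersect1d(t_a, t_b, return_indices=True)
    if common.size == 0:
        raise ValueError("the two series do not overlap in time")
    diff = np.asarray(lai_b, dtype=float)[idx_b] - np.asarray(lai_a, dtype=float)[idx_a]
    return {
        "n": int(common.size),
        "bias": float(diff.mean()),
        "rmse": float(np.sqrt((diff ** 2).mean())),
    }


# Hypothetical example: product B reads 0.5 higher than product A.
stats = overlap_stats(
    t_a=np.arange(0, 10), lai_a=np.full(10, 2.0),
    t_b=np.arange(5, 15), lai_b=np.full(10, 2.5),
)
print(stats)
```

A bias that is clearly different from zero over the overlap would show up as a jump when switching from one product to the other.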
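The height scaling in point 4 can be illustrated with the neutral logarithmic wind profile. This is a deliberately simplified sketch and not the method of the linked HESS paper or of SEBS: it ignores atmospheric stability, and the rules of thumb for roughness length (0.123 h) and displacement height (0.67 h) as well as both function names are assumptions.

```python
import math


def roughness_from_canopy_height(h_canopy):
    """Rule-of-thumb roughness length z0 and displacement height d [m]."""
    return 0.123 * h_canopy, 0.67 * h_canopy


def scale_wind_log_profile(u_ref, z_ref, z_target, z0, d=0.0):
    """Scale wind speed from z_ref to z_target with a neutral log profile.

    u(z) is proportional to ln((z - d) / z0), which only holds above the
    roughness elements, so both heights must satisfy z - d > z0.
    """
    for z in (z_ref, z_target):
        if z - d <= z0:
            raise ValueError(f"height {z} m is not above d + z0 = {d + z0:.2f} m")
    return u_ref * math.log((z_target - d) / z0) / math.log((z_ref - d) / z0)


# Short grass (0.5 m): bring a 10 m wind of 3 m/s down to 2 m.
z0, d = roughness_from_canopy_height(0.5)
u_2m = scale_wind_log_profile(3.0, z_ref=10.0, z_target=2.0, z0=z0, d=d)
print(f"u(2 m) = {u_2m:.2f} m/s")
```

For a 20 m forest the same rules give d = 13.4 m, so a 10 m reference height lies inside the canopy and the function raises a `ValueError`. That is the concern raised at the start of this thread: a 10 m wind cannot be applied directly over tall vegetation.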

yijianzeng commented 1 year ago

Thanks a lot, Bob. Just to echo all the points: I strongly suggest we take dedicated action on point 5, for example using ESMValTool, adapted for automatic quality control!

SarahAlidoost commented 1 year ago

Based on what is suggested, to start with the implementation: ERA5, ERA5-Land, CO2 (Copernicus) and LAI (Copernicus) can be retrieved from CDS, and canopy height from GEE.

SarahAlidoost commented 1 year ago

There is also the GLOBMAP global Leaf Area Index dataset, available since 1981, see https://zenodo.org/record/4700264. This can be an alternative to LAI from CDS. In addition, a cmorizer is being developed in ESMValTool, see here. We might use some part of the code from there.