CCI-Tools / cate

ESA CCI Toolbox (Cate)

Tool for creating long-term averages #93

Closed. forman closed this issue 7 years ago.

forman commented 7 years ago

To analyse ECV anomalies, a certain ECV dataset at a given point in time is compared against a reference dataset, which is usually a long-term average of that ECV (see #73). A reference dataset may be a standard, 30-year climatology, or any other, climatology-like, user-provided ECV average.

The CCI Toolbox should be able to generate such averages for ECVs that comprise continuous measurements, like water optical depth, SST, etc. The baseline should be monthly averages, but the tool shall be configurable for other averaging periods as well.
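For illustration, here is a minimal sketch of what such a configurable averaging operation could look like on top of xarray (which cate uses internally). The function name and the choice of group keys are only assumptions, not a proposed final API:

```python
import xarray as xr

def long_term_average(ds: xr.Dataset, period: str = "time.month") -> xr.Dataset:
    """Average each variable over all years for every step of the annual cycle.

    `period` is an xarray group key, e.g. "time.month" for the monthly baseline,
    or "time.season" / "time.dayofyear" for other averaging periods.
    """
    return ds.groupby(period).mean("time")

# e.g. monthly long-term averages of an SST time series:
# sst_lta = long_term_average(xr.open_dataset("sst_timeseries.nc"))
```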

ECVs that comprise discrete (class) values, like Land Cover classes, may be addressed later too, but usually we can expect that such ECVs come with dedicated averaging tools or offer their "Class-Climatology" products.

kjpearson commented 7 years ago

Here is a short summary of the WMO's official definitions for the different types of "normals": http://www.wmo.int/pages/prog/wcp/wcdmp/GCDS_1.php. Here, "Climatological standard normals" is what is meant by a "standard climatology". The other terms are also often called "climatologies" in conversation, but "reference" or "baseline" dataset would be a better term to avoid confusion.

lkeupp commented 7 years ago

When we talk about a climatology, do we mean the climatological mean? I would prefer "reference" or "climatological reference" over "climatology".

forman commented 7 years ago

@lkeupp: according to what @kjpearson and Chris Merchant said, climate scientists use the term climatology in a narrow sense for the average of a given continuous variable over the standard 30-year period, i.e. the variable's mean value in fixed periods of the annual cycle, e.g. monthly. The tool we are talking about here could be used to create such climatologies, but it is really just a generic tool for creating (long-term) averages, which in turn serve as the reference (or baseline) datasets in the anomaly computation.

forman commented 7 years ago

Anna kindly provided her minutes from the discussion with Chris two weeks ago:

For many ECV time series investigations, climatology data is used to remove the normal cyclic effect in the data. The idea is that anything else, such as differences between the years or unusual seasonal fluctuations, then shows up in the data. This will be very useful for looking at connections between ECVs.
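As a hedged illustration of that step (assuming an xarray-based workflow; file name and variable names are purely hypothetical), removing the cyclic effect amounts to subtracting the matching climatology value from each time step:

```python
import xarray as xr

ds = xr.open_dataset("ecv_timeseries.nc")             # hypothetical ECV time series
monthly_clim = ds.groupby("time.month").mean("time")  # long-term monthly means
anomalies = ds.groupby("time.month") - monthly_clim   # normal cycle removed
```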

A Standard Climatology is one that is derived over 30 years, and it can take the form of monthly averages or daily averages. So, for example, monthly averages of temperature would have a typical value for every January in the period 1981 to 2010. Standard Climatologies always start in the January of the first year of a decade (i.e. '81, '91, etc.).

Much of the satellite data being used does not cover a 30-year period (only Soil Moisture does), so it is more relevant for the climatology to be based on a shorter period such as 20 years (e.g. an SST climatology from 1991 to 2010), which could be known as a Reference Climatology. Alternatively, one can use an Independent Climatology from elsewhere.
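In an xarray-based sketch, the only difference between a Standard and a Reference Climatology would then be the time window selected before averaging (the dataset name below is purely illustrative):

```python
import xarray as xr

sst = xr.open_dataset("sst_cci_timeseries.nc")  # hypothetical SST time series

# Standard Climatology: 30 years, starting in the January of a decade's first year
standard_clim = sst.sel(time=slice("1981-01-01", "2010-12-31")).groupby("time.month").mean("time")

# Reference Climatology: shorter, record-limited period (here 1991-2010)
reference_clim = sst.sel(time=slice("1991-01-01", "2010-12-31")).groupby("time.month").mean("time")
```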

The way some ECV datasets behave affects what the appropriate granularity of the climatology should be. E.g. Soil Moisture is affected by the previous rainfall, so daily comparisons are not useful, but once a monthly average is applied the trends even out and patterns emerge.
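A sketch of that pre-averaging step, again assuming xarray (the file name is hypothetical; "1MS" is the pandas-style frequency string for calendar-month bins starting on the first of the month):

```python
import xarray as xr

sm = xr.open_dataset("soil_moisture_daily.nc")  # hypothetical daily soil moisture
sm_monthly = sm.resample(time="1MS").mean()     # monthly means even out rainfall-driven noise
```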

For the Teleconnections Use Case in the Toolbox, it would extend the usability of the data and the Toolbox if we had climatology data. Not everyone will want to use it, but it would be really helpful to have the option. However, each ECV team will have a different climatology associated with it, so it would be better to have an Internal Reference Climatology for each team. This may be a monthly average, but for some ECVs a daily or twice-monthly climatology could be more appropriate.

Once the climatology is done and documented, there will be no changes needed, so it's not an ongoing task.

Note for glaciers and fire we probably don’t need a climatology.

For cloud it becomes quite complicated, since there is a more 3D aspect to the data, and it may, for example, need different parameters for different cloud heights.

It could be that the Toolbox would have a connection that can detect whether the climatology is available as netCDF files. Alternatively, we could just point users to the climatology (wherever the files are) and they can use it if they want to. However, it would be good to have the Standard Climatology (30-year standard) or the Internal Reference Climatology (e.g. 20 years, but approved by the ECV team) available on the CCI Open Data Portal.

The Toolbox could provide a link that describes the climatologies and what they are used for.

JanisGailis commented 7 years ago

@lkeupp Can you look at this paper and comment on the approach described there with respect to treating uncertainty data when averaging?

Propagation of uncertainties when averaging: http://isi.ssl.berkeley.edu/~tatebe/whitepapers/Combining%20Errors.pdf

lkeupp commented 7 years ago

Looks reasonable. I wrote a mail to Kevin asking him to have a look at the paper, too.

kjpearson commented 7 years ago

The paper gives the correct way to generate the mean and standard deviation from sets of numbers. The question then is whether this provides useful estimators for the parent population(s) that the samples are taken from.

This is the correct way to combine random uncertainties, assuming all years are just randomly scattered about the mean that we are calculating (i.e. the climatology). The correct way to combine any systematic or correlated uncertainty components in the datasets is a difficult problem. It depends on the spatial and temporal scales and characteristics involved and is one of the things being worked on in FIDUCEO.
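For the purely random part, here is a minimal numpy sketch (not necessarily the exact formulation in the linked paper) of propagating uncorrelated per-sample uncertainties through an unweighted mean; correlated or systematic components are deliberately not handled:

```python
import numpy as np

def mean_with_random_uncertainty(values, sigmas):
    """Mean of `values` and the propagated uncertainty of that mean,
    assuming the per-sample uncertainties `sigmas` are uncorrelated."""
    values = np.asarray(values, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    n = values.size
    mean = values.mean()
    # Var(mean) = (1 / n**2) * sum(sigma_i**2) for independent errors
    sigma_mean = np.sqrt(np.sum(sigmas ** 2)) / n
    return mean, sigma_mean

# e.g. thirty January values with their per-year uncertainties:
# jan_mean, jan_sigma = mean_with_random_uncertainty(jan_values, jan_sigmas)
```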

For SST there may be helpful code already in https://github.com/bcdev/sst-cci-toolbox

JanisGailis commented 7 years ago

Thanks!

JanisGailis commented 7 years ago

LTA works with data sources. There will probably be changes to the operation, but those should be new issues.