cf-convention / discuss

A forum for any discussion about interpretation, clarification, and proposals for changes or extensions to the CF conventions.

Question : time series that refer to a geographical region (featureType?) #54

Open TomLav opened 4 years ago

TomLav commented 4 years ago

Dear colleagues,

I want to store time series of (daily, monthly,...) averaged sea-ice extent data. The standard name definition requires (suggests?) that a region is specified to indicate the geographical extent from which the sea-ice extent is computed, e.g. the Barents Sea (CF region name "barents_sea") or the whole northern hemisphere.

In addition, sea-ice extent requires the definition of a threshold coordinate (I chose a scalar coordinate variable).

Barents Sea example:

dimensions:
    time = UNLIMITED ;
    lbl = 11 ;  // length of the region string "barents_sea"

variables:
    double sie(time);
        sie:standard_name = "sea_ice_extent";
        sie:coordinates = "geo_region sie_threshold";
        sie:units = "km^2";

    double time(time);
        time:standard_name = "time";
        time:units = "days since 1950-01-01";

    char geo_region(lbl);
        geo_region:standard_name = "region";

    float sie_threshold;
        sie_threshold:standard_name = "sea_ice_area_fraction";
        sie_threshold:units = "1";

data:
    geo_region = "barents_sea";
    sie_threshold = 0.15;
    time = ...;
    sie = ....;

Question 1: does the above look OK?

Question 2: would it be valid to add a global attribute :featureType = "timeSeries" to this file, so that applications that identify time-series data by the featureType attribute can ingest and plot my file?
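
To make the second question concrete, here is a minimal sketch of how a client might key on featureType. The function name `feature_type_of` and the plain dict standing in for the file's global attributes are assumptions made for this example, not part of any library. The list of allowed values is the one given in the CF discrete sampling geometries chapter, which treats the attribute value as case-insensitive.

```python
# Allowed featureType values from the CF discrete sampling geometries chapter.
CF_FEATURE_TYPES = (
    "point", "timeSeries", "trajectory",
    "profile", "timeSeriesProfile", "trajectoryProfile",
)


def feature_type_of(global_attrs):
    """Return the canonical featureType, or None if absent or unrecognised.

    `global_attrs` is a plain dict standing in for the file's global
    attributes (hypothetical helper, not a real library call).
    """
    value = global_attrs.get("featureType")
    if value is None:
        return None
    # Compare case-insensitively, then hand back the canonical spelling.
    lookup = {name.lower(): name for name in CF_FEATURE_TYPES}
    return lookup.get(str(value).strip().lower())


print(feature_type_of({"featureType": "timeseries"}))  # timeSeries
print(feature_type_of({"title": "sea-ice extent"}))    # None
```

A client written this way would silently skip a file that omits the attribute, which is the practical cost of option 1 in the text quoted below.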

The CF-1.9 draft still contains the following text (http://cfconventions.org/cf-conventions/cf-conventions.html#_features_and_feature_types):

The designation of dimensions as mandatory precludes the encoding of data variables where geo-positioning cannot be described as a discrete point location. Problematic examples include:

  • time series that refer to a geographical region (e.g. the northern hemisphere), a volume (e.g. the troposphere), or a geophysical quantity in which geolocation information is inherent (e.g. the Southern Oscillation Index (SOI) is the difference between values at two point locations);

  • vertical profiles that similarly represent geographically area-averaged values; and

  • paths in space that indicate a geographically located feature, but lack a suitable time coordinate (e.g. a meteorological front).

Future versions of CF will generalize the concepts of geolocation to encompass these cases. As of CF version 1.6 such data can be stored using the representations that are documented here by two means: 1) by utilizing the orthogonal multidimensional array representation and omitting the featureType attribute; or 2) by assigning arbitrary coordinates to the mandatory dimensions. For example a globally-averaged latitude position (90s to 90n) could be represented arbitrarily (and poorly) as a latitude position at the equator.

So my question: if I want to add :featureType="timeSeries", how do I handle the non-discrete nature of my "location" (the Barents Sea)?

The suggestion "2) by assigning arbitrary coordinates to the mandatory dimensions. For example a globally-averaged latitude position (90s to 90n) could be represented arbitrarily (and poorly) as a latitude position at the equator." could probably work for a sea-ice extent covering the whole northern hemisphere, which could look like:

Northern Hemisphere example:

dimensions:
    time = UNLIMITED ;
    latb = 2;

variables:
    double sie(time);
        sie:standard_name = "sea_ice_extent";
        sie:coordinates = "lat sie_threshold";
        sie:units = "km^2";

    double time(time);
        time:standard_name = "time";
        time:units = "days since 1950-01-01";

    double lat;
        lat:standard_name = "latitude";
        lat:units = "degrees_north";
        lat:bounds = "lat_bounds";

    double lat_bounds(latb);

    float sie_threshold;
        sie_threshold:standard_name = "sea_ice_area_fraction";
        sie_threshold:units = "1";

// global attributes:
        :featureType = "timeSeries";

data:
    lat = 90.;
    lat_bounds = 0., 90.;
    sie_threshold = 0.15;
    time = ...;
    sie = ....;

I would appreciate your help in designing such a file, and your advice on the featureType element. Thank you.

dblodgett-usgs commented 4 years ago

Given what's currently in the spec, to be conformant you would need to add some kind of reference geometry for each of your timeseries. Using something arbitrary is going to get your files to be read by a naive client. Most people recognize NULL island when they see it.

https://en.wikipedia.org/wiki/Null_Island

JonathanGregory commented 4 years ago

Dear Thomas (@TomLav),

You could add

string region_name;
  region_name:standard_name = "region" ;
  region_name:cf_role = "timeseries_id";

with region_name = "barents_sea", which is one of the allowed values in http://cfconventions.org/Data/cf-standard-names/docs/standardized-region-names.html, and include region_name in the coordinates attribute, i.e. it is a scalar string-valued auxiliary coordinate variable. See also example H2.

Jonathan

TomLav commented 4 years ago

Thank you, @dblodgett-usgs

> Given what's currently in the spec, to be conformant you would need to add some kind of reference geometry for each of your timeseries. Using something arbitrary is going to get your files to be read by a naive client. Most people recognize NULL island when they see it.
>
> https://en.wikipedia.org/wiki/Null_Island

Either Null Island, or one of the poles to represent a hemisphere.

TomLav commented 4 years ago

Thank you @JonathanGregory, I had indeed forgotten cf_role.

With this addition, my Barents Sea example becomes:

dimensions:
    time = UNLIMITED ;
    lbl = 11 ;  // length of the region string "barents_sea"

variables:
    double sie(time);
        sie:standard_name = "sea_ice_extent";
        sie:coordinates = "geo_region sie_threshold";
        sie:units = "km^2";

    double time(time);
        time:standard_name = "time";
        time:units = "days since 1950-01-01";

    char geo_region(lbl);
        geo_region:standard_name = "region";
        geo_region:cf_role = "timeseries_id" ;

    float sie_threshold;
        sie_threshold:standard_name = "sea_ice_area_fraction";
        sie_threshold:units = "1";

// global attributes:
        :featureType = "timeSeries";

data:
    geo_region = "barents_sea";
    sie_threshold = 0.15;
    time = ...;
    sie = ....;
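
The pieces discussed so far can be checked mechanically. The following is a minimal sketch, assuming a simplified stand-in for the file: a dict of global attributes and a dict mapping each variable name to its attributes. The function name `check_timeseries_layout` is hypothetical, and the checks only mirror the suggestions made in this thread (featureType set, exactly one timeseries_id variable, that variable listed in the coordinates attribute). It is not a full CF conformance check and it says nothing about whether a point location is required.

```python
def check_timeseries_layout(global_attrs, variables, data_var):
    """Return a list of problems found; an empty list means the checks passed.

    `variables` maps variable name -> dict of attributes (a hypothetical,
    simplified stand-in for a real netCDF file).
    """
    problems = []
    if str(global_attrs.get("featureType", "")).lower() != "timeseries":
        problems.append("global attribute featureType is not timeSeries")

    # Variables that identify the time series instance.
    id_vars = [name for name, attrs in variables.items()
               if attrs.get("cf_role") == "timeseries_id"]
    if len(id_vars) != 1:
        problems.append(
            f"expected one timeseries_id variable, found {len(id_vars)}")

    # The identifying variable should be named in the data variable's
    # coordinates attribute, as suggested earlier in the thread.
    listed = variables.get(data_var, {}).get("coordinates", "").split()
    for name in id_vars:
        if name not in listed:
            problems.append(f"{name} is not listed in {data_var}:coordinates")
    return problems


variables = {
    "sie": {"standard_name": "sea_ice_extent",
            "coordinates": "geo_region sie_threshold", "units": "km^2"},
    "time": {"standard_name": "time", "units": "days since 1950-01-01"},
    "geo_region": {"standard_name": "region", "cf_role": "timeseries_id"},
    "sie_threshold": {"standard_name": "sea_ice_area_fraction", "units": "1"},
}
print(check_timeseries_layout({"featureType": "timeSeries"}, variables, "sie"))  # []
```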

I am still unsure whether this fully follows the convention, since:

  1. There is still this note in the Discrete Geometry section (see my original post) stating that "The designation of dimensions as mandatory precludes the encoding of data variables where geo-positioning cannot be described as a discrete point location." This is exactly my case (my geo-position is not a discrete point but the whole Barents Sea). Are we ready to update the Discrete Geometry text to allow such non-discrete locations?

  2. I am now merging into the variable geo_region the concept of station geo-location (region) and the concept of station_name (cf_role). This may be OK, but it does not exactly match any of the examples.

Thomas

JonathanGregory commented 4 years ago

Dear Thomas

I think a nominal location is useful for generic applications, which might just want to put a dot on a map. You could choose any point in the Barents Sea for it. I think it makes sense to use a region for a station. That correctly describes your data. The word "station" is inspired by the original use of timeseries in discrete sampling geometries.

Best wishes

Jonathan
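
The nominal-location idea above could be sketched as follows, again using a plain dict of variable attributes as a simplified stand-in for the file. The helper name `add_nominal_location`, the "value" key and the chosen point (75N, 40E, an arbitrary illustrative position inside the Barents Sea) are all assumptions for this example, not values proposed in the thread.

```python
import copy


def add_nominal_location(variables, data_var, lat, lon):
    """Return a copy of `variables` with scalar lat/lon coordinates added.

    The new variables are appended to the coordinates attribute of
    `data_var`. Hypothetical helper working on plain dicts.
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude must be within [-90, 90] degrees_north")
    out = copy.deepcopy(variables)  # leave the caller's dict untouched
    out["lat"] = {"standard_name": "latitude",
                  "units": "degrees_north", "value": lat}
    out["lon"] = {"standard_name": "longitude",
                  "units": "degrees_east", "value": lon}
    listed = out[data_var].get("coordinates", "").split()
    for name in ("lat", "lon"):
        if name not in listed:
            listed.append(name)
    out[data_var]["coordinates"] = " ".join(listed)
    return out


variables = {"sie": {"standard_name": "sea_ice_extent",
                     "coordinates": "geo_region sie_threshold"}}
# 75N, 40E is an arbitrary illustrative point inside the Barents Sea.
updated = add_nominal_location(variables, "sie", lat=75.0, lon=40.0)
print(updated["sie"]["coordinates"])  # geo_region sie_threshold lat lon
```

The region variable stays in place as the time series identifier, so the nominal point only serves generic clients that want to put a dot on a map.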