cf-convention / discuss

A forum for any discussion about interpretation, clarification, and proposals for changes or extensions to the CF conventions.

How can we encode a DSG trajectory that has two identifiers? #282

Open davidhassell opened 8 months ago

davidhassell commented 8 months ago

Hello,

I am creating a DSG trajectory dataset for meteorological research flight paths that each have two separate identifiers:

  1. The name of the route flown
  2. The unique ID of each individual flight

Multiple trajectory features can have the same route name (because the same route is flown on multiple occasions), but no two features have the same ID.

The ID seems to me like the best fit for the auxiliary coordinate variable with cf_role=trajectory_id, but what about the route name?

Can we store the route names in another auxiliary coordinate variable with standard name region? The conventions say that "When data is representative of geographic regions which can be identified by names [...]. We recommend that the names be chosen from the list of standardized region names whenever possible", which seems OK. However, the description of the region standard name contradicts this: "These strings are standardised. Values must be taken from the CF standard region list." One of these is clearly wrong!

Any thoughts on this would be appreciated, many thanks, David

taylor13 commented 8 months ago

I think "region" should be reserved for a 2-dimensional geographical area, and when two items are in the same location they should belong to the same region. If a friend and I take walks in a park along different paths (which perhaps cross), I think we are both in the same "park region".

A standard name is not a requirement, so you could define an auxiliary coordinate without a standard name and give it the long_name "route" or "path", or some such. In CMIP6, we defined an ordinary coordinate called "site", which distinguished among about 200 CFMIP sampling locations scattered globally. This was a simple index coordinate which was not assigned a standard_name but with long_name="site index".

I agree that there is an inconsistency in the region description that needs to be cleaned up, but I don't think it should be used for "route" (unless no routes intersect and each is constrained to a single geographically-recognized region).

larsbarring commented 8 months ago

I agree with Karl that using region in that way seems to stretch it a bit too far. As an alternative to using only long_name, how about using trajectory_id for "1. The name of the route flown", assuming that it is a limited set of routes (i.e. trajectories) that are flown many times, and then using the existing standard name realization for "2. The unique ID of each individual flight"? The description of realization reads:

Realization is used to label a dimension that can be thought of as a statistical sample, e.g., labelling members of a model ensemble.

That is, for each route (trajectory) you have an ensemble of individual flights. Does this make sense?

davidhassell commented 8 months ago

Hi @taylor13 and @larsbarring,

Many thanks for your advice.

I think it is clear that a standard name of region is not appropriate here.

I quite liked the idea of putting cf_role = trajectory_id on 1. (i.e. the route names auxiliary coordinate variable) since we could attach standardised attribute values to everything. However it may not be ideal, since the conventions say "The variable carrying the cf_role attribute ... must provide a unique identifier for each feature instance." (my emphasis), and the route names are not unique over the set of flights.
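The uniqueness requirement quoted above can be checked mechanically. The following is only a minimal sketch: the helper name `duplicated_values`, the route names, and the flight IDs are all hypothetical and not taken from the actual dataset.

```python
from collections import Counter


def duplicated_values(values):
    """Return the values that occur more than once, in first-seen order."""
    counts = Counter(values)
    return [value for value, count in counts.items() if count > 1]


# Hypothetical sample data: three flights, two of which follow the same route.
route_names = ["coastal_loop", "coastal_loop", "mountain_transect"]
flight_ids = ["F001", "F002", "F003"]

# Route names repeat, so they cannot serve as the cf_role variable.
print(duplicated_values(route_names))  # ['coastal_loop']
# Flight IDs are all distinct, so they can.
print(duplicated_values(flight_ids))   # []
```

An empty result means the values would satisfy the "unique identifier for each feature instance" wording; any non-empty result rules the variable out as the cf_role carrier.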

So, given that, what I think I'll be going for is:

string route(n_flights) ;
    route:long_name = "Name of route for each flight (some routes are repeated)" ;
string flight(n_flights) ;
    flight:cf_role = "trajectory_id" ;
    flight:long_name = "Unique ID of each flight" ;
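Because both variables share the n_flights dimension, element i of route and element i of flight describe the same trajectory, so the non-unique route name can still be used to select flights. A rough sketch of that pairing in plain Python follows; the function `flights_on_route` and all sample values are hypothetical, and a real reader would load the arrays from the netCDF file instead.

```python
def flights_on_route(route, flight, wanted_route):
    """Return the IDs of all flights whose route name equals wanted_route."""
    if len(route) != len(flight):
        raise ValueError("route and flight must share the n_flights dimension")
    return [fid for name, fid in zip(route, flight) if name == wanted_route]


# Hypothetical values for the two variables, both of length n_flights = 4.
route = ["coastal_loop", "mountain_transect", "coastal_loop", "valley_run"]
flight = ["F001", "F002", "F003", "F004"]

print(flights_on_route(route, flight, "coastal_loop"))  # ['F001', 'F003']
```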

Does that look OK? David

taylor13 commented 8 months ago

Not being wholly familiar with trajectory_id or cf_role, I shouldn't have the last word, but it makes sense to me given your above summary. Karl