cf-convention / cf-conventions

AsciiDoc Source
http://cfconventions.org/cf-conventions/cf-conventions
Creative Commons Zero v1.0 Universal

Recording deployment positions #428

Closed fmanzano-pde closed 1 year ago

fmanzano-pde commented 1 year ago

Title

Recording changes in nominal position (new deployments).

Moderator

@user

Moderator Status Review [last updated: YYYY-MM-DD]

Initial: 2023-01-24

Requirement Summary

Providing a mechanism to record nominal position changes (new deployments) for timeSeries (and other representations) Discrete Sampling Geometries.

Technical Proposal Summary

Creating a new global attribute to record the positions of new deployments, so that slight differences in nominal position over the history of a station can be tracked.

Benefits

As the technical co-leader of the Copernicus Marine In Situ TAC, we're analysing an evolution of our NetCDF implementation to comply fully with the CF Conventions. We've realized that our data fit the Discrete Sampling Geometries perfectly (except for HF radars, which are actually gridded data). One of the main obstacles to moving forward is the lack of a mechanism to record changes in nominal position. At the moment, Copernicus Marine In Situ TAC reports the nominal position in LATITUDE and LONGITUDE coordinate variables with TIME as dimension.

Status Quo

I've not been able to find any reference to changes in nominal position in the main standards such as the CF Conventions or ACDD. OceanSITES includes a platform_deployment_date attribute. It considers a deployment to be an instrumented platform performing observations for a period of time, and considers that changes to the instrumentation or to the spatial characteristics of the platform or its instruments constitute the end of the deployment. That is not the case for Copernicus Marine In Situ TAC: we consider that slight differences in position over time, due to maintenance or repositioning after drifts, do not affect the continuity of the time series in the long term.

Associated pull request

#431

Detailed Proposal

The proposal consists of officially adding a new recommended global attribute to the CF Conventions documentation, specifically to "Appendix A: Attributes". The proposed name is "deployment_positions". The attribute will be multi-valued, a comma-separated list. Each value will be a date, latitude and longitude (blank-separated). The date format will follow the Attribute Content Guidance of ACDD, that is YYYY-MM-DDThh:mm:ss

Example:

attributes:
  :deployment_positions = "2013-06-22T12:30:00Z 44.1432 -7.7122,2017-11-23T10:00:00Z 44.1421 -7.7118";

Additionally, an update of the documentation is required to complete "Example H.5. A single timeseries with time-varying deviations from a nominal point spatial location" by adding the new attribute.
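To make the proposed encoding concrete, here is a minimal Python sketch of how a reader might parse such an attribute value. The function name `parse_deployment_positions` is hypothetical, and the sketch assumes that every timestamp ends in `Z` (UTC), as in the example above. Nothing here is part of CF or ACDD.

```python
from datetime import datetime, timezone


def parse_deployment_positions(value):
    """Parse 'date lat lon,date lat lon,...' into a list of tuples.

    Hypothetical helper: assumes UTC timestamps ending in 'Z'.
    """
    deployments = []
    for entry in value.split(","):
        date_text, lat_text, lon_text = entry.split()
        when = datetime.strptime(date_text, "%Y-%m-%dT%H:%M:%SZ")
        when = when.replace(tzinfo=timezone.utc)
        deployments.append((when, float(lat_text), float(lon_text)))
    return deployments


example = ("2013-06-22T12:30:00Z 44.1432 -7.7122,"
           "2017-11-23T10:00:00Z 44.1421 -7.7118")
positions = parse_deployment_positions(example)
for when, lat, lon in positions:
    print(when.isoformat(), lat, lon)
```

A text attribute like this has to be parsed by every consumer, which is one practical difference from storing the values in variables.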

JonathanGregory commented 1 year ago

Dear Fernando @fmanzano-pde

Chapter 9 about DSGs contains a provision for recording nominal position as well as the actual, in Section 9.5 and example H5 (on p159 in the working version - it says "example A9.2.3.2" in Sect 9.5 - I don't know why). Would this serve your purpose?

Best wishes

Jonathan

fmanzano-pde commented 1 year ago

Dear @JonathanGregory

I'm checking the working version, p159, and it contains examples about profiles. My concern is about p153, example H5 (as you said): "A single timeseries with time-varying deviations from a nominal point spatial location", which is actually our case. It's true that it offers a mechanism to report nominal lat and lon, but they're dimensionless (scalar) variables. I think that's a wise choice, as this last position (lat and lon) should be the only relevant one. Additionally, CF provides a mechanism to report precise lat and lon (in case the platform has a GPS sensor) along the TIME dimension. That's perfect!

But now the key point of my requirement: it's quite common for the nominal position of a station (mooring) to change throughout its life. The same mooring station is repositioned in every new deployment (which happens during maintenance). These slight changes in nominal position don't affect the timeSeries, but some users require traceability of the changes. My proposal answers this requirement by providing a mechanism that keeps lat and lon scalar (one single value) while recording the changes in the nominal position.

Thank you very much for your comment!

All the best, Fer

JonathanGregory commented 1 year ago

Dear Fer @fmanzano-pde

I see, thanks for explaining. (I don't know why my page number is different - it's probably better not to rely on them!) While I understand your design, it isn't consistent with CF practice, in which date-times are always numbers stored in variables, not text in attributes. If you regard it as "discovery" metadata following ACDD practice, that's fine, but not in the realm of CF, which is about "use" metadata. On the other hand, you could add another latitude,longitude pair of auxiliary coordinate variables with the time dimension for the deployment position, as well as providing the precise position, and the scalar nominal position. I presume the deployment position doesn't change on every time epoch, so there would be repeated values. If that's a problem, maybe you could put missing data for the deployment position except when redeployment has occurred. Would this be satisfactory? These different kinds of latitude and longitude could be distinguished by standard_name, although they are not at present.
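As a rough illustration of the two layouts suggested here, the Python sketch below builds a deployment longitude for every time step from a short list of redeployment events, once with repeated values and once with missing data everywhere except at the redeployment itself. The function names, the use of NaN as the missing value, and the sample numbers are assumptions made for illustration only.

```python
import math


def dense_repeated(n_times, events):
    """Deployment value at every time step, repeating the latest one.

    `events` is a list of (time_index, value) pairs sorted by index;
    steps before the first event are left as NaN.
    """
    out = [math.nan] * n_times
    for position, (start, value) in enumerate(events):
        is_last = position + 1 == len(events)
        stop = n_times if is_last else events[position + 1][0]
        for i in range(start, stop):
            out[i] = value
    return out


def dense_sparse(n_times, events):
    """Deployment value only at the step where a redeployment occurred."""
    out = [math.nan] * n_times
    for start, value in events:
        out[start] = value
    return out


# Hypothetical data: 8 time steps, redeployments at steps 0, 3 and 6.
events = [(0, -7.7122), (3, -7.7118), (6, -7.7120)]
repeated = dense_repeated(8, events)
sparse = dense_sparse(8, events)
print(repeated)
print(sparse)
```

Both arrays have the full time dimension, so either layout stores one value per time step even though only three of them carry new information.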

Best wishes

Jonathan

fmanzano-pde commented 1 year ago

Dear Jonathan @JonathanGregory

Thank you very much for your answer. I understand what you mean about "use" metadata. My proposal tried to be aligned with the "history" CF global attribute, that also contains date-times, but you are right: the scope of "history" and "deployment_positions" (the global attribute I proposed) is completely different.

We've already thought about the two extra auxiliary coordinate variables (nominal_lat / nominal_lon) you mentioned but, as you said, we don't like having so many repeated values, as a deployment happens only occasionally. The alternative of having many missing values is not valid either because, in my opinion, it is conceptually misleading.

So, in conclusion, I'll take the ACDD way :-)

Thank you very much for your help and clarification!

All the best, Fer

JonathanGregory commented 1 year ago

Dear Fer @fmanzano-pde

The contents of the history attribute are not standardised by the convention, so indeed it could contain date-times in text format, but it doesn't have to. This history attribute is more like discovery metadata than use metadata.

If your deployment position is really for discovery and not use, then ACDD is the right approach. If you want to include it in CF, another possibility has occurred to me. We could regard the deployment position as a kind of bounds variable for the nominal position. Of course, it's not literally "bounds", but it is similar in that it specifies a series of points, traversed in a particular order, associated with a nominal location, like this (based on example H5):

dimensions:
  time = 100233 ;
  ndeployments=5;
variables:
  float lon ;
    lon:standard_name = "longitude";
    lon:long_name = "nominal station longitude";
    lon:units = "degrees_east";
    lon:axis = "X";
    lon:bounds = "deployment_lon";
  float deployment_lon(ndeployments);
    deployment_lon:long_name = "longitude at deployment";
  float lat ;
    lat:standard_name = "latitude";
    lat:long_name = "nominal station latitude" ;
    lat:units = "degrees_north" ;
    lat:axis = "Y" ;
    lat:bounds = "deployment_lat";
  float deployment_lat(ndeployments);
    deployment_lat:long_name = "latitude at deployment";
  float precise_lon (time);
    precise_lon:standard_name = "longitude";
    precise_lon:long_name = "precise longitude";
    precise_lon:units = "degrees_east";
  float precise_lat (time);
    precise_lat:standard_name = "latitude";
    precise_lat:long_name = "precise latitude" ;
    precise_lat:units = "degrees_north" ;

We don't need to specify the units or standard_name of the bounds, since they must be the same as the nominal coordinates.

Would this approach be suitable?

Best wishes

Jonathan

JonathanGregory commented 1 year ago

Continuing the last posting: you also need the time of deployment. For the same arrangement, you would need a nominal time, of which this could be the bounds.

  float station_time;
    station_time:standard_name = "time";
    station_time:long_name = "nominal station time";
    station_time:units = "days since 1970-1-1";
    station_time:axis = "T";
    station_time:bounds = "deployment_time";
  float deployment_time(ndeployments);
    deployment_time:long_name = "time at deployment";

Jonathan

ngalbraith commented 1 year ago

Sorry to be late to the discussion - it's been a rough last few months.

In our OceanSITES long-timeseries (merged deployments) files, we have more fields that are represented by nominal values but for which we have deployment-level values. Aside from lat/long, we provide: sensor make/model, measurement depth, watch circle, magnetic correction applied, etc. We thought having just the start time of each deployment would put too much burden on the user, and might lead them to lose information about instrumentation, measurement depth, etc. To simplify access to these deployment-level fields, we added a time series variable that contains the deployment number, hoping to make it straightforward to index into the relevant fields.

Our convention hasn't been adopted by OceanSITES, which is pretty flexible on this type of file, but it works well for us. I'm curious about whether I can claim that this is CF-compliant.

Selected details:

dimensions:
  TIME = 5166060 ;
  LATITUDE = 1 ;
  LONGITUDE = 1 ;
  Deployment = 8 ;

--- scalar, nominal lat/long
double LATITUDE(LATITUDE) ; (ditto longitude)
  LATITUDE:long_name = "Nominal site latitude" ;
  LATITUDE:standard_name = "latitude" ;

--- time series fields
double TIME(TIME) ;
  TIME:standard_name = "time" ;
  TIME:units = "days since 1950-01-01 00:00:00" ;
int DEP_NUM(TIME) ;
  DEP_NUM:long_name = "mooring deployment number" ;
double AIRT(TIME) ; (ditto all the measured time series variables)

--- deployment level fields
double start_date(Deployment) ;
  start_date:standard_name = "time" ;
  start_date:long_name = "the date of the first good data point in the deployment" ;
  start_date:units = "days since 1950-01-01 00:00:00" ;
double DEP_LAT(Deployment) ; (ditto _LON)
  DEP_LAT:long_name = "surveyed anchor latitude of each deployment" ;
  DEP_LAT:standard_name = "latitude" ;
double DEPTH_SST(Deployment) ; (ditto z-axis data for all sensors)
  DEPTH_SST:standard_name = "depth" ;
  DEPTH_SST:long_name = "depth of sst" ;
double magnetic_correction(Deployment) ;
  magnetic_correction:long_name = "magnetic_correction applied to winds and currents, per deployment" ;

JonathanGregory commented 1 year ago

Dear Nan @ngalbraith

I understand the convention. It's CF-compliant but CF would not make the link between your deployment variables and your time-dependent variables, using the DEP_NUM variable. That's an extra convention of OceanSITES. The DEP_NUM variable presumably contains a lot of repetition, which Fer wanted to avoid.

It occurs to me that the OceanSITES convention is a bit like compression by gathering in CF section 8.2. A CF-compliant version, simplifying your example, would be:

TIME = 5166060 ;
  LATITUDE = 1 ;
  Deployment = 8 ;

double LATITUDE(LATITUDE) ;
  LATITUDE:long_name = "Nominal site latitude" ;
  LATITUDE:standard_name = "latitude" ;
  LATITUDE:units = "degrees_N";

// time series fields
double TIME(TIME) ;
  TIME:standard_name = "time" ;
  TIME:units = "days since 1950-01-01 00:00:00" ;
double AIRT(TIME) ; // (ditto all the measured time series variables)
  AIRT:standard_name = "air_temperature";
  AIRT:units="degC";
int  Deployment(Deployment) ;
  Deployment:long_name = "time index of the first data point from the mooring deployment" ;
  Deployment:compress="TIME";

// deployment level fields
double start_date(Deployment) ;
  start_date:standard_name = "time" ;
  start_date:long_name = "the date of the first good data point in the deployment" ;
  start_date:units = "days since 1950-01-01 00:00:00" ;
double DEP_LAT(Deployment) ;
  DEP_LAT:long_name = "surveyed anchor latitude of each deployment" ;
  DEP_LAT:standard_name = "latitude" ;
  DEP_LAT:units = "degrees_N";

The CF attribute compress of the Deployment variable is the mechanism which makes the link. It declares that elements of the Deployment dimension are a subset of the elements of the TIME dimension, and it gives their indices in TIME, in order, with 0 meaning the first element of TIME. It's the same kind of indexing as yours, but the index is supplied for the compressed version of TIME rather than the expanded version.
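The relation between the two kinds of index can be sketched in a few lines of Python. The first function reduces a per-time deployment number (in the style of DEP_NUM) to one start index per deployment, and the second looks up the deployment in force at a given time index. Both function names and all sample values are hypothetical, and the sketch assumes the start indices are zero-based and increasing, as described above.

```python
from bisect import bisect_right


def gather_start_indices(dep_num):
    """Index of the first element of each run of equal deployment numbers."""
    starts = []
    previous = None
    for index, number in enumerate(dep_num):
        if number != previous:
            starts.append(index)
            previous = number
    return starts


def deployment_for_time(start_indices, time_index):
    """Return the 0-based deployment in force at `time_index`.

    `start_indices` holds the TIME index of the first data point of each
    deployment, in increasing order. Returns None before the first one.
    """
    position = bisect_right(start_indices, time_index) - 1
    return position if position >= 0 else None


# Hypothetical expanded index (one deployment number per time step)...
dep_num = [1, 1, 1, 2, 2, 3, 3, 3, 3]
# ...and the gathered equivalent (one time index per deployment).
starts = gather_start_indices(dep_num)
dep_lat = [44.1432, 44.1421, 44.1428]  # hypothetical DEP_LAT(Deployment)

for t in range(len(dep_num)):
    print(t, dep_lat[deployment_for_time(starts, t)])
```

The gathered form stores three integers here instead of nine, and the saving grows with the length of the time series.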

Best wishes

Jonathan

JonathanGregory commented 1 year ago

Dear all

The DSG ragged array representations (section 9) are ways of saving space in data variables as well as reducing the number of coordinate variables. The compression by gathering is another way of saving space in variables. In the way it's described and shown (section 8.2), its aim is to reduce the size of data variables, but it would work equally well for coordinate variables that have the same dimensions as data variables. I don't think that would be a change in the convention.

Therefore it seems to me that we could use the DSG ragged array representation for a set of ocean data timeseries from moored stations in combination with compression by gathering for information about the deployment and redeployment of the stations. Here's an example, based on H.6 for timeseries of station data in the contiguous ragged array representation:

dimensions:
  station = 23 ;
  obs = 1234 ; // aggregated number of times in all timeseries
  Deployment = 58 ; // aggregated number of deployments for stations
variables:
  float lon(station) ;
    lon:standard_name = "longitude";
    lon:long_name = "nominal station longitude";
    lon:units = "degrees_east";
  float deploy_lon (Deployment);
    deploy_lon:standard_name = "longitude";
    deploy_lon:long_name = "deployment longitude";
    deploy_lon:units = "degrees_east";
  float precise_lon (obs);
    precise_lon:standard_name = "longitude";
    precise_lon:long_name = "precise station longitude";
    precise_lon:units = "degrees_east";
  string station_name(station) ;
    station_name:long_name = "station name" ;
    station_name:cf_role = "timeseries_id";
  int row_size(station) ;
    row_size:long_name = "number of observations for this station " ;
    row_size:sample_dimension = "obs" ;
  int Deployment(Deployment) ;
    Deployment:long_name = "index of the first obs after (re)deployment" ;
    Deployment:compress="obs";
  double time(obs) ;
    time:standard_name = "time";
    time:long_name = "time of measurement" ;
    time:units = "days since 1970-01-01 00:00:00" ;
  float temp(obs) ;
    temp:standard_name = "air_temperature" ;
    temp:units = "Celsius" ;
    temp:coordinates = "time lon station_name precise_lon deploy_lon" ;

The sample_dimension of the row_size variable is the DSG mechanism that links the obs and time dimensions, the obs dimension being a concatenation of the time dimensions of the individual timeseries. The compress attribute of the Deployment variable is the mechanism that links the obs and Deployment dimensions, where the Deployment dimension is a subset of the obs. Logically it is the same as if we had provided deploy_lon(obs), giving the most recent deployment longitude for every observation, but this would be very repetitive. As I suggested to Fer, we could put missing data for all except the first obs after a new deployment. In that case deploy_lon(obs) would be mostly missing data. The compression by gathering eliminates the duplicated missing data by just storing the non-missing elements.
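With several stations in one file, a reader also needs to know which station each deployment belongs to. A minimal Python sketch, assuming the contiguous ragged array layout described above and zero-based deployment indices into obs, is given below. The function name and the small sample values are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def station_of_each_deployment(row_size, deployment_starts):
    """Map each deployment (given by its first obs index) to a station.

    `row_size[i]` is the number of obs belonging to station i, and
    `deployment_starts` are 0-based indices into the obs dimension.
    """
    # First obs index of each station: 0, then the running total of row_size.
    station_starts = [0] + list(accumulate(row_size))[:-1]
    return [bisect_right(station_starts, start) - 1
            for start in deployment_starts]


# Hypothetical small case: 3 stations, 10 obs, 5 deployments.
row_size = [4, 3, 3]
deployment_starts = [0, 2, 4, 7, 9]
stations = station_of_each_deployment(row_size, deployment_starts)
print(stations)
```

No extra variable is needed for this link, because both row_size and the compressed index refer to positions along the same obs dimension.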

If there's only one timeseries, we don't need the DSG ragged array mechanism. We can just use time for the obs dimension, as in my previous example.

I wonder if that seems like a good way to deal with the issue raised by Fer @fmanzano-pde and Ludovic @ludo-ifr.

Best wishes

Jonathan

dblodgett-usgs commented 1 year ago

@JonathanGregory -- This is an interesting suggestion. Wouldn't the coordinates of the temperature observation actually just be the time [precise_lat] precise_lon though? It seems very confusing to have three sets of spatial coordinates for one observation.

This does seem like a valid approach and is a way to use ragged arrays to encode these three sets of spatial information. I would just hesitate to overload the coordinates attribute with things that don't vary along the obs dimension.

JonathanGregory commented 1 year ago

Dear Dave @dblodgett-usgs

Yes, the precise_lon(obs) is the longitude of the temperature observation. We already have a convention for supplying the scalar "nominal" lon(station). The extra requirement is also to specify where the station was deployed, after each time it's been removed from the water. From what Nan and Fer said, this is also useful metadata they want to record. I wasn't sure whether to put the deploy_lon(Deployment) in the coordinates attribute. I included it because Deployment is a compressed view of obs, and thus equivalent to obs logically.

Best wishes

Jonathan

dblodgett-usgs commented 1 year ago

Thanks for reminding me of the nominal station location.

I'm still not sure about the coordinates attribute of the temperature variable having the lon and deploy_lon in it. Is that according to CF as written? How would some software make the connection to the nominal and deployment positions using that information?

ludo-ifr commented 1 year ago

Thanks @JonathanGregory for this example.

So lon(station) will be the last known nominal position.

It will be important to know the answer to the question raised by @dblodgett-usgs

Just to concretise the last example:

dimensions:
  station = 1 ;
  time = 1234 ; // aggregated number of times in all timeseries
  Deployment = 3 ; // aggregated number of deployments for stations
variables:
  float lon(station=1) ;
    lon:standard_name = "longitude";
    lon:long_name = "nominal station longitude";
    lon:units = "degrees_east";
  float deploy_lon (Deployment=3);
    deploy_lon:standard_name = "longitude";
    deploy_lon:long_name = "deployment longitude";
    deploy_lon:units = "degrees_east";
  float precise_lon (time);
    precise_lon:standard_name = "longitude";
    precise_lon:long_name = "precise station longitude";
    precise_lon:units = "degrees_east";
  string station_name(station=1) ;
    station_name:long_name = "station name" ;
    station_name:cf_role = "timeseries_id";
  int row_size(station=1) ;
    row_size:long_name = "number of observations for this station " ;
    row_size:sample_dimension = "time" ;
  int Deployment(Deployment=3) ;
    Deployment:long_name = "index of the first time after (re)deployment" ;
    Deployment:compress="time";
  double time(time) ;
    time:standard_name = "time";
    time:long_name = "time of measurement" ;
    time:units = "days since 1950-01-01 00:00:00" ;
  float temp(time) ;
    temp:standard_name = "air_temperature" ;
    temp:units = "Celsius" ;
    temp:coordinates = "time lon station_name precise_lon deploy_lon" ;

data:
 station_name = "44088";
 lon = 74.841;
 deploy_lon = 74.839, 74.842, 74.841;
 Deployment = 24537, 24654, 26691;
 row_size = 1234;

JonathanGregory commented 1 year ago

Dear @dblodgett-usgs and @ludo-ifr

Probably my proposed use of compression by gathering is an extension to the existing convention. In 8.2, we expect that the data variable will be compressed as well as its coordinate variable, in which case the compressed dimension will be a dimension of the data variable. I'm proposing to use this convention to compress an auxiliary coordinate variable (to avoid lots of repetition) although the data variable isn't compressed. A new exception is needed for the coordinates attribute to make this legal, as Dave suggests, but the link is easy to make. If we adopt this convention, it would be permitted for a coordinate variable named by a coordinates attribute of a data variable to have a dimension for which there is a variable of the same name that has a compress attribute which names one or more dimensions of the data variable! That's a bit of a mouthful, isn't it, but I don't think it's really complicated.

Referring to @ludo-ifr's example: The data variable temp has a coordinates attribute which names an auxiliary coordinate variable deploy_lon which has a dimension Deployment. The file contains a variable of the same name (Deployment) which has a compress attribute, which names the dimension time. This is a dimension of the data variable temp that we started with. The Deployment dimension is a compressed version of the time dimension. Since time is a dimension of temp and Deployment is equivalent to time, deploy_lon(Deployment) is permitted to be listed by the coordinates attribute as an auxiliary coordinate variable.
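The chain of reasoning above can be written as a small check. The Python sketch below is one possible reading of the proposed exception, not an implementation of CF as written: it accepts a coordinate dimension if the data variable has it, or if a variable of the same name carries a compress attribute that names only dimensions of the data variable. The dictionary layout and the function name are assumptions, and the existing DSG allowances (such as lon(station) in a ragged array) are deliberately left out.

```python
def coordinate_is_permitted(data_var, coord_var, variables):
    """Check the proposed rule for one auxiliary coordinate variable.

    `variables` maps a variable name to {"dims": (...), "attrs": {...}}.
    """
    data_dims = set(variables[data_var]["dims"])
    for dim in variables[coord_var]["dims"]:
        if dim in data_dims:
            continue  # ordinary case: dimension shared with the data variable
        index_var = variables.get(dim)
        if index_var is None:
            return False  # no variable named after the dimension
        compressed = index_var["attrs"].get("compress", "").split()
        if not compressed or not set(compressed) <= data_dims:
            return False  # not a compressed view of the data dimensions
    return True


# Simplified model of the example above, plus one unrelated variable.
variables = {
    "temp": {"dims": ("time",), "attrs": {}},
    "precise_lon": {"dims": ("time",), "attrs": {}},
    "deploy_lon": {"dims": ("Deployment",), "attrs": {}},
    "Deployment": {"dims": ("Deployment",), "attrs": {"compress": "time"}},
    "unrelated": {"dims": ("depth",), "attrs": {}},
}

for name in ("precise_lon", "deploy_lon", "unrelated"):
    print(name, coordinate_is_permitted("temp", name, variables))
```

A scalar coordinate variable has no dimensions, so it passes this check trivially, which matches the existing treatment of the nominal lon.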

Best wishes

Jonathan

dblodgett-usgs commented 1 year ago

Overloading coordinates like this makes me a little uneasy. I think it would be cleaner to express the same information in a new attribute (or two) designed specifically for this use case.

e.g.

  float temp(time) ;
    temp:standard_name = "air_temperature" ;
    temp:units = "Celsius" ;
    temp:deployment_coordinates = "deploy_lon";
    temp:precise_coordinates = "precise_lon";
    temp:coordinates = "time lon station_name" ;

The original use case doesn't seem to require a separate precise_lon, correct? (I think the word "nominal" is getting in our way here) In that case, our contiguous ragged array representation could be:

dimensions:
  station = 2; // total number of stations
  time = 1234 ; // aggregated number of times in all timeseries
  Deployment = 6 ; // aggregated number of deployments for all stations
variables:
  float lon(station=2) ;
    lon:standard_name = "longitude";
    lon:long_name = "nominal station longitude";
    lon:units = "degrees_east";
  float deploy_lon (Deployment=6);
    deploy_lon:standard_name = "longitude";
    deploy_lon:long_name = "deployment longitude";
    deploy_lon:units = "degrees_east";
  string station_name(station=2) ;
    station_name:long_name = "station name" ;
    station_name:cf_role = "timeseries_id";
  int row_size(station=2) ;
    row_size:long_name = "number of observations for this station " ;
    row_size:sample_dimension = "time" ;
  int deployment_size(Deployment=6) ;
    deployment_size:long_name = "number of observations for this deployment" ;
    deployment_size:sample_deployment_dimension = "time" ;
  double time(time) ;
    time:standard_name = "time";
    time:long_name = "time of measurement" ;
    time:units = "days since 1950-01-01 00:00:00" ;
  float temp(time) ;
    temp:standard_name = "air_temperature" ;
    temp:units = "Celsius" ;
    temp:redeploy_coordinates = "deploy_lon" ;
    temp:coordinates = "time lon station_name" ;

data:
 station_name = "44088", "44089";
 lon = 74.841, 76.841;
 deploy_lon = 74.839, 74.842, 74.841, 76.839, 76.842, 76.841;
 deployment_size = 205, 206, 206, 206, 205, 206;
 row_size = 617, 617; 

A similar approach could be taken for indexed ragged array if needed... but I think having a stronger separation of concerns for the typical single-position / time series and the more nuanced sampling-position that varies over time is probably going to be cleaner?
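The count-based variant can be checked with a short Python sketch that uses the row_size and deployment_size values from the example above. It converts the counts to start indices and groups the deployments by station. The two function names are hypothetical, and the grouping assumes that no deployment spans two stations.

```python
from itertools import accumulate


def starts_from_counts(counts):
    """First index of each group, given the number of elements per group."""
    return [0] + list(accumulate(counts))[:-1]


def deployments_per_station(row_size, deployment_size):
    """Group deployment numbers by station; each group must sum to row_size."""
    groups, position = [], 0
    for size in row_size:
        group, total = [], 0
        while total < size and position < len(deployment_size):
            group.append(position)
            total += deployment_size[position]
            position += 1
        if total != size:
            raise ValueError("deployment counts do not match row_size")
        groups.append(group)
    if position != len(deployment_size):
        raise ValueError("unused deployment counts")
    return groups


# Values taken from the example above.
row_size = [617, 617]
deployment_size = [205, 206, 206, 206, 205, 206]

deployment_starts = starts_from_counts(deployment_size)
groups = deployments_per_station(row_size, deployment_size)
print(deployment_starts)
print(groups)
```

The start indices obtained this way are the same kind of values that the compress-based variant would store directly, so the two encodings carry equivalent information.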

fmanzano-pde commented 1 year ago

I'm really sorry for not writing until now, but I had a good excuse: this week is being bananas because I'm moving... and trying to survive in the new house, which is still a jungle of boxes...

What a nice surprise finding out the issue has raised enormous interest. Thank you very much all of you!

I'd like to mention that the option provided by @JonathanGregory and concretised by @ludo-ifr suits the case I wanted to cover pretty well, and seems to be perfectly aligned with the CF Conventions, which is my main concern.

On the other hand, I understand perfectly @dblodgett-usgs, the option of splitting "coordinates" attribute in "coordinates"/"precise_coordinates"/"deployment_coordinates" attributes also suits well. However, I find this solution more ad hoc.

So, if the aforementioned solution is accepted...

For one station:

dimensions:
  time = 1234 ;
  Deployment = 3 ;
variables:
  float lon ;
    lon:standard_name = "longitude";
    lon:long_name = "nominal station longitude";
    lon:units = "degrees_east";
  float deploy_lon (Deployment);
    deploy_lon:standard_name = "longitude";
    deploy_lon:long_name = "deployment longitude";
    deploy_lon:units = "degrees_east";
  float precise_lon (time);
    precise_lon:standard_name = "longitude";
    precise_lon:long_name = "precise station longitude";
    precise_lon:units = "degrees_east";
  int Deployment(Deployment) ;
    Deployment:long_name = "index of the first time after (re)deployment" ;
    Deployment:compress="time";
  double time(time) ;
    time:standard_name = "time";
    time:long_name = "time of measurement" ;
    time:units = "days since 1950-01-01 00:00:00" ;
  float temp(time) ;
    temp:standard_name = "air_temperature" ;
    temp:units = "Celsius" ;
    temp:coordinates = "time lon precise_lon deploy_lon" ;

...I'd really like to include an additional example in the documentation to show how to record deployments' positions.

All the best,

Fer

ludo-ifr commented 1 year ago

I just want to add that we could have 3 different cases:

  1. A station with nominal positions (with different deployments) and GPS position
  2. A station with only nominal positions (with different deployments)
  3. A station with only GPS position but we know that it is a fixed buoy (This particular case happens because we receive data from the GTS with template for moored buoys OMM 315008)

In the third case we have to choose what lat & lon will be (it could be the first position received, the last one, a mean of the positions, ...).

dblodgett-usgs commented 1 year ago

@fmanzano-pde -- no worries at all to be a little asynchronous.

"However, I find this solution more ad hoc."

Is that necessarily a negative?

By overloading coordinates, we require a person to look at a dataset to confirm that certain variables are being handled according to their use case. What I was trying to do is get to a place where a computer would be able to unambiguously and correctly interpret the data without human intervention.

@ludo-ifr -- I think the word nominal is being misinterpreted here. There is already an accommodation for a "nominal" position with moving "precise location(s)" -- so 1 is ok. I don't think there is anything wrong with having a single time varying X and Y axis coordinate variable -- so option 2 is ok. Isn't case 3 a special case of 2? The mechanisms we've discussed above are all viable to some degree.

@JonathanGregory -- I think the trade off is that adding variables with explicit metadata for these use cases as I've suggested can be layered on top of the existing convention pretty easily. Further overloading the coordinates with a third set of spatial coordinates requires a modification of the existing convention or, to my eye, it introduces ambiguity regarding what the role of each coordinate variable is.

e.g. How do you tell the difference between:

  float lon ;
    lon:standard_name = "longitude";
    lon:long_name = "nominal station longitude";
    lon:units = "degrees_east";
  float deploy_lon (Deployment);
    deploy_lon:standard_name = "longitude";
    deploy_lon:long_name = "deployment longitude";
    deploy_lon:units = "degrees_east";
  float precise_lon (time);
    precise_lon:standard_name = "longitude";
    precise_lon:long_name = "precise station longitude";
    precise_lon:units = "degrees_east";

Are you expecting implementing software to use shared dimensions and standard name to get everything figured out?

In the case of example H.5, we have:

      float lon ;
          lon:standard_name = "longitude";
          lon:long_name = "station longitude";
          lon:units = "degrees_east";
          lon:axis = "X";
      float precise_lon (time);
          precise_lon:standard_name = "longitude";
          precise_lon:long_name = "station longitude";
          precise_lon:units = "degrees_east";

Note that axis = "X" is only on one of the two spatial coordinate variables. If we have two additional X coordinate variables, I feel like we should be adding a cf_role or other custom attribute that distinguishes these from each other. I'm fine being wrong here, but having written some software to tease apart the coordinate variable relations with this part of the spec, it often feels like we are "divining" the relationships more than determining them.

All the best -- happy we are on this topic!

JonathanGregory commented 1 year ago

Dear Dave @dblodgett-usgs

If I have correctly understood the use case, three types of position are needed for an anchored floating platform which may drift around to some extent: the nominal position, the deployment position and the precise position.

In my example, three different mechanisms are used for attaching these coordinates (scalar coordinate variable, auxiliary coordinate variable with the time or observation dimension, auxiliary coordinate variable with a compressed time or obs dimension), but I wouldn't use the way they're attached to distinguish them. I think we ought to define standard names for the nominal and deployment location coordinates. I prefer that to cf_role because they are different quantities.

Best wishes

Jonathan

dblodgett-usgs commented 1 year ago

Hi @JonathanGregory -- Thanks for the clarification on the use case.

Overloading standard_name seems a little problematic. These are, after all, just lon/lat coordinates, so the longitude standard name is correct. cf_role is another attribute that has been used to distinguish "special" coordinate variables (that aren't one of the XYZT axes), e.g. timeseries_id. So it would seem that additional cf_role attributes would be in line with that practice?

If the consensus is to have standard_name carry information about more than the quantity being encoded, then I would agree on using standard_name, but that's not my read on the purpose of standard_name.

Regards -- Dave

JonathanGregory commented 1 year ago

Dear Dave

Standard names can be more or less precise, in order to draw distinctions as required. For example, we have both "time" and "forecast_reference_time", which can both be present as coordinate variables of different dimensions of a data variable, and are distinguished by standard_name.

Best wishes

Jonathan

fmanzano-pde commented 1 year ago

Dear @dblodgett-usgs and @JonathanGregory

Thank you very much for the discussion. If I understood correctly, the preferred option is to use standard names for the different latitudes and longitudes, isn't it?

Rewriting my example...

dimensions:
  time = 1234 ;
  Deployment = 3 ;
variables:
  float lon ;
    lon:standard_name = "nominal_longitude";
    lon:long_name = "nominal station longitude";
    lon:units = "degrees_east";
  float deploy_lon (Deployment);
    deploy_lon:standard_name = "deployment_longitude";
    deploy_lon:long_name = "deployment longitude";
    deploy_lon:units = "degrees_east";
  float precise_lon (time);
    precise_lon:standard_name = "precise_longitude";
    precise_lon:long_name = "precise station longitude";
    precise_lon:units = "degrees_east";
  int Deployment(Deployment) ;
    Deployment:long_name = "index of the first time after (re)deployment" ;
    Deployment:compress="time";
  double time(time) ;
    time:standard_name = "time";
    time:long_name = "time of measurement" ;
    time:units = "days since 1950-01-01 00:00:00" ;
  float temp(time) ;
    temp:standard_name = "air_temperature" ;
    temp:units = "Celsius" ;
    temp:coordinates = "time lon precise_lon deploy_lon" ;

If it's correct...

  1. Which would be the way to proceed, to close the issue? Could it be added to the CF documentation as a new example to show this use case?

  2. Should I ask for adding the new standard names nominal_longitude, deployment_longitude, precise_longitude (and latitudes)? How?
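
As a minimal sketch of how a reader might interpret the Deployment index variable in the example above, the following Python assigns each time step the longitude of the deployment in force at that step. It assumes zero-based indices into time and that a deployment position applies until the next (re)deployment; both are assumptions for illustration, and the function name and sample values are hypothetical.

```python
from bisect import bisect_right


def expand_deployments(deployment_index, deploy_values, n_times):
    """Return, for each time step, the value of the deployment in force.

    deployment_index: sorted zero-based time indices of each (re)deployment
                      (assumed meaning of the Deployment variable).
    deploy_values:    one value per deployment, e.g. deploy_lon.
    Time steps before the first deployment get None.
    """
    result = []
    for t in range(n_times):
        # Index of the last deployment that started at or before step t.
        k = bisect_right(deployment_index, t) - 1
        result.append(deploy_values[k] if k >= 0 else None)
    return result


# Hypothetical values: three deployments over ten time steps.
deployment_index = [0, 4, 7]
deploy_lon = [-3.50, -3.52, -3.49]
per_time_lon = expand_deployments(deployment_index, deploy_lon, 10)
```

Here steps 0 to 3 take the first deployment longitude, steps 4 to 6 the second, and steps 7 to 9 the third.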

JonathanGregory commented 1 year ago

Dear @fmanzano-pde

I would be happy with this approach, unless there's a better idea. Dave @dblodgett-usgs may be unconvinced - I'm not sure. Other opinions would be welcome. If we go this way: (1) Yes, we could add it as a new example, which could be elaborated in this issue, (2) New standard names should be proposed as a new issue in the discuss repo. I feel that we should use the existing latitude and longitude for the precise versions, since they're used in DSGs and other applications for the actual location of observations, and add the nominal and deployment variants as new.

Best wishes

Jonathan

dblodgett-usgs commented 1 year ago

If the decision to use standard_name for both quantity type and function relative to other variables has already been made, and continuing that pattern instead of using cf_role is preferable to the group, then I think that overloading standard name with this functional description is the right decision to remain consistent with the specification. As I've said above, I disagree with using standard_name in this way, but that's a separate issue.

I may open a separate issue to discuss that point because it has rubbed me the wrong way for a long long time.

IPerezGonzalez commented 1 year ago

Hi all,

I've been following the discussion for a while now. I am not sure I understand what looks to be the main issue here, which is the overcrowding of the coordinates attribute that would prevent "unambiguously and correctly interpret the data without human intervention" (@dblodgett-usgs).

For instance, I do see that only one of the now several variables bearing latitude values listed in ":coordinates" must be unambiguously identified as the "nominal latitude", the Y axis of a feature instance. I tend to think that the already existing variable attribute ":axis" on the variable that defines the "nominal latitude" should be sufficient to designate that variable as the Y axis for a feature instance.

But I don't see why the other latitude variables need to be assigned a specific "role" other than what the user wants to make out of them. Probably it is here where I am missing literacy. What are reading packages, machines, trying to do with those other latitude variables listed in ":coordinates"?

Would it be a matter of the CF Conventions becoming more assertive about the use of ":axis", rather than just recommending it?

Best,

IPG

fmanzano-pde commented 1 year ago

Dear all,

thank you very much for your responses. I have to say that I agree with all of you... In my opinion, the situation is getting out of hand. Let me explain myself.

@dblodgett-usgs I think you did well to create a separate issue to discuss the differences between cf_role and standard_name

@IPerezGonzalez I agree with you, the most important attribute regarding the coordinates is the attribute axis

So... Should we use standard_name and/or cf_role to distinguish between the different latitudes/longitudes? In my opinion, none of them.

@JonathanGregory I was really satisfied with your solution using the dimension deployment, the variable deployment (deployment) as a :compress of time, and deploy_lon(deployment).

Why don't we forget about adding more information?

At the moment, "H.2.3. Single time series, including deviations from a nominal fixed spatial location" is including:

What do you think?

All the best, Fer

JonathanGregory commented 1 year ago

Dear Fer @fmanzano-pde

Thanks for the example. We will also need to make a small change in the section about compression by gathering, to allow an auxiliary coordinate variable to be compressed when the data variable isn't.

I had thought that you wanted three different kinds of location, but your example shows that you need only two. You don't have an unchanging nominal location for the station. Do you, or does anyone, have a need for a fixed (nominal) location and an infrequently changing deployment location? If we only need one of them, we may need only one new standard name.

Best wishes

Jonathan

fmanzano-pde commented 1 year ago

I'm sorry @JonathanGregory, but I'm confused, I didn't understand your last post.

It's very important to show clearly the last known deployment position as it will be the nominal position, that is the coordinates X and Y. It's true that in my example both lat and deploy_lat are deployment positions, but lat plays the role of nominal.

I understand you mean that you still think that it's important to distinguish between a measured lat (precise_lat) and a not measured lat (lat and deploy_lat), don't you? But in the end all the positions are measured, because even in the deployments GPS sensors are used to set the position.

By the way, I've edited my previous post to clearly talk about "deployment" positions, and reserve the word "nominal" only for the one containing the axis attribute, that is the coordinates.

All the best, Fer

JonathanGregory commented 1 year ago

Dear Fer @fmanzano-pde

Sorry for confusion. I think I misunderstood you regarding "nominal". The example is fine with three kinds of location. The nominal location does not change, the deployment position changes occasionally, the precise location could be different for every observation. Is that correct? We can distinguish the three kinds of location by their standard names. I think it's logical for the precise location to be plain latitude and longitude because they go with time, and this is naturally what you would do for a non-DSG trajectory in (x,y,[z,]t). The nominal coordinates could be nominal_latitude and nominal_longitude, the deployment coordinates deployment_latitude and deployment_longitude. Thus, four new standard names are needed, and the nominal ones could be used in other examples as well in Appendix H, perhaps.

Best wishes

Jonathan

ngalbraith commented 1 year ago

Dear JonathanGregory -

I most definitely need a fixed nominal position (which identifies the site) and an infrequently changing position (that's more nearly correct for a given deployment). We usually also have GPS units recording on our buoys, but we don't publish that data because, for subsurface data, it's still not actually correct on a slack mooring line.

I had thought that you wanted three different kinds of location, but your example shows that you need only two. You don't have an unchanging nominal location for the station. Do you, or does anyone, have a need for a fixed (nominal) location and an infrequently changing deployment location? If we only need one of them, we may need only one new standard name.

I'm very concerned that removing the option of using latitude and longitude standard names for nominal positions will make our data less usable.

These 3 levels of position can be considered as similar to data where, say, air temperature is presented at different intervals - 1 minute being observed, but hourly and daily averages being provided for long time series use. These are all given the air_temperature standard name and the difference between them is noted by the time stamp, the long name, and maybe cell methods. That approach seems more straightforward to me.

fmanzano-pde commented 1 year ago

Dear all,

I feel that we are very close to an agreement. We all understand the affair and the mechanism to do it is more or less clear. The only pending issue would be the standard_names to be used in the different types of location, right?

The last proposal @JonathanGregory made was OK for me, but I also understand the concern @ngalbraith mentioned in the last post. However, I think that adding the new standard names proposed by Jonathan shouldn't be a problem.

In the specific case shown by @ngalbraith, the GPS position is not distributed as it is confusing (for subsurface data, it's still not actually correct on a slack mooring line). Perhaps, in that case, the nominal position provided could be kept as just latitude and longitude (as it is now), because there are no other types of location to distinguish. That way retrocompatibility would be kept.

By the way, I'd really appreciate if @dblodgett-usgs could express his opinion on this specific topic about standard names and the last proposal I made based on Jonathan's one.

... I can smell the final agreement on this.

All the best, Fer

dblodgett-usgs commented 1 year ago

Hi Fer --

I'm afraid I've kind of lost track here. Without a stable conceptual basis for all this, the words aren't holding a stable meaning and I'm struggling to keep all these things straight.

Let me see if I understand where we are at...

We have three sets of locations.

  1. a single location for a station for all time
  2. a deployment location that varies once in a while
  3. a location that may vary for every time step

@JonathanGregory has suggested additional standard_names to indicate that each of the longitude / latitude coordinate variables be clearly identified and the use of a coordinates attribute to indicate that the coordinate variables should be tied to specific data variables.

I have concerns about this approach because I feel that overloading standard_name to carry more than physical property introduces complexity that would better be represented by cf_role. However, I'm not going to die on that sword. standard_name is already used in this overloaded way in other cases and the existing design pattern should be reused.

So, unless others want to argue otherwise, I think the right thing to do is define new standard_names that carry the meaning of these three kinds of spatial coordinates.

Regards,

fmanzano-pde commented 1 year ago

Thank you very much Dave, clear as water.

The only pending issue here is the suitability for the situation Nan brought to the fore. I insist it shouldn't be a problem as the new standard names could be used, but also "latitude" and "longitude" could be kept as there are no other locations provided.

Anyway, I'll wait for her consent before closing the discussion and moving on.

All the best, Fer

JonathanGregory commented 1 year ago

Dear Fer @fmanzano-pde

I agree that latitude and longitude could continue to be used for nominal latitude and longitude, as Nan @ngalbraith describes. Actually we could simplify the proposal, bearing in mind what section 9.5 says:

Only the set of coordinates which are regarded as the nominal (default or preferred) positions should be indicated by the attribute axis, which should be assigned string values to indicate the orientations of the axes ( X , Y , Z, or T).

Since we can use the axis attribute to distinguish the nominal and precise positions, both kinds of coordinate can use latitude and longitude standard names, as they do in the existing example. In that case, we need only propose deployment_latitude and deployment_longitude as new standard names.

Best wishes

Jonathan
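
A minimal sketch of how software could tell the three kinds of position apart under this simplified proposal, assuming variable attributes are available as plain dictionaries. The names deployment_latitude and deployment_longitude are the standard names proposed in this thread; the function name, variable names, and sample metadata are hypothetical.

```python
def classify_positions(variables, coordinates):
    """Map each lat/lon variable named in a coordinates attribute to a role.

    variables:   dict of variable name -> dict of attributes.
    coordinates: the space-separated coordinates attribute of a data variable.
    Nominal positions carry axis="X" or axis="Y" (section 9.5); precise
    positions share the latitude/longitude standard names but have no axis.
    """
    roles = {}
    for name in coordinates.split():
        attrs = variables[name]
        std = attrs.get("standard_name")
        if std in ("deployment_latitude", "deployment_longitude"):
            roles[name] = "deployment"
        elif std in ("latitude", "longitude"):
            has_axis = attrs.get("axis") in ("X", "Y")
            roles[name] = "nominal" if has_axis else "precise"
    return roles


# Hypothetical metadata for one timeSeries feature.
variables = {
    "time": {"standard_name": "time", "axis": "T"},
    "lat": {"standard_name": "latitude", "axis": "Y"},
    "lon": {"standard_name": "longitude", "axis": "X"},
    "precise_lat": {"standard_name": "latitude"},
    "precise_lon": {"standard_name": "longitude"},
    "deploy_lat": {"standard_name": "deployment_latitude"},
    "deploy_lon": {"standard_name": "deployment_longitude"},
}
roles = classify_positions(
    variables, "time lat lon precise_lat precise_lon deploy_lat deploy_lon"
)
```

Variables that are not horizontal positions, such as time, are simply left out of the result.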

fmanzano-pde commented 1 year ago

Dear @JonathanGregory

I've created a new branch for the pull request: https://github.com/fmanzano-pde/cf-conventions-deployment_position

Would you mind including the "small change in the section about compression by gathering, to allow an auxiliary coordinate variable to be compressed when the data variable isn't" you mentioned in this branch?

That way (I guess) the pull request would include all the related changes at once? Anything else?

Thank you very much!

All the best, Fer

PS Regarding the "pull request" I don't know what info I have to add in the # Release checklist. Could you help me?

JonathanGregory commented 1 year ago

Dear Fer

It's easier for me to draft some text here, which you could copy into your branch.

For the pull request, you have to add a line at the start of history.adoc, and add yourself to the additional authors in cf-conventions.adoc.

Best wishes

Jonathan

fmanzano-pde commented 1 year ago

Thank you very much @JonathanGregory for the support and all your remarks. I've added everything to the pull request just opened (#431). I've changed the name of this issue 428 as the original title could mislead. Now, I guess it will take some time to approve the pull request. Anyway, thank you very much all of you. All the best, Fer

JonathanGregory commented 1 year ago

Dear all

That looks fine to me, thanks, Fer @fmanzano-pde. To summarise: the pull request amends an example in Appendix H to show deployment locations for a DSG timeseries of ocean observations, in addition to the nominal location and the precise location. Both of the latter still have standard names of latitude and longitude, and Fer has separately proposed new standard names for the deployment position.

Nan @ngalbraith and Dave @dblodgett-usgs, are you willing to support the proposal in this form? (I am aware that Dave has reservations more generally, but not specifically about this proposal.) Do others have comments to make?

Best wishes

Jonathan

IPerezGonzalez commented 1 year ago

Dear all,

I just have a very minor comment. It looks like there is good agreement that enables the user to access deployment locations in a simple way.

However, I see a risk of minimal incoherence with the introduction of the deployment_latitude|longitude standard_name(s): the nominal latitude and longitude for the time series will be that of a deployment (it looks like the last deployment position is the one of choice). These nominal positions are therefore, by nature, instances of the new deployment_latitude|longitude standard_name(s). I fully agree with keeping the standard name latitude|longitude for the nominal positions, as I think everyone does.

I just wanted to flag that by introducing deployment_latitude|longitude as standard names, there is now a bit of an incoherence. Maybe I am more aligned with @ngalbraith and feel that the long_name would be the place to tell them apart.

best,

Irene

fmanzano-pde commented 1 year ago

Dear @IPerezGonzalez,

It's unquestionable there are many ways to do it; in fact, during the last months we've been discussing them to find the best solution. Last night I went to bed really happy as I thought we had finally reached an agreement. My joy in a well...

I understand your concern, but I don't completely agree with you. It's true that most of the time the nominal latitude and longitude for the time series will be that of the last deployment, but not necessarily: imagine that a provider decides that the nominal latitude and longitude of the whole time series is not the last deployment but the first one, with the objective of keeping the same nominal position without ever changing it (it also makes sense, because these changes in the deployment position don't have significant implications; otherwise, inevitably, the time series would have to be split).

So, I still defend the adopted solution. The nominal position is conceptually different from the deployment positions, although it's true that the nominal position and a deployment position can coincide. Using long_name is not as powerful as using standard_name, because we would lose the M2M capability to distinguish between different variables. As far as I understood, Nan's concern was about retrocompatibility, that is, the obligation to change the current nominal latitude and longitude standard names, but they don't need to do that with the proposed solution.

All the best,

Fer

JonathanGregory commented 1 year ago

Dear @IPerezGonzalez @fmanzano-pde

I too understand Irene's concern but I agree with Fer's argument. (Unlike both of you, I'm not an expert in this area.) Nan @ngalbraith wrote, "I'm very concerned that removing the option of using latitude and longitude standard names for nominal positions will make our data less usable." Following her comment, I changed what I had earlier suggested, so that latitude and longitude are still the standard names for nominal location, which is a different geoscientific quantity from deployment location. Although nominal and observational (precise) location have the same standard names, they are distinguishable by the axis attribute, which should be attached to the nominal position only (X for longitude, Y for latitude) according to section 9.5.

Nan @ngalbraith, do you think the current proposal is OK?

The axis attribute has more of a "structural" role (as Dave @dblodgett-usgs would describe it). We could perhaps make more use of it for discrete sampling geometries. I see that only one example in Appendix H (the one which Fer is modifying) shows axis="X" or axis="Y", although many have axis="Z". We could perhaps recommend greater use of axis in chapter 9 and the examples. That is a more general issue, however. I don't think we ought to address it in this specific issue.

Best wishes

Jonathan

IPerezGonzalez commented 1 year ago

Dear @JonathanGregory, @fmanzano-pde ,

Learning that for the nominal position there might be cases where you don't have the actual deployment information, and therefore have to build the nominal position somehow from the GPS coordinates, has helped me be more at ease with the idea of the deployment coordinates having a different standard name.

I agree with reviewing Chapter 9 and the examples (in a different issue). I was surprised not to find the axis attribute on latitude and longitude for examples in the trajectory DSG, for instance. Maybe in this case there is a reason I'm not aware of, but I would expect :axis to be there.

Kind regards,

Irene

ngalbraith commented 1 year ago

@IPerezGonzalez

the nominal latitude and longitude for the time series will be that of a deployment

This is not the way I'd use the terms in my data sets. In sequentially redeployed buoys, there's a nominal position for a "site" and an exact deployment position every time a buoy is redeployed. If we want a new set of terms for nominal positions, I'd be inclined to use nominal_latitude etc.

DocOtak commented 1 year ago

When I'm looking at an observational dataset that I've never seen before and it claims to conform to some CF version, I'm going to expect that the lon/lat/z variables conform to Chapter 4 and that it contains at least one variable each with the standard names latitude, longitude, and whatever the Z and T are (if applicable). I'm probably not going to care exactly how these values were determined and will trust that the data provider has used their domain expertise to provide the positions that others should use when doing some analysis.

I think these new standard names should be in addition to, and not replace, the existing standard names in a dataset. In a situation where a deployment position is the only one you have, I think the existing latitude and longitude names take precedence, and if you want to include deployment_latitude and deployment_longitude they should be in their own (probably duplicated data) variables.

Very related, my group has a small need for recovery positions as well. The three locations for CTD/Rosette casts we used to report in WOCE:

Just to be explicit: I support:

I do not support:

JonathanGregory commented 1 year ago

Dear Andrew @DocOtak

Does the present form of the proposal about these standard names look OK to you? Please feel free to propose names for recovery position in another discuss issue if there's a use-case.

Because @fmanzano-pde opened the separate issue about the standard names, in this issue we need only consider the changes to the convention. These are minor, I would say, and relate to allowing compression to be applied to an auxiliary coordinate variable in a discrete sampling geometry. Please see example H.5 in the modified Appendix H. Would you support it?

Best wishes

Jonathan

DocOtak commented 1 year ago

@JonathanGregory I do support the modified example in H.5 as it clearly shows:

My data are not ready for a recovery_position proposal yet

JonathanGregory commented 1 year ago

Dear Andrew @DocOtak

Thanks for your support. At present, then, I don't think there are any concerns that have not been addressed, and sufficient support has been expressed. Therefore this proposal will be accepted in 21 days (5th April) if no further problems are raised before then.

Best wishes

Jonathan

JonathanGregory commented 1 year ago

Three weeks have passed without further comment, so this change is now accepted according to the rules. Thanks for proposing it and seeing it to a conclusion, Fer @fmanzano-pde. I'm going to merge the pull request, which will close this issue.

fmanzano-pde commented 1 year ago

@JonathanGregory

I've created a new pull request: https://github.com/cf-convention/cf-conventions/pull/436 to add the author to the header in addition to the list.

Thank you very much!