cf-convention / vocabularies

Issues and source files for CF controlled vocabularies

Tidal Data Epoch Description #188

Closed roy-lowry closed 4 years ago

roy-lowry commented 4 years ago

Proposer's name: Roy Lowry
Date: 8 August 2020

During the discussion of tidal sea surface Standard Names (#57), the point was made that different averaging intervals are used to determine the reference levels in different data sets, and that these averaging intervals (epochs) should be recorded within the CF NetCDF file.

Whilst the epoch could be included as a scalar variable with bounds, labelled using the long_name attribute, it is felt that providing a Standard Name raises the visibility of the metadata element, thereby increasing usage.

The proposed Standard Name, description and units are:

- Term: tidal_datum_epoch
- Description: A specific time period over which tide observations are taken and reduced to obtain mean values (e.g., mean lower low water, mean sea level) to provide reference levels (tidal datums) for subsequent water level measurements. The standard name is used for an ancillary scalar time variable that may be linked to any sea level variable referenced to a tidal datum. It is suggested that the scalar variable be set to the epoch midpoint. Bounds to specify the start and end of the epoch are mandatory. It is recommended that a plain-language description of the epoch (e.g. 1960-1978) be included in the long_name attribute.
- Units: Seconds
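
As a rough illustration of the encoding described above (not part of the proposal itself), the following Python sketch computes the scalar value and its bounds for the example epoch 1960-1978. The reference time of 1970-01-01, the reading of "1960-1978" as 1 January 1960 up to 1 January 1979, and the function names are all assumptions made for this example.

```python
from datetime import datetime, timezone

# Assumed reference time for the "seconds since ..." units (hypothetical choice).
REFERENCE = datetime(1970, 1, 1, tzinfo=timezone.utc)


def seconds_since_reference(moment: datetime) -> float:
    """Return the offset of `moment` from REFERENCE in seconds."""
    return (moment - REFERENCE).total_seconds()


def epoch_midpoint_and_bounds(start: datetime, end: datetime):
    """Return (midpoint, [start, end]) for an epoch, in seconds since REFERENCE."""
    if end <= start:
        raise ValueError("epoch end must be after epoch start")
    lower = seconds_since_reference(start)
    upper = seconds_since_reference(end)
    return (lower + upper) / 2.0, [lower, upper]


# Assumption: "1960-1978" means 1960-01-01 up to (not including) 1979-01-01.
start = datetime(1960, 1, 1, tzinfo=timezone.utc)
end = datetime(1979, 1, 1, tzinfo=timezone.utc)
midpoint, bounds = epoch_midpoint_and_bounds(start, end)
print("units: seconds since 1970-01-01 00:00:00")
print("epoch value (midpoint):", midpoint)
print("epoch bounds:", bounds)
```

The midpoint goes in the scalar variable and the two bounds values go in its bounds variable; a different reference time would shift all three numbers by the same offset.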

JonathanGregory commented 4 years ago

Thanks, Roy. That makes sense to me. Jonathan

DanHollis commented 4 years ago

The use of a reference climatology occurs in other fields e.g. when expressing monthly mean air temperature as an anomaly from the 1961-1990 average, or when expressing a monthly precipitation total as a percentage of the 1981-2010 average, for example.

I support the addition of a standard name for capturing the epoch and the general approach (scalar variable set to the midpoint and mandatory bounds) makes perfect sense. However I also think it would be better to have a more general name rather than making it specific to tidal data. How about reference_datum_epoch?

JonathanGregory commented 4 years ago

While I understand Dan's argument, I think that it's better to be more specific, because it's more informative. There is an argument for having a generic name if there are too many specific names being proposed. I don't think we're in that situation, though. Obviously in principle such a standard name could be proposed for every quantity, but in practice it hasn't been. Jonathan

aaron-sweeney commented 4 years ago

@roy-lowry Your proposal makes sense to me, too. Thanks.

DanHollis commented 4 years ago

I'm struggling to see how making this standard name specific to tidal data brings much benefit. If you were to drop the references to tidal data would that make it any less informative? e.g. using Roy's proposed wording:

"A specific time period over which ... observations are taken and reduced to obtain mean values (...) to provide reference levels (... datums) for subsequent ... measurements. The standard name is used for an ancillary scalar time variable that may be linked to any ... variable referenced to a ... datum"

I realise, of course, that the recommended choice of reference epoch may vary according to the type of data. However, in the case of the proposed tidal variables, this information is being captured in the description for those standard names i.e. "a Tidal Datum Epoch, which is a period of time that is usually greater than 18.6 years to include a full lunar cycle" (https://github.com/cf-convention/vocabularies/issues/74). A similar approach could be used for other variables where a different recommended epoch applies.

feggleton commented 4 years ago

Hi all,

I know most of this discussion happened in another issue (#57) but would be good to iron out some of those discussions here if possible. At first, I was in agreement with the tidal_datum_epoch term being specific to tidal names and quite liked the original definition given by Aaron [The specific 19-year period over which tide observations are taken and reduced to obtain mean values (e.g., mean lower low water, etc.) for tidal datums. The epoch is specified by a start and an end year.] as it was simple and to the point. I understand that this might have evolved and changes have been made to avoid mentioning the 19-year period due to some places having a different value. Therefore this definition above does work ok. There has been support from a couple of other people which is good.

However, I can relate to the argument for making this term more generic. It depends on whether a more generic term is needed and would be used, or whether the need for this term came specifically from tidal names and that is its only use. From what I'm reading from @DanHollis it seems like a more generic term might have some community value. This is going to require more input and discussion to come to some conclusion.

Out of interest, how do other communities currently define their reference periods, e.g. for air temperature anomalies? I have been looking at a few files in the CEDA archive and can see the description attribute and the long name attribute used to give more information on the anomaly variable, but none specifically state the reference period in the metadata of the file. Typically this is in the documentation or accompanying dataset records. Would this be a good thing to start doing now, or is there a reason it hasn't been included in the metadata so far? I only looked at a couple of files so may be completely wrong, but I don't recall seeing it in files.

I would also like to mention another comment at https://github.com/cf-convention/vocabularies/issues/74, where Andy Saulter agreed with the more generic term: "If there was to be a specific standard name for this, then I think @DanHollis 's comment about using something generic (e.g. datum_epoch) to avoid proliferation would be sensible."

roy-lowry commented 4 years ago

Thanks @feggleton I've been waiting for further views/comments. Maybe you will stir something up. My preference is for Dan's more generic approach.

JonathanGregory commented 4 years ago

Dear Fran

I think it's useful to find out what people do and what they may need, as you are asking - thanks. As usual, we should not propose a solution for something that hasn't been identified as a problem. My starting point is that the need to name the tidal datum epoch has been identified and a specific name for it will address that need. I think standard names should be specific, because that is most informative, and it serves their purpose of indicating which quantities can be regarded as the same. For example, we could have a single standard name for "temperature" in all circumstances. It would be geophysically correct, but not helpful.

Best wishes

Jonathan

ukmo-ansaulter commented 4 years ago

Thanks all for continuing to look at this. Walking the line between specific and generic is not easy and, whilst a generic-where-sensible approach may enable new users to pick up workable standard names across a range of uses more easily without needing to request specific new ones, I do see where you are coming from, @JonathanGregory.

If we are being specific to the problem in hand, then either tidal_datum_epoch or the more generic reference_datum_epoch (long name includes the tide part) would work perfectly for any applications our group currently deals with

(better check this fence I'm sat on for splinters)

DanHollis commented 4 years ago

I guess my view would be that the problem already exists. As noted by @feggleton, existing datasets containing anomaly data have 'solved' the problem by using a description attribute, a long name attribute or accompanying documentation to record the reference epoch. My feeling is that this is important metadata and it would be better if it could be recorded in a consistent way in all datasets containing anomaly data. This could be achieved with domain-specific standard names, however I would say that the concept is sufficiently general that a single generic standard name would be the most elegant solution.

roy-lowry commented 4 years ago

The discussion here is whether to go with my original proposal or to amend it as proposed by @DanHollis as follows

When drafting this proposal I was in off-list correspondence with some of the sea level community who were unified in the view that they wanted this Standard Name to be as narrow as possible. The reason I've been so quiet is that I have been waiting for them to join the discussion. As they have not I'll give my two-penneth.

My position is that I prefer the modified version. I don't think the 'CF only deals with current use case' argument holds here. The whole point of that policy is to protect against excessive Standard Name creation just in case they're needed. To me this is a case of preventing future work by taking a broader semantic view now. I also don't think the 'temperature' example is valid. This Standard Name is different in that it only makes sense as an ancillary variable. Consequently, it is always linked to something that will deliver the additional semantics that @JonathanGregory would like to see.

JonathanGregory commented 4 years ago

Dear @roy-lowry

I believe that the point of the "CF only deals with current use case" argument is that we cannot reliably "prevent future work by taking a broader semantic view now". We don't know where we might run into problems in future, and we might choose a generalisation now that turned out not to fit the shape of needs which actually arise in future, or which is never needed. Hence I don't think we should do it, even if it's tempting. It's better to wait until we do have quite a few similar specific things for which we can see that an appropriate generalisation is needed - that is what happened with area_type for example. On your other point, I would say that ancillary variables are data variables in their own right. It's quite possible that an ancillary variable might be the only variable stored in a file, and the file might become detached from the rest of the dataset, so an ancillary variable ought to describe itself as far as it can. I appreciate that isn't likely to happen for a scalar variable, but it is true of ancillary variables in general.

With respectful best wishes

Jonathan

kbailey-noaa commented 4 years ago

@roy-lowry Sorry, was on leave last week. Re: our side conversation, I don't think we preferred this standard name be as narrow as possible. I wanted to avoid being too specific (e.g. avoid an epoch name like tidal_sea_surface_height_above_mean_higher_high_water_1960_to_1978).
From my perspective, making this epoch name more generic won't adversely impact us / our water level use case. We just need a way to convey the epoch year range associated with the water level standard name, and reference_datum_epoch still satisfies this.
Our proposed water level standard name variables for MLLW / MHHW in issue cf-convention/vocabularies#74 includes the description of the tidal datum epoch / how it's applied, so I don't feel a need to specify that again in this epoch standard name definition. Therefore, no issues with reference_datum_epoch.
Caveat - I only speak for the water level use case, so can't lend my opinion on risks of generic vs specific.

But in cf-convention/vocabularies#74 we should probably add the sentence to both MLLW and MHHW descriptions: "The tidal datum epoch should be provided in a variable with standard name reference_datum_epoch." (or whatever epoch standard name is eventually agreed upon).

kbailey-noaa commented 4 years ago

@roy-lowry et al. This seems to have stalled. Any resolution on this?

roy-lowry commented 4 years ago

@kbailey-noaa Waiting for moderators to suggest a way forward.

feggleton commented 4 years ago

Hi all,

Thanks for the continued discussion and effort on this issue! It seems we are a bit stuck! So far I think I have counted up 3 people for the more generic name and 3 people for the more specific name.

Generic name:

Term: reference_datum_epoch
Description: A specific time period over which observations are taken and reduced to obtain mean values (e.g., for tides, mean lower low water, mean sea level) to provide reference levels (datums) for subsequent measurements. The standard name is used for an ancillary scalar time variable that may be linked to any variable referenced to a datum. It is suggested that the scalar variable be set to the epoch midpoint. Bounds to specify the start and end of the epoch are mandatory. It is recommended that a plain-language description of the epoch (e.g. 1960-1978) be included in the long_name attribute.
Units: Seconds

Specific name:

Term: tidal_datum_epoch
Description: A specific time period over which tide observations are taken and reduced to obtain mean values (e.g., mean lower low water, mean sea level) to provide reference levels (tidal datums) for subsequent water level measurements. The standard name is used for an ancillary scalar time variable that may be linked to any sea level variable referenced to a tidal datum. It is suggested that the scalar variable be set to the epoch midpoint. Bounds to specify the start and end of the epoch are mandatory. It is recommended that a plain-language description of the epoch (e.g. 1960-1978) be included in the long_name attribute.
Units: Seconds

There are some strong arguments here and a couple of people on the fence. This is a tricky one.

After discussing with someone in the climate modelling community, I personally would say we go for the specific term. It doesn't look like atmosphere/climate people need this and there haven't been any requests. I have just been dealing with an anomaly dataset and everyone seems to do it differently - some want it in the filename, some in the 'extra' metadata and some in the documentation. But considering that anomaly data has been around for a while and no one has complained about it then maybe it's not a problem?

That's just my opinion but any other comments are welcome and then maybe we can come to a consensus soon.

larsbarring commented 4 years ago

I am coming late into this conversation, but let me add some views from a non-oceanographic perspective.

As I understand it, we want to describe a specific period in time, an epoch, used for some purpose. In this context of sea levels the period in time is used to define a specific sea-level reference level (datum). Going for the more generic term reference_datum_epoch or the more specific term tidal_datum_epoch is not going to change anything about the entity we aim to describe. Nor would a [much] more general standard name, like reference_epoch. They all specify a period in time between a start point and an end point, and its general use as a "reference" for something. What changes, however, is the "something", i.e. for what purposes this standard name can be used, which of course is specified in the Description. This is a different situation from the temperature example @jonathan mentions above: all the different separate standard names related to temperature do in fact change, or specify, the entity/phenomenon they describe.

Let me take a completely different perspective here: assume that I am interested in looking at the climatic evolution over time from a multi-variable perspective. So, I am interested in all anomalies where the reference epoch is, say, 1901-30, 1931-60, ... It would then be easier to have one standard name for all such anomalies, rather than one for sea level, a different one for air_temperature, and so on.

Now, having advocated for a completely general standard name, like reference_epoch, I freely admit that I might not be aware of some specific reasons for having a targeted standard name for sea-level reference epochs. But, on the other hand, the reference epoch for sea level, I imagine, has little "life" in itself except as an essential metadata component for describing the sea-level reference datum, which on the other hand I can very well imagine having a "life" of its own.

davidhassell commented 4 years ago

Hello,

@DanHollis's point that the more general use case already exists for the analysis of climate change - across very many variables - is persuasive, so I might favour the general solution (but I feel that I haven't yet fully assimilated all of the subtleties that have been discussed).

In any case, I have some points on either of the proposed descriptions. Using the general description as an example (but the same points apply to the sea-level-specific case), I might change it as follows:

Description: A specific time period over which observations are taken and reduced to obtain mean values (e.g., for tides, mean lower low water, mean sea level) to provide reference levels (datums) for subsequent measurements. ~The standard name is used for an ancillary scalar time variable that may be linked to any variable referenced to a datum. It is suggested that the scalar variable be set to the epoch midpoint. Bounds to specify the start and end of the epoch are mandatory. It is recommended that a plain-language description of the epoch (e.g. 1960-1978) be included in the long_name attribute.~ To specify datums for a data variable, provide an auxiliary coordinate variable with this standard name. In this case, the end points of the time period may be defined with the coordinate bounds. If the variable has no bounds, a plain-language description of the epoch (e.g. 1960-1978) may be included in the long_name attribute, if appropriate.

for the following reasons (which may verge on the assumption of generality, I admit!):

Thanks, David

roy-lowry commented 4 years ago

I think @davidhassell has effectively demolished this proposal. Does anybody disagree that it should be closed?

DanHollis commented 4 years ago

@roy-lowry Are you saying that a new issue needs to be opened in order to propose the generic variable (given that @kbailey-noaa still requires a solution)? Or am I misunderstanding?

roy-lowry commented 4 years ago

No, I'm saying that the way of encoding the information - whether for the generic or specific case - seems to be invalid in CF unless I'm misunderstanding what David is saying. I knew that it was risky as it was setting precedents and my knowledge of CF has its gaps. So, either somebody with better knowledge of CF encoding details than me needs to rewrite the proposal or it should be closed.

davidhassell commented 4 years ago

Oh - my intention was not to demolish! I had hoped that my suggested text in bold might provide a solution for the technical problems that I had found.

roy-lowry commented 4 years ago

Thanks @davidhassell. I had read your reasons without fully realising that you had also provided a solution - the end of your message so caught my eye that I missed the beginning! So, I take it that you are another vote for the generic solution.

Now I've understood, I'm almost comfortable with your suggested change. Just a small amendment (in bold), as datum specification is generally understood to be in the dimension of space (e.g. mean sea level), not time. The whole idea of this Standard Name is to document the time period over which data were averaged to establish the datum. You seem to have the datum and the epoch a little mixed up.

Description: A specific time period over which observations are taken and reduced to obtain mean values (e.g., for tides, mean lower low water, mean sea level) to provide reference levels (datums) for subsequent measurements. To specify the time period over which the datum for a data variable was determined, provide an auxiliary coordinate variable with this standard name. In this case, the end points of the time period may be defined with the coordinate bounds. If the variable has no bounds, a plain-language description of the epoch (e.g. 1960-1978) may be included in the long_name attribute, if appropriate.

I'd also appreciate clarification on what you think should be stored in the auxiliary variable to which the bounds are linked, which I think needs to be explained in the description.

davidhassell commented 4 years ago

Hi @roy-lowry

> You seem to have the datum and the epoch a little mixed up.

Yes, I think so, a bit!

Because ancillary variables share the data variable's domain, I think it may be OK to also state that the datum should be provided as an ancillary variable:

Description: A specific time period over which observations are taken and reduced to obtain mean values (e.g., for tides, mean lower low water, mean sea level) to provide reference levels (datums) for subsequent measurements. When a datum is provided as an ancillary variable, to specify the time period over which it was ~the datum for a data variable was~ determined, provide an auxiliary coordinate variable with this standard name. In this case, the end points of the time period may be defined with the coordinate bounds. If the variable has no bounds, a plain-language description of the epoch (e.g. 1960-1978) may be included in the long_name attribute, if appropriate.

How does this CDL look?

dimensions:
  lat = 180 ;
  lon = 360 ;
  time = 1 ;
variables:
  double data(lat, lon, time) ;
    data:ancillary_variables = "datum" ;
    data:coordinates = "epoch" ;
    data:long_name = "tidal_sea_surface_height_above_mean_lower_low_water" ;
    data:units = "m" ;
  double datum(lat, lon) ;
    datum:long_name = "The datum" ;  // Is there a standard name for this?
    datum:units = "m" ;
  double time(time) ;
    time:units = "days since 2070-12-01" ;
  double epoch(time) ;
    epoch:standard_name = "reference_datum_epoch" ;
    epoch:units = "days since 1960-12-01" ;

roy-lowry commented 4 years ago

@davidhassell Hi again. Not sure we're quite on the same page.

The datum (zero value) used for tidal_sea_surface_height_above_mean_lower_low_water measurements is re-assessed regularly by averaging the value of lower low water (relative to some geographic benchmark) over some time interval. In the initial use case this was 19 years, but I've known others (five years seems quite popular). The problem is that as sea levels change - currently rise - the meaning of 'mean_lower_low_water' changes depending upon when the measurements used in the average were taken. To make life more complex, some labs report data relative to the averaging interval (epoch) within which the data were collected, whilst others apply a correction to convert historic data to the current epoch.

What this ticket is trying to provide is a way of semantically extending the standard names with an explicit epoch label, thereby avoiding a whole raft of standard names along the lines of:

tidal_sea_surface_height_above_1960_to_1979_mean_lower_low_water
tidal_sea_surface_height_above_1979_to_1988_mean_lower_low_water
tidal_sea_surface_height_above_1960_to_1965_mean_lower_low_water
tidal_sea_surface_height_above_1965_to_1970_mean_lower_low_water
tidal_sea_surface_height_above_1970_to_1975_mean_lower_low_water

@JonathanGregory suggested that this semantic extension could be done through an epoch variable with bounds, which was my starting point. I readily confess that this is an area of CF where I have very little experience, so guidance is more than welcome.

I find your CDL difficult to map to the observational sea level data with which I'm familiar. There are two significant differences:

1) Sea level data are point time series, not spatial grids.
2) I have never seen a datum variable in sea level data as the datum value is always assumed to be zero. Suggesting the introduction of one just to allow the epoch to be labelled might not be well received!

I also don't see how the essential information for the semantic extension (the beginning and the end of the epoch) can be encoded into the data associated with that CDL.

Any chance you could help my understanding by modifying your CDL on the basis of the above comments?

DanHollis commented 4 years ago

Looking at this from the perspective of climate observations (e.g. air temperature) I would identify the following quantities / pieces of information:

- An actual measurement of air temperature e.g. 21.7 deg C (A)
- The climate normal for that location e.g. 20.3 deg C (B)
- The anomaly value i.e. +1.4 deg C (A minus B)
- The period covered by the climate normal e.g. 1981-2010
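
To make the relationship between these four pieces of information concrete, here is a minimal Python sketch using the example numbers above. The `Anomaly` class and `make_anomaly` function are hypothetical names invented for this illustration; they are not part of CF or of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anomaly:
    """An anomaly value together with the period its reference value covers."""
    value: float   # observation minus reference value (A minus B)
    epoch: tuple   # (first_year, last_year) of the reference period


def make_anomaly(observation, normal, epoch):
    """Return the anomaly of `observation` relative to `normal` for `epoch`."""
    # Rounding to one decimal place matches the precision of the example values
    # and hides floating-point noise in the subtraction.
    return Anomaly(value=round(observation - normal, 1), epoch=epoch)


anomaly = make_anomaly(observation=21.7, normal=20.3, epoch=(1981, 2010))
print(anomaly)
```

The point of the sketch is that the anomaly only carries its full meaning when the epoch travels with it, even if the normal itself is stored elsewhere.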

This can be applied to either points (e.g. observing stations) or grids.

I would see the climate normal as corresponding to the 'datum' and the period covered by the climate normal as corresponding to the 'epoch'. A dataset of anomaly values could include the climate normals (datum) as an ancillary variable but more likely this would be in a separate dataset (in my experience). However, to be useful, a dataset of anomaly values should always include some information about the period covered by the climate normals i.e. the minimum requirement is simply to be able to attach the epoch to the anomaly data.

This thread started out as an offshoot from cf-convention/vocabularies#74 which requests variables such as:

tidal_sea_surface_height_above_mean_lower_low_water

By analogy, I interpret tidal_sea_surface_height as the observation, mean_lower_low_water as the datum and tidal_sea_surface_height_above_mean_lower_low_water as an anomaly value. The aim of this thread was (I thought) to find a way to attach the epoch to these anomaly data.

If it were necessary to store the datum as well (either as ancillary data in the same dataset or in a separate dataset) then it implies the need for additional standard names e.g. mean_lower_low_water. Given that much of the justification for recording the epoch stems from the fact that the datum varies with the epoch, it's interesting that such variables do not already exist and are not being requested. In fact I was surprised to see that even mean_sea_level does not appear to exist as a standard name. Perhaps there is something I'm not understanding about sea level data...

davidhassell commented 4 years ago

OK - thanks for the explanations, @roy-lowry and @DanHollis.

If the values of the datum are not important here, then we just include the epoch auxiliary coordinate variable, with its bounds. Its standard name indicates that it does not contain "normal" time coordinates (just like "forecast_reference_time" coordinates are not "normal").

Is this CDL clearer, I wonder?

dimensions:
  obs = 1000 ;
  time = 1 ;
  bounds = 2 ;
variables:
  double data(obs, time) ;
    data:coordinates = "lat lon epoch" ;
    data:standard_name = "tidal_sea_surface_height_above_mean_lower_low_water" ;
    data:units = "m" ;
  double time(time) ;
    time:units = "days since 2019-12-01" ;
  double epoch(time) ;                                   // data: 1970
    epoch:bounds = "epoch_bounds" ;
    epoch:standard_name = "reference_datum_epoch" ;
    epoch:units = "days since 1960-12-01" ;
  double epoch_bounds(time, bounds) ;                    // data: 1960, 1979
  double lat(obs) ;
    lat:units = "degrees_north" ;
  double lon(obs) ;
    lon:units = "degrees_east" ;

If you do want to record the actual values of the datum (with some as-yet-undefined appropriate standard names), then an ancillary variable still seems like a reasonable way forward, to me, or else just include it as another data variable in the file and leave it to the user to connect them, like one would currently connect components of wind vectors.

ethanrd commented 4 years ago

Wouldn't it make sense for information about the vertical datum to be with the rest of the coordinate reference system information? Though the CF section on CRS is called "Horizontal Coordinate Reference Systems, Grid Mappings, and Projections" it does mention vertical datums:

> The grid_mapping variable may identify datums (such as the reference ellipsoid, the geoid or the prime meridian) for horizontal or vertical coordinates.

davidhassell commented 4 years ago

Hi @ethanrd,

That is an interesting idea. I think that we need to clarify if the particular tidal datum being discussed here is

I am unsure about what the answer is.

The tidal datum feels to me like ancillary data ("When one data variable provides metadata about the individual values of another data variable" https://cfconventions.org/cf-conventions/cf-conventions.html#ancillary-data), i.e. it is a datum of the data, rather than a datum of the CRS.

But I've also been suggesting storing the date information of the datum as part of the domain (as an auxiliary coordinate variable).

The datum of a domain CRS in the data model is restricted to "A definition of a datum specifying the zeroes of the dimension and auxiliary coordinate constructs which define the coordinate system" (https://cfconventions.org/cf-conventions/cf-conventions.html#appendix-CF-data-model), which isn't really what we have here - neither the datum values, nor the dates it is defined for, provide the zero of a coordinate variable.

?

roy-lowry commented 4 years ago

@ethanrd I think what you're saying would make sense for describing what the datum is (e.g. vertical offset from something like the Geoid), but what we're trying to record here is information on how the datum was determined.

roy-lowry commented 4 years ago

@davidhassell A simple, possibly stupid, question. Why is the time dimension in your example set to 1 and not to something like 8760?

I'm used to UK NTSLF sea level data, which has a year's data from a single tide gauge in each NetCDF file, not data from multiple tide gauges for a single time step.

davidhassell commented 4 years ago

@roy-lowry Good question. The answer is because I got it wrong! How about this (which covers multiple gauges for multiple timesteps):

dimensions:
  obs = 1000 ;
  time = 8760 ;
  bounds = 2 ;
variables:
  double data(obs, time) ;
    data:coordinates = "lat lon epoch" ;
    data:standard_name = "tidal_sea_surface_height_above_mean_lower_low_water" ;
    data:units = "m" ;
  double time(time) ;
    time:units = "days since 2019-12-01" ;
  double epoch ;                                    // data: 1970
    epoch:bounds = "epoch_bounds" ;
    epoch:standard_name = "reference_datum_epoch" ;
    epoch:units = "days since 1960-12-01" ;
  double epoch_bounds(bounds) ;                    // data: 1960, 1979
  double lat(obs) ;
    lat:units = "degrees_north" ;
  double lon(obs) ;
    lon:units = "degrees_east" ;
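
The `// data:` comments in the CDL give years, whereas the declared units are days since 1960-12-01. As a hedged sketch of how the stored numbers might be derived, the Python below converts an epoch given as a range of years into those units. It assumes that the epoch runs from 1 January of the first year to 1 January of the year after the last, and that the scalar value is the midpoint; neither assumption is stated in the thread, and the function names are hypothetical.

```python
from datetime import date

# Reference date taken from the epoch:units string in the CDL above.
UNITS_REFERENCE = date(1960, 12, 1)


def days_since_reference(day: date) -> int:
    """Return whole days between UNITS_REFERENCE and `day` (negative if earlier)."""
    return (day - UNITS_REFERENCE).days


def encode_epoch(first_year: int, last_year: int):
    """Return (value, [lower, upper]) in days since UNITS_REFERENCE.

    Assumption: the epoch runs from 1 January of first_year to
    1 January of the year after last_year, and the value is the midpoint.
    """
    lower = days_since_reference(date(first_year, 1, 1))
    upper = days_since_reference(date(last_year + 1, 1, 1))
    return (lower + upper) / 2.0, [lower, upper]


value, bounds = encode_epoch(1960, 1979)
print("epoch =", value)
print("epoch_bounds =", bounds)
```

Under these assumptions the lower bound comes out negative, because the reference date of 1960-12-01 falls after the start of the epoch; an earlier reference date in the units string would avoid that.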

roy-lowry commented 4 years ago

Excellent. That works for me. Now we've lost the datum, what's your latest suggestion for the Standard Name description?

JonathanGregory commented 4 years ago

@ethanrd's remark suggests to me that the word datum alone is liable to cause confusion. I think it's correct to regard the tidal datum as metadata about the (sea level etc.) data, rather than metadata about the domain, as @davidhassell says. The tidal datum is not a geoid or geopotential datum, as used in geodesy, for which we have attributes geopotential_datum_name and geoid_name of the grid mapping variable (in Appendix F). Hence I think the standard name should be epoch_of_reference_tidal_datum (including tidal to avoid the ambiguity).

davidhassell commented 4 years ago

Hi @roy-lowry et al.

Here is my suggestion for epoch_of_reference_tidal_datum:

- Term epoch_of_reference_tidal_datum - Description The time (epoch) of a reference tidal datum (e.g. for tides, mean lower low water, mean sea level). A reference tidal datum provides reference levels for subsequent observed or simulated measurements. If a coordinate, scalar coordinate, or auxiliary coordinate variable with this standard name has bounds, then the bounds specify the time period over which the datum was determined. It is not the time for which the actual measurements are valid; the standard name of time should be used for that time. - Units Seconds

The suggested standard name tidal_sea_surface_height_above_mean_higher_high_water could have added to its description

"The time (epoch) of the mean higher high water reference tidal datum should be provided with a coordinate, scalar coordinate, or auxiliary coordinate variable with the standard name epoch_of_reference_tidal_datum."

The fully general version, which could also be used for, say, air temperature anomalies, would be epoch_of_reference_datum and would simply remove references to tides:

- Term epoch_of_reference_datum
- Description The time (epoch) of a reference datum. A reference datum provides reference levels for subsequent observed or simulated measurements. If a coordinate, scalar coordinate, or auxiliary coordinate variable with this standard name has bounds, then the bounds specify the time period over which the datum was determined. It is not the time for which the actual measurements are valid; the standard name of time should be used for that time.
- Units Seconds

For the general use case I think that it would be sufficient to state (somewhere in the conventions - a new sub-section in chapter 3, perhaps) that when there is an "epoch_reference_time" coordinate variable, the reference datum values have the same physical nature as the data variable, unless indicated otherwise by the data variable's standard name? This would allow us to more meaningfully store anomalies for any existing or conceivable standard_name/cell_method combination.

It seems to me that the "tidal" in "epoch_of_reference_tidal_datum" doesn't improve our understanding of the data over the general case (happy to be proved wrong!). Given that, I still support the non-tidal-specific version.

Thanks, David

ethanrd commented 4 years ago

I think what you're saying would make sense for describing what the datum is (e.g. vertical offset from something like the Geoid), but what we're trying to record here is information on how the datum was determined.

@roy-lowry Isn't the epoch needed to fully define the datum? Can a user compare two datasets that use a mean lower low water datum if those datasets don't specify the epoch over which the datum was determined?

roy-lowry commented 4 years ago

@ethanrd Probably not expressing myself very clearly here. My understanding of a CRS is that it includes a spatial offset from a reference ellipsoid. Whilst it is possible to describe mean lower low water for an epoch in this way, it's my understanding that it isn't done like that in most sea level data sets. If it were, then as you say the epoch duration could be recorded within the CRS description.

Instead, the only description of the datum we have is a text phrase within the Standard Name, which doesn't mention the epoch. You are absolutely correct in saying that two MLLW data sets cannot be reliably compared without knowledge of the epoch, which in my experience is usually recorded in some sort of plain language metadata. BODC practice when I retired was to deliver two files - data in NetCDF and metadata in XHTML, which isn't ideal for machine readability! What I'm trying to do here is to get machine-readable epoch information into the data files in a way that is likely to get used.
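As a minimal sketch of what machine-readable epoch information would allow, a client could check that two data sets share the same epoch before comparing values referenced to, say, MLLW. The function name `same_epoch` and its `tolerance` parameter are illustrative only and not part of CF; the bounds are assumed to have been decoded already into the same units and reference date.

```python
def same_epoch(bounds_a, bounds_b, tolerance=0.0):
    """Return True if two (start, end) epoch bounds agree within `tolerance`.

    Assumption: both pairs are expressed in the same time units and
    relative to the same reference date (e.g. days since 1960-01-01).
    """
    return all(abs(a - b) <= tolerance for a, b in zip(bounds_a, bounds_b))


# Hypothetical bounds read from two files' epoch bounds variables.
file_a_epoch = (0.0, 6574.0)
file_b_epoch = (8401.0, 14975.0)

if not same_epoch(file_a_epoch, file_b_epoch):
    print("Epochs differ: the two data sets should not be compared directly.")
```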

roy-lowry commented 4 years ago

@davidhassell Thanks David. I prefer and am comfortable with your second suggestion. I've just made a couple of minor changes to exclude defined terms from their definitions, thus.

JonathanGregory commented 4 years ago

Dear @davidhassell and @roy-lowry

If there is a consensus I will accept it; if there isn't yet I will still argue for the more specific epoch_of_tidal_reference_datum. One reason is the one I gave before, that "datum" in geodesy and WKT is not the same thing as a tidal datum. As Roy says, a tidal datum could be defined in terms of a vertical distance at the station from some geopotential datum, but I think it usually means the long-term-mean observed tidal level (of a particular kind) at the station, or some level defined wrt a physical tidal benchmark at the station. Using the word "datum" in a generic sense would be ambiguous I think; they're connected but distinct notions. A second reason is that "datum" is not appropriate for the other motivation that has been given for a generic standard name, namely to identify a climatology from which anomalies are calculated. "Datum" is not a word which is used in that sense, as far as I know, so it wouldn't be an obvious choice for that purpose, or self-explanatory.

Best wishes

Jonathan

davidhassell commented 4 years ago

Dear @JonathanGregory,

I am persuaded by your arguments on the word "datum", so I'd be happy for something like (incorporating Roy's suggestion):

- Term epoch_of_tidal_reference_datum
- Description The time (epoch) of a tidal reference datum (e.g. for tides, mean lower low water, mean sea level). A tidal reference datum provides the zero value for subsequent observed or simulated measurements. If a coordinate, scalar coordinate, or auxiliary coordinate variable with this standard name has bounds, then the bounds specify the time period over which the tidal reference datum was determined. It is not the time for which the actual measurements are valid; the standard name of time should be used for that time.
- Units Seconds

A more general name that could be used for, say, climate change anomalies, would then have a different name (epoch_of_??), but would follow a similar definition pattern. This, then, should be pursued in its own issue. Is that OK @DanHollis, et al.?

roy-lowry commented 4 years ago

Dear @JonathanGregory I think this is a case where we will have to agree to differ and follow the majority (if not total consensus) view.

One clarification - I would say that sea-level zero values are only sometimes based on long-term averages. For example, the UK National Tide Gauge Network uses Ordnance Datum Newlyn (which I think is a peg driven into a rock in Cornwall) and Dutch sea level data use NAP (a physical marker in Amsterdam), both of which can be referenced to a geopotential datum. However, as anybody working on sea level data integration into global sea level products knows these relationships can change over time due to earth tides or more dramatically earthquakes. There have been times when I have thought that I understood sea level datums and their issues until I started discussing them with the specialists like my former colleagues in Permanent Service for Mean Sea Level (PSMSL). My understanding then dissolved into a sea of uncertainty!

JonathanGregory commented 4 years ago

Dear @roy-lowry

I barely understand it myself. I've found "datum" a confusing thing for a long time, which is the reason for my caution now. As you say, "datum" sometimes means a piece of metal or a scratch on a rock in a particular place. That is a quite different (more concrete, less abstract!) thing from a geopotential surface, also called a "datum", and both of these crop up in connection with sea level. They can be linked, but the relationship depends on time, models and conventions, as you say.

Jonathan

DanHollis commented 4 years ago

I agree that ‘datum’ is not a word commonly (if ever) used in the context of climate normals.

It’s also true that I am no expert on tidal data, so there may be some subtleties that are escaping me.

However, my feeling is that we are trying to capture something pretty simple. I might describe it as follows:

“the start and end of a period of time (typically in excess of 10 years) over which some observations have been summarised (usually by averaging) in order to provide a baseline against which other observations may be compared”

For climate normals it (the start and end of the epoch) could be captured by two numbers (e.g. 1961 and 1990) or a string (e.g. “1961-1990”). Within the context of CF I can see that a time coordinate with bounds is a better (more flexible) solution.
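To illustrate the difference between the two representations, here is a rough sketch that turns a string such as "1961-1990" into a midpoint value plus bounds, as they might be stored in a time coordinate. The function name `epoch_to_cf`, the reference date of 1960-01-01 and the convention that the epoch runs from 1 January of the first year to 1 January after the last year are all assumptions made for the example, not anything CF prescribes.

```python
from datetime import date


def epoch_to_cf(label, reference=date(1960, 1, 1)):
    """Convert a label like '1961-1990' to (midpoint, (start, end)).

    Values are in days since `reference`. Assumption: the last year in
    the label is inclusive, so the end bound is 1 January of the
    following year.
    """
    first_year, last_year = (int(part) for part in label.split("-"))
    start = (date(first_year, 1, 1) - reference).days
    end = (date(last_year + 1, 1, 1) - reference).days
    midpoint = (start + end) / 2
    return midpoint, (start, end)


midpoint, bounds = epoch_to_cf("1961-1990")
print(f"epoch value: {midpoint}, bounds: {bounds} (days since 1960-01-01)")
```

The bounds carry the same information as the string, but in a form that software can compare and do arithmetic on without parsing free text.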

From the discussion in cf-convention/vocabularies#74, it seemed to me that tidal data worked in a similar way i.e. there is an epoch (e.g. 1960-1978 or 1983-2001) over which a quantity (e.g. lower low water) has been summarised (e.g. meaned) to provide a baseline for other observations (e.g. tidal sea surface height).

Rather than include the epoch within the tide-related standard name (or its description), it seemed to make sense to factor this out and give it its own standard name. Given the concept (a reference epoch) seemed to be applicable to both climate normals and tidal data, I argued in favour of a generic solution.

So, how about “reference_epoch” for the standard name?

roy-lowry commented 4 years ago

@DanHollis I'd lose the 'typically in excess of 10 years' - I've seen 5 years quite widely used. I also feel uncomfortable with the value stored being described as the start and end, when these are in fact the bounds. I also liked David's words on the storage technicalities.

So, how about?

Term reference_epoch Description The period of time over which observations have been summarised (usually by averaging) in order to provide a baseline against which other observations may be compared. If a coordinate, scalar coordinate, or auxiliary coordinate variable with this standard name has bounds, then the bounds specify the time period over which the datum was determined. It is not the time for which the actual measurements are valid; the standard name of time should be used for that time. Units Seconds

ethanrd commented 4 years ago

@JonathanGregory - I’m confused by your “datum” argument. While the tidal datums under discussion are calculated from long-term averages, they are still defining a zero level just like a reference ellipsoid or a geoid. And, as you say, they can be (are?) related to other vertical datums through survey benchmarks or GPS. So, while they are calculated differently, they still play the same role.

@roy-lowry - I too am likely not expressing myself clearly. Currently CF can only define/identify a vertical datum with reference_ellipsoid_name, geoid_name, and geopotential_datum_name. That is however a limitation of CF not a characteristic of CRSs.

My suggestion was that we look at extending CF CRS (grid mapping) capabilities to better support tidal datums. Then have all _[above|below]_mean_lower_low_water (and the like) standard name definitions say that the tidal reference epoch should be specified in a grid_mapping variable. This would follow the pattern of all the _[above|below]_reference_ellipsoid and _[above|below]_geoid standard names whose definitions say that the reference ellipsoid or geoid should be specified in a grid_mapping variable.

larsbarring commented 4 years ago

@roy-lowry, @DanHollis I like this general approach for defining a standard name for reference epoch. I agree that there is no need to specify, or suggest, a reasonable duration for the epoch in the definition; it may vary according to context and community standards. In the same vein, I see no reason to limit the scope of the standard name to observations only. The same concept may well be applied to model data:

Term reference_epoch Description The period of time over which a time-series of data have been summarised (usually by averaging) in order to provide a baseline against which other data, or the same data but for a different time period, may be compared. If a coordinate, scalar coordinate, or auxiliary coordinate variable with this standard name has bounds, then the bounds specify the time period over which the datum was determined. It is not the time for which the actual measurements are valid; the standard name of time should be used for that time. Units Seconds

Probably the italicized terms need to be improved, but I hope they convey the gist of what I am aiming at.

roy-lowry commented 4 years ago

@ethanrd Thanks Ethan. Good idea but that's something beyond my capabilities to take forward - it really needs a specialist in sea level geodesy. Can I suggest that as this thread is coming close to a conclusion that we let it run and see if somebody else is prepared to take the lead for the extension to CF you propose?

roy-lowry commented 4 years ago

@larsbarring Is this the improvement you were looking for? I think it retains your basic point.

Term reference_epoch Description The period of time over which repeated measurements of a parameter have been summarised (usually by averaging) in order to provide a baseline against which other measurements of that or other parameters may be compared. If a coordinate, scalar coordinate, or auxiliary coordinate variable with this standard name has bounds, then the bounds specify the beginning and end of the time period over which the data aggregation was determined. It is not the time for which the actual measurements are valid; the standard name of time should be used for that. Units Seconds

larsbarring commented 4 years ago

@roy-lowry Thanks -- I think that we are almost there. To my mind the term measurements precludes model data, if this is not the case please say so. Anyway, how about this:

Term reference_epoch Description The period of time over which a dataset representing a parameter has been summarised (usually by averaging) in order to provide a baseline against which other data representing the same or other parameters may be compared. If a coordinate, scalar coordinate, or auxiliary coordinate variable with this standard name has bounds, then the bounds specify the beginning and end of the time period over which the data aggregation was determined. It is not the time for which the actual measurements are valid; the standard name of time should be used for that. Units Seconds
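As a rough illustration of how the bounds would be used in practice, the sketch below averages a series over the reference epoch and expresses every value as an anomaly from that baseline. The helper names `baseline_mean` and `anomalies`, the half-open interval convention and the sample numbers are assumptions made for the example; the definition itself only says "usually by averaging".

```python
def baseline_mean(times, values, epoch_bounds):
    """Mean of `values` whose times fall within [start, end) of the epoch."""
    start, end = epoch_bounds
    selected = [v for t, v in zip(times, values) if start <= t < end]
    if not selected:
        raise ValueError("no data fall inside the reference epoch")
    return sum(selected) / len(selected)


def anomalies(values, baseline):
    """Express each value as a difference from the baseline."""
    return [v - baseline for v in values]


# Made-up yearly values; times are years for simplicity.
times = [1960, 1961, 1962, 1963]
values = [1.0, 2.0, 3.0, 6.0]
epoch_bounds = (1960, 1963)  # covers 1960, 1961 and 1962

baseline = baseline_mean(times, values, epoch_bounds)
print(anomalies(values, baseline))
```

The same pattern applies whether the series comes from observations or from a model, which is the point of not restricting the definition to measurements.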

roy-lowry commented 4 years ago

@larsbarring Point taken.