Clarification of time coordinate requirements

martinjuckes commented 5 years ago

Clarification of time coordinate requirements

Moderator: (not yet)

Requirement Summary: Clarification on what can be done with forecast_period as a time coordinate.

Technical Proposal Summary: Either (1) the convention should accept forecast_period as a valid standard_name for a time coordinate, which requires modification of some statements about the units of time coordinates or (2) the convention should make it clear that a coordinate variable with standard_name set to forecast_period is not a time coordinate in the sense implied by section 4.4.

Benefits: This would help users encoding forecast data who wish to use a forecast_period variable as a coordinate variable.

Status Quo: If a coordinate variable has standard_name set to forecast_period, the CF checker interprets this as a time coordinate. The CF checker also insists that time coordinate variables should have units of the form <units> since <reference time>. This form of units is not valid for forecast_period, which is specifying an elapsed time and hence has no reference time. The consequence is that it is currently impossible to construct a netCDF file with a forecast_period variable used as a coordinate variable.

Detailed Proposal: My preference is option (1), which would accept that elapsed time can be a valid time coordinate. This would require modification of section 4.4 to explain different options depending on whether the time coordinate is (a) representing a date or (b) representing an elapsed time.

davidhassell commented 5 years ago

Hi Martin,

There is no reference to forecast_period in the conformance document - http://cfconventions.org/Data/cf-documents/requirements-recommendations/requirements-recommendations-1.7.html - so, perhaps the checker could be amended, rather than the conventions?

David

martinjuckes commented 5 years ago

Hi David, I think we first need to decide whether a coordinate variable with standard name forecast_period should be treated as a time coordinate or not, and then make the necessary adjustments based on that decision. To me it appears natural to consider forecast_period as a time coordinate: do you have any reason to disagree, other than wanting to avoid the inconvenience of modifying the text of the convention?

There is a related problem, in that it is also impossible to use radiation_frequency, or any standard name with units corresponding to frequency, as a coordinate variable. In this case I think it is clearly a matter of updating the checker or cfunits, and no convention modifications are required: the problem starts with cfunits identifying frequency units as time units (which in turn comes, I believe, from a udunits definition of compatibility of units).

For instance, the following CDL generates an invalid file,

dimensions:
    freq = 4 ;
variables:
    float freq(freq) ;
        freq:units = "s-1" ;
        freq:standard_name = "radiation_frequency" ;
    float tau(freq) ;
        tau:units = "1" ;
        tau:standard_name = "atmosphere_optical_thickness_due_to_cloud" ;

// global attributes:
        :Conventions = "CF-1.7" ;
data:
  freq = 1000., 2000., 3000., 4000. ;
  tau = .1, .2, .2, .1;

The error reported is that freq is a time coordinate and so must have units of the form .... since ....

This is, however, a side issue to the main topic I wanted to discuss here, which is the ambiguity regarding forecast_period.

davidhassell commented 5 years ago

Hi Martin,

Perhaps because I am not a consumer of such forecast data, I can't initially see the benefit in making a coordinate variable that contains time deltas a CF "time" coordinate variable, nor in the extra complication that it introduces. It would be great to see some use cases where this labelling provides benefit.

I think (?) that there are various climate indices that have units of time deltas (e.g. relating to growing season length ...) that we would not want to automatically call "time" coordinates. What about a coordinate variable containing latitude deltas - would we also want to make that an "X" variable - perhaps not?

With regard to cfunits, it does indeed think that X is equivalent to X-1, for any units X (e.g. metres, or seconds, or ...) and you are right in that this is purely inherited behaviour from the UDUNITS C library. However, I also think that this behaviour is wrong in a data analysis context (e.g. it allows cf-python to subtract data in units of "s" from data in units of "s-1") and I shall treat it as a bug in cfunits.

Many thanks, David
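
The difference between the loose equivalence described above (X treated as equivalent to X-1) and a strict comparison of dimensions can be sketched as follows. This is a simplified illustration only, not the cfunits or UDUNITS implementation: the helper names are hypothetical, and the parser is assumed to handle nothing more than a single unit symbol with an optional integer exponent, such as "s" or "s-1".

```python
import re

def parse_exponent(units):
    """Split a simple units string such as 's', 's-1' or 'm2' into
    (symbol, exponent). Hypothetical helper, not a UDUNITS parser."""
    match = re.fullmatch(r"([A-Za-z]+)(-?\d+)?", units.strip())
    if match is None:
        raise ValueError(f"unsupported units string: {units!r}")
    symbol, exponent = match.groups()
    return symbol, int(exponent) if exponent else 1

def same_dimension(units_a, units_b):
    """Strict test: both the symbol and the exponent must agree."""
    return parse_exponent(units_a) == parse_exponent(units_b)

def reciprocal_or_same(units_a, units_b):
    """Loose test imitating the behaviour described above, in which
    X and X-1 are treated as equivalent."""
    sym_a, exp_a = parse_exponent(units_a)
    sym_b, exp_b = parse_exponent(units_b)
    return sym_a == sym_b and abs(exp_a) == abs(exp_b)

# A frequency in "s-1" passes the loose test against "s" but not the strict one.
print(reciprocal_or_same("s", "s-1"), same_dimension("s", "s-1"))
```

Under the strict test a coordinate with units of "s-1" or "m-1" would not be mistaken for a time or height coordinate.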

martinjuckes commented 5 years ago

Hi David,

The specific use case that motivated me was the metadata standard used by the WCRP Climate system Historical Forecast Project (CHFP). They use forecast_period as the time coordinate .. an approach which facilitates a common family of analysis procedures for historical climate forecasts, which involve aggregating data at fixed forecast_period.

Re cfunits: thanks. This will affect CMIP6 data, so it would be good to have this fixed. Note that the bug also affects height, e.g. m-1 is considered as a unit of height.

In the case of time there is an additional ambiguity which can be illustrated by this question: is "day" a unit of time? Most people would say yes, I think, because the word "time" is usually associated with time intervals. In your comment above you are drawing a distinction between "time deltas" (for elapsed time) and "time", which you interpret as referring to a specific instant in time. Udunits is, I think, reflecting common usage when it considers both days and days since .... as time units. The sentence "The Udunits routines utScan() and utIsTime() can be used to make this determination" in Section 4.4 of the CF convention implies that CF is following the Udunits interpretation. Other parts of Section 4.4 could be taken to support your interpretation that "time" excludes "time deltas". Personally, I have always thought that "time" has the broader interpretation in the absence of explicit clarification. If we are using "time" in a sense which is significantly different from common usage, then surely this needs to be spelt out. It is clear that the standard name time has a narrower meaning, but several sentences in the convention need changing if we want every usage of the word "time" in the document to be consistent with this.

I agree that we need consistency with spatial coordinates. Again, I think a latitude delta used as a coordinate would qualify as a spatial coordinate, and if we want to restrict the use of, for example, axis="X" to coordinates which are part of a spatial reference system then we need to clarify the language.

regards, Martin

JimBiardCics commented 5 years ago

Satellite swath data presents another use case for a relative time coordinate. There is often an absolute time (or time relative to a distant epoch) associated with a given scan line, often referenced to the center pixel of the scan. (The scan line is approximately transverse to the direction of motion of the satellite.) Each pixel of the scan line can be given a time relative to that center time, and many satellites provide the data that way. While it is possible to convert all of those relative times to absolute times and store them in that fashion, it will likely take up quite a bit more space, and there is a good chance that users of the data will convert back to relative times as their first step after reading the data.

In this particular use case, the relative times correspond to an axis transverse to the direction-of-travel axis. You can think of them as the Y to the center scan time X.

All that to say, relative time and space coordinates can be quite useful. I can imagine that the concept of relative coordinates might be generally useful beyond the spatio-temporal domain. I think we should make room for them. It would also be necessary to connect those relative coordinates to absolute coordinates, so that you could indicate that the forecast times or sample times were relative to absolute time coordinate values and do similarly for other coordinates.

As a quick digression, I'm using the word "absolute" here to refer to a coordinate variable that is relative to a fixed reference or epoch. So frequency, time since an epoch, X distance from a coordinate system zero point, temperature in Celsius, and depth below mean sea level would all be considered absolute.

In order to connect a relative coordinate to an absolute coordinate, we could indicate the relationship by naming the reference variable in an attribute on the relative variable. The exact relationship between the coordinate variables and a data variable would be seen through the dimensions. The variables in the examples below are v - a data variable, a - an absolute coordinate variable, r - a relative coordinate variable, and a2 - an absolute coordinate variable derived from a and r.

The union of the named dimensions of an absolute coordinate variable and a relative coordinate variable must be a subset of (or the same as) the named dimensions of the data variable they relate to. General examples:

v[i,j,k,l,m], a[i,j,k], r[i,j,k], a2[i,j,k] = a[i,j,k] + r[i,j,k]
v[i,j,k,l,m], a[i,j,k], r[j,k,l], a2[i,j,k,l] = a[i,j,k] + r[j,k,l]
v[i,j,k,l,m], a[i,j,k], r[l,m], a2[i,j,k,l,m] = a[i,j,k] + r[l,m]

Simple examples:

v[t,u,v], a[t], r[t], a2[t] = a[t] + r[t]
v[t,u,v], a[t], r[t,u], a2[t,u] = a[t] + r[t,u]
v[t,u,v], a[t], r[u], a2[t,u] = a[t] + r[u]
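
The second and third simple examples can be sketched in plain Python as follows. This is only an illustration of how the dimensions combine, using nested lists; the function names and the sample values are hypothetical.

```python
def absolute_from_fixed_offsets(a, r):
    """a[t] + r[u] -> a2[t][u]: the same offsets apply to every reference value."""
    return [[a_t + r_u for r_u in r] for a_t in a]

def absolute_from_variable_offsets(a, r):
    """a[t] + r[t][u] -> a2[t][u]: each reference value has its own offsets."""
    if len(a) != len(r):
        raise ValueError("a and r must share their first dimension")
    return [[a_t + r_tu for r_tu in row] for a_t, row in zip(a, r)]

# Illustrative values: two reference times and three offsets, all in seconds.
a = [100.0, 200.0]       # "absolute" coordinate a[t]
r = [-0.5, 0.0, 0.5]     # "relative" coordinate r[u]
a2 = absolute_from_fixed_offsets(a, r)
print(a2)
```

In both cases the derived variable a2 carries the union of the dimensions of a and r.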

If we like, we could preclude the use of scalar coordinate variables.

Any thoughts on such an addition?

martinjuckes commented 5 years ago

Thanks @JimBiardCics , that is very useful. I have come across this before in a discussion about swath data .. people struggling to find a CF encoding for data which they usually analyse in terms of a scan start time and a scan offset time. As you say, collapsing it to a single dimension is possible, but there are good reasons why that is not common practice in the community. To support it, I think we would need a new standard name for the relative time: do you agree?

@davidhassell remarked about dealing with a coordinate variable containing latitude offsets (deltas). I replied that we would probably want to have consistency in the way we treated time, space and other coordinates: I have since realised that we have some quite complex baggage in this area which introduces some differences in approach that we may have to live with. In particular, for longitude the data values in the array are usually interpreted as absolute (as defined by @JimBiardCics above), but, since the introduction of the longitude_of_prime_meridian attribute in CF-1.2 (2008), can also be relative. The longitude construct can still be thought of as representing absolute information which is obtained by combining the values of the longitude array with the longitude_of_prime_meridian.

Although there are differences, the underlying concept is similar to that described by @JimBiardCics : if a time offset is used as a coordinate it is common practice to provide the information needed to construct the absolute time in another variable, which may have different dimensions to those of the time offset variable.

In the weather forecast community it is the norm to distinguish between a "validation time", which is intended to represent the time at which the corresponding data is (most) valid, and the "forecast reference time", which is also an absolute time, but corresponds instead to the start of the forecast. Because of long usage, validation time in CF is represented by standard_name = "time", and the difference between this and the start of the forecast has been assigned the name forecast_period.

For most variables, the question as to whether it is absolute or relative has no relevance to their use as coordinate variables: the only requirements, I believe, are monotonicity and absence of missing data. Is there a good reason for making spatial and temporal coordinates different? We clearly need some guidance on how to present absolute time (i.e. chronological time) accurately, but do we need rules restricting the use of time offsets?

davidhassell commented 5 years ago

@JimBiardCics In your scan line example, how would that be encoded? Is the time of the centre pixel stored as a size-1 coordinate variable and the offsets stored in a size-N coordinate variable? Are the two connected by just convention?

Whilst we can clearly have a forecast_period coordinate variable ("dimension coordinate construct" in data model terms), it surely only makes sense if there is also a forecast_reference_time dimension coordinate construct, too. If the former existed on its own then I'm struggling with making "elapsed time" a definitive time coordinate, as I'm left asking 'elapsed time since when?'.

I'm wondering if dimension coordinates constructs can, or should, be characterised as belonging to a coordinate reference system, and therefore have to have datum for their values, either explicitly or implicitly defined. In the case of longitude what we have been calling "absolute" values are in fact "relative to a datum of 0".

It seems to me that forecast_period is of little use if you don't know the forecast_reference_time, if you had both then surely the

JimBiardCics commented 5 years ago

@martinjuckes I think a unified approach (to the degree possible) is the best plan. Even though I used them in my previous comment the terms relative and absolute may not be the best to use, since so many measurements are values relative to some reference point or other. I think the differentiator here is whether or not the reference is (to first order) static or in some fashion unchanging. As it stands, CF doesn't deal with static reference points in a consistent fashion. The reference point is implied by the units in some cases (Celsius and Fahrenheit, for example), stated within the units string when it is for time, stated in a grid_mapping variable attribute when it is for a horizontal spatial coordinate (as you mentioned), and stated in the standard name in yet others (e.g. height_above_mean_sea_level). There are cases where the standard name definition calls for a coordinate variable containing the reference value or values associated with the data (such as for cloud_binary_mask), and there are probably others that I'm not aware of.

We can continue to deal with this on a case-by-case basis through standard name definitions, or we could handle some of the cases by recognizing a class of "subordinate coordinates" or "offset coordinates" (or whatever name you prefer) and come up with rules for them. This would provide a mechanism for cases such as forecast offsets and scan times where there is a changing reference point. These are both time-based cases, but similar situations arise with cases such as changing orientation angles of an instrument relative to a platform which itself has changing orientation angles. This particular relationship is generally thought of as implied, but we could provide a way to make such relationships clear.

martinjuckes commented 5 years ago

Can this be covered by formula_terms? In the case where I have, for instance, a coordinate variable forecast_period and an auxiliary variable forecast_reference_time, then the formula_terms construction should make it possible to specify how the information needed to construct time is provided:

float data(fp) ;
      data:coordinates = "frt" ;
float fp(fp) ;
      fp:standard_name = "forecast_period" ;
      fp:units = "s" ;
      // terms of the formula time = reftime + offset
      fp:formula_terms = "reftime: frt offset: fp" ;
      fp:computed_standard_name = "time" ;
// scalar variable holding the start of the forecast
float frt ;
      frt:standard_name = "forecast_reference_time" ;
      frt:units = "days since 2000-01-01" ;

This would still require some case-by-case specifications, since the formula associated with any formula_terms statement is only defined in the definition of the associated standard name. It would, however, make use of an existing construct. The formula is trivial in this case (time = reftime + offset): it could be argued that it is unnecessary and it would be enough to provide the forecast_period and forecast_reference_time within the coordinates + auxiliary coordinates, but I think it would provide greater clarity if the formula_terms construct was also used.
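
As a rough illustration of the formula time = reftime + offset, the following sketch computes the implied validity times from a scalar forecast_reference_time in days and forecast_period values in seconds. The function name and the sample values are hypothetical, and the standard Python (proleptic Gregorian) calendar is assumed rather than any particular CF calendar.

```python
from datetime import datetime, timedelta

def valid_times(reference_date, frt_days, fp_seconds):
    """Return the implied validity times: time = reftime + offset.

    reference_date: the date in the 'days since ...' units of frt
    frt_days: scalar forecast_reference_time, in days since reference_date
    fp_seconds: forecast_period values, in seconds
    """
    reftime = reference_date + timedelta(days=frt_days)
    return [reftime + timedelta(seconds=fp) for fp in fp_seconds]

# Illustrative values: a forecast started 10 days after 2000-01-01,
# with lead times of 0, 6 and 12 hours.
times = valid_times(datetime(2000, 1, 1), 10.0, [0.0, 21600.0, 43200.0])
for t in times:
    print(t.isoformat())
```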

@davidhassell : I can understand that you might want to know the reference point for time specified through a coordinate variable that is an elapsed time, but the question is whether we want to have a specific rule for time in this regard, and if so, what is it? As @JimBiardCics has pointed out, we have a huge variety of terms in CF and in general it is going to be up to the user whether they provide a reference value or not. I feel it would be enough to recommend that forecast_period be used together with forecast_reference_time so that there is an implied time. We could make the link to the implied time clearer with the formula_terms construction above (which would require adding a specification of the associated formula in the relevant section of the convention).

JimBiardCics commented 5 years ago

@davidhassell In the scan line example, the reference time for each scan (might be the center time) is stored in a regular time variable as an "absolute" time. Let's call this variable scan_time. There are two use cases, one where the sample times for each scan are variable, and one where they are considered fixed.

In the fixed case the sample_time variable is one-dimensional, with the same dimension as the transverse coordinate dimension (whatever that is) of the data variable. This corresponds to simple example 3 in my earlier comment. The time values in the sample_time variable are relative to each value of the scan_time variable in turn.

In the variable case the sample time variable is two-dimensional. The first dimension is the same as the scan_time dimension and the second dimension is the same as the transverse coordinate dimension of the data variable. This is simple example 2.

In both cases the sample_time coordinate variable is a valid coordinate for the data variable, but there is a relationship between the scan_time coordinate and the sample_time coordinate that ought to be captured. Furthermore, the sample_time coordinate cannot, according to CF Section 4.4, be considered a time variable.

JimBiardCics commented 5 years ago

@martinjuckes I tried that approach in another instance in the past (perhaps about this very thing?) and was told that formula_terms are only for use with parametric vertical coordinates.

martinjuckes commented 5 years ago

It is certainly true that formula_terms are currently only used for parametric vertical coordinates. It would be interesting to know if there was a deeper reason for this, beyond the fact that the use-cases considered so far are all associated with vertical coordinates. It might be useful to get some input from a few more people on this topic.

larsbarring commented 5 years ago

@martinjuckes Thanks for pointing me to this issue. As you know my main experience is with climate indices (aka derived statistics), and less so with forecast data.

As @davidhassell already mentioned in this thread there are climate indices, like the growing season length, which is a duration or "time delta" that does not fit (I think) the current context, as it is unique to each location and year. However, it is simply the difference between the end of growing season and beginning of growing season, which both are durations relative to some reference time. This reference time could either be common for all years (e.g. it could be the same as for the time coordinate variable) where the unit would be days since ... ("absolute" in the sense of Jim's explanation), or it could be the beginning of each year (of the time coordinate) in which case the unit would be days ("relative" in Jim's sense). In this latter case I can see the connection to the current conversation. And both the beginning | end of growing season are of interest in their own right.
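
The two representations described above can be related with a short sketch. It converts an offset in days from the start of a given year ("relative") into days since a common reference date ("absolute"). The function name, the reference date and the sample offsets are all hypothetical, and the standard Python calendar is assumed.

```python
from datetime import date

def days_since(reference, year, day_offset):
    """Convert an offset in days from 1 January of `year` into
    days since a common reference date."""
    start_of_year = date(year, 1, 1)
    return (start_of_year - reference).days + day_offset

reference = date(1950, 1, 1)     # illustrative common reference date
start_rel, end_rel = 100, 280    # illustrative offsets in days from 1 January 2001
start_abs = days_since(reference, 2001, start_rel)
end_abs = days_since(reference, 2001, end_rel)

# The growing season length is the same in either representation.
print(end_abs - start_abs, end_rel - start_rel)
```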

martinjuckes commented 5 years ago

There is also a potential use-case in CMIP climate simulations: the pre-industrial control simulations have an arbitrary model time which, in the CMIP5 archive, varies between 0000 and 2500 (which causes some confusion to users). In this case there is a decoupling between the model time, which progresses steadily, and the actual date which the simulation relates to, which is determined by the specified forcing (annually repeating, representative of a fixed year).

martinjuckes commented 5 years ago

There are 45 standard name terms which have units of time (e.g. age_of_sea_ice, sea_surface_wind_wave_mean_period), of which I believe only two are expressing an absolute time (time and forecast_reference_time).

When used with a data variable, the cf-checker accepts all these variables with units of days or days since 1950-01-01 00:00:00. The latter units do not make sense for most of the variables.

When used as a coordinate, the cf-checker only accepts these variables with absolute date units, which is generally nonsense. This effectively excludes, for entirely arbitrary and irrational reasons, 43 parameters from use as coordinates.

taylor13 commented 5 years ago

To be absolutely clear, are you suggesting that sea_surface_wind_wave_mean_period might be used as a coordinate (e.g., for a histogram showing the distribution of periods)? If so, then I agree that units of "day", "second", "hour" etc. should all be acceptable. I don't think CF forbids this, so this is a problem with cf-checker.

For the CMIP use case, are you suggesting that a scalar time dimension might be defined indicating an appropriate (approx.) date that applies to the forcing imposed in these experiments? That seems to me to be more a description of the experiment design as opposed to a property of a variable being written, but I suppose it is something to consider.

JimBiardCics commented 5 years ago

@taylor13 Part of the issue may have to do with the interaction of standard_name and units. CF does state that a coordinate with a standard_name value of 'time' must have a 'since ' clause in the units attribute. Some of the things that @martinjuckes is talking about are checker bugs rather than CF issues. Others of the issues being raised are more about CF.

davidhassell commented 5 years ago

For reference, the current situation is that the sole necessary condition for a coordinate variable being a "time coordinate variable" is the presence of units of the form '<x> since <y>'. If standard_name and axis attributes are also present then they must have appropriate values, but it is the units that decide (http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#time-coordinate).

Is the aim of this issue just to stop the checker complaining about a (perfectly valid) coordinate variable with a standard_name of forecast_period and units of days? If so then that is surely remedied by changing the checker, rather than the conventions ....

.... or is there need to have forecast_period coordinates denoted as "time" for, e.g. plotting purposes?

Thanks, David

martinjuckes commented 5 years ago

@davidhassell , unfortunately your first statement is currently untrue ... but if we can amend things so that it becomes true, that would solve the problem (option 2 in my initial post).

The objective is indeed to have a valid coordinate variable with a standard name of forecast_period. Since forecast_period is a measure of time, this would, of course, be referred to as a time coordinate by some people, but if it helps to say that it is not a "CF Time Axis", that is OK. As noted above, if this is the intended approach there is at least one false statement in section 4.4 that needs to be corrected. Personally, I think it is an unnecessary leap into obscurity to repeatedly use "time" if what we want to say is something more restrictive. It is a common word, and readers will expect it to have the common meaning. It would be better to change the title of the section to something like "CF Absolute Time Axis".

A NetCDF file generated from the following CDL is not currently passed as valid:

netcdf test {
dimensions:
    p = 144 ;
variables:
    float p(p) ;
        p:units = "s" ;
        p:standard_name = "sea_surface_wind_wave_mean_period" ;
    float data(p) ;

// global attributes:
        :Conventions = "CF-1.7" ;
        :Comment = "Demonstrating catch-22 preventing use of certain parameters as coordinates." ;
}

The error message, ERROR: (4.4): Invalid units and/or reference time, relates to the absence of a reference time in the units attribute of p. The cf-checker is satisfied if the units are modified by adding since 1850-01-01 00:00:00, but most users would of course be horrified by this fix.

The convention is not entirely explicit about what constitutes a "CF Time Axis" (though I can now see where Jim gets his interpretation from). It states, for instance, that:

A time coordinate is identifiable from its units string alone. The Udunits routines utScan() and utIsTime() can be used to make this determination.

Udunits does not require since ..... in a time unit string.

If, as @JimBiardCics says, it is intended that a "CF Time Axis" is identified by a unit string of the form <time units of measure> since <reference date>, then the problem is a bug in the CF checker. At the moment, the CF checker identifies anything with units = "s" as a time axis, and then complains if there is no reference time.

I find it unsatisfactory that a variable with standard name time can represent an elapsed time, but it can no longer be used in this way when it is a coordinate. Would it be better to introduce an elapsed_time standard name, and deprecate the use of time with relative time units, so that standard_name = "time" has a consistent meaning whether or not it is attached to a coordinate?

RosalynHatcher commented 5 years ago

Sorry coming to this a bit late..... @martinjuckes: Which version of the CF checker are you using please? I just ran the 2 code snippets above through version 3.1.1 and it doesn't identify fp or p as time coordinate variables.

I made a fix in February (https://github.com/cedadev/cf-checker/issues/49) to correct identification of time coordinate variables so that it only identifies a variable as a time coordinate if one or more of the following are true:

1) The axis attribute has the value 'T'
2) Has units of reference time
3) The standard_name attribute is one of 'time' or 'forecast_reference_time'

Cheers, Ros.
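
The three tests listed above can be sketched as a small function over a dictionary of variable attributes. This is not the cf-checker code: the function name is hypothetical, and "units of reference time" is approximated, as an assumption, by looking for the word since in the units string.

```python
def is_time_coordinate(attrs):
    """Apply the three tests listed above to a dict of variable attributes."""
    # 1) The axis attribute has the value 'T'.
    if attrs.get("axis") == "T":
        return True
    # 2) The units are reference-time units (simplified check for ' since ').
    if " since " in attrs.get("units", ""):
        return True
    # 3) The standard_name is 'time' or 'forecast_reference_time'.
    return attrs.get("standard_name") in ("time", "forecast_reference_time")

# Attributes taken from the two CDL snippets earlier in the thread.
fp = {"standard_name": "forecast_period", "units": "s"}
p = {"standard_name": "sea_surface_wind_wave_mean_period", "units": "s"}
print(is_time_coordinate(fp), is_time_coordinate(p))
```

With these rules neither fp nor p is treated as a time coordinate, so neither needs a reference time in its units.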

martinjuckes commented 5 years ago

Hi @RosalynHatcher : thanks .. I was using an older version, and can't at the moment get version 3.1.1 to work .. but I'll follow that up on the cf-checker list. Is it still the case that a data variable with standard name time can have units corresponding to either an elapsed time or a reference time?

@taylor13 , @JimBiardCics : do you agree with the interpretation of what constitutes a time axis given by Ros above? If so, can we update section 4.4 of the convention to say this? People outside CF often use a measure of elapsed time as a time axis: if we are excluding this, we need to be transparent about it. I can't see how this approach is going to lead to anything other than confusion.

martinjuckes commented 5 years ago

Hello All,

I'd like to add a rider to my comment above, concerning the units we use for time and the standard name canonical units. The CF Convention says that canonical units are:

"Representative units of the physical quantity. Unless it is dimensionless, a variable with a standard_name attribute must have units which are physically equivalent (not necessarily identical) to the canonical units, possibly modified by an operation specified by the standard name modifier"

We appear to have arrived at a situation in which days and days since 1900-01-01 12:00:00Z are considered as physically equivalent in some contexts, and not in others (both are acceptable units for time when used as a variable).

Can we treat the suggestion that days and days since 1900-01-01 12:00:00Z are "physically equivalent" as an oversight, and modify the text? e.g. would it be an acceptable clarification to say that the above definition of canonical units actually refers to the "units of measure", and that some standard names (only time and forecast_reference_time) have an additional option/requirement that the units string contain a reference time?
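
The suggested reading, in which the canonical units refer only to the unit of measure and the reference time is a separate, optional part of the units string, can be sketched as follows. The function name is hypothetical and the split on the word since is a simplification, not a UDUNITS parse.

```python
def split_time_units(units):
    """Split a units string into (unit of measure, reference time or None)."""
    measure, sep, reference = units.partition(" since ")
    return measure.strip(), (reference.strip() if sep else None)

elapsed = split_time_units("days")
dated = split_time_units("days since 1900-01-01 12:00:00Z")

# Same unit of measure, but only one of the two carries a reference time.
print(elapsed[0] == dated[0], elapsed[1], dated[1])
```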

martinjuckes commented 5 years ago

There is some related discussion on latitude and longitude in pull request 133, which also deals with the fact that it may be more convenient to provide relative coordinates rather than absolute values.

JonathanGregory commented 8 months ago

From my reading, it appears that the discussion in this ticket arrived at conclusion (2) of @martinjuckes's original proposal: The convention should make it clear that a coordinate variable with standard_name set to forecast_period is not a time coordinate in the sense implied by section 4.4. That is, there's a lack of clarity in the convention.

To remedy this, I propose the following changes:

In Sect 3.3, the definition of canonical units for standard names is

Representative units of the physical quantity. Unless it is dimensionless, a variable with a standard_name attribute must have units which are physically equivalent (not necessarily identical) to the canonical units, possibly modified by an operation specified by the standard name modifier (see below and Appendix C, Standard Name Modifiers) or by the cell_methods attribute (see Section 7.3, "Cell Methods" and Appendix E, Cell Methods) or both.

I propose that we append a second paragraph to this definition:

Units of time coordinates (Section 4.4, "Time Coordinate"), whose units attribute includes the word since, are not physically equivalent to time units that do not include since in the units. To mark this distinction, the word since is included in the canonical units of quantities that are used for time coordinates. In both kinds of time units attribute (with or without since), any unit for measuring time can be used i.e. any unit which is physically equivalent to the SI base unit of time, namely the second.

We change the canonical unit from s to s since for the following standard names: forecast_reference_time, reference_epoch, time. We should add to the description of these standard names that they are to be used for time coordinate variables (Section 4.4). We should also consider introducing a new standard name e.g. elapsed_time, with canonical unit of s (no since). It hasn't been requested, but it's probably needed. The standard name time currently has no description, and it's possible it's been used for both time coordinates and elapsed time, but mostly the former e.g. in the CMIP data. My proposal distinguishes the two uses. Making these changes would require a separate standard name issue, which I'll raise if this proposal is agreed. We can discuss the details then.

We modify the start of Sect 4.4, "Time coordinate", from

Variables representing reference time must always explicitly include the units attribute; there is no default value.

to

A time coordinate is an instant along the timeline of the real or model world. Variables containing time coordinates must always explicitly include the units attribute, with a unit of measure that is physically equivalent to the SI base unit of time, followed by the word since. There is no default value for the units.

I note that discussion 304 mentions various other clarifications that are needed in Sect 4.4, and I hope that the above won't be inconsistent with the issue arising from that discussion.

The above changes remedy a defect, rather than changing the intention of the convention, I believe. Nonetheless they're quite substantial, so it's safer to leave this labelled as enhancement.

What do you think, @martinjuckes, @davidhassell and other interested parties? (Jim Biard is unfortunately no longer among the CF community.)

Best wishes

Jonathan

larsbarring commented 8 months ago

Overall I think these suggestions are good. However a canonical unit s since is not compliant with the grammar of UDUNITS. I.e. the word "since" has to be followed by a Timestamp. Hence, I suggest

To change the first line of the suggested new paragraph in section 3.3 to read (bold indicates addition): """Units of time coordinates (Section 4.4, "Time Coordinate"), whose units attribute includes the word since **followed by a timestamp**, are not physically equivalent to time units that do not include since in the units""" Possibly we need to specify what we mean by "timestamp", or give an example?
For the three standard names forecast_reference_time, reference_epoch, time we should give full canonical units, that is, include a timestamp. The knack is to find a suitable timestamp and I think 1958-01-01T00:00:00.0 is a valid timestamp in all calendars, including an anticipated TAI calendar.
I think that the suggestion to add the standard name elapsed_time is good. In passing I note that there are three standard names involving elapsed time but including the phrase "time": time_of_maximum_flood_depth, time_when_flood_water_falls_below_threshold, time_when_flood_water_rises_above_threshold. Maybe these could be revisited at the same time.
Finally, regarding the suggested new start of section 4.4 I think that the phrase "... timeline of the real [...] world..." can easily be taken to be the UTC "timeline", but I do not think this is the intention?

davidhassell commented 7 months ago

Hello All,

I think that providing a full canonical units of s since 1958-01-01T00:00:00 would work.

I'm a bit uncomfortable with creating an elapsed_time standard name, because the existing standard names forecast_reference_time, reference_epoch, time are also elapsed times - times elapsed since the timestamp. Indeed, all times are elapsed since some reference point, and it's fine for that reference point to be defined with varying degrees of precision (none, some ("when the levee breaks"), lots ("1958-01-01")).

Perhaps we shouldn't pre-empt a use case, but instead when one arises create a new name for each time_since_<datum>_of_<something> that's needed. E.g. the existing time_of_maximum_flood_depth could (should?) be aliased as time_since_levee_break_of_maximum_flood_depth.

Thanks, David

ChrisBarker-NOAA commented 7 months ago

the existing standard names forecast_reference_time, reference_epoch, time are also elapsed times - times elapsed since the timestamp

Well, according to the discussion in https://github.com/orgs/cf-convention/discussions/304

Those are NOT elapsed times, but rather an encoding of timestamps.

I'm not sure there is total consensus on that -- but it's close, it's either an encoding of timestamps, or an encoding of particular points on the time continuum (usually both), but not an elapsed time in any case.

That is, the "since" part is critical.

e.g. units of "seconds" is an elapsed time (timedelta), and units of "seconds since a_timestamp" is a timestamp (datetime)
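To make the distinction concrete, here is a minimal Python sketch using only the standard library. The numeric value and the reference date are made up for illustration, and Python's datetime assumes the proleptic Gregorian calendar, so this does not cover the other CF calendars.

```python
from datetime import datetime, timedelta, timezone

# Units "seconds" (no "since"): the number is a duration (timedelta).
elapsed = timedelta(seconds=10800)

# Units "seconds since 1958-01-01T00:00:00": the same number now encodes
# an instant (datetime), found by offsetting from the reference date-time.
reference = datetime(1958, 1, 1, tzinfo=timezone.utc)  # illustrative reference
instant = reference + timedelta(seconds=10800)

print(elapsed)              # 3:00:00
print(instant.isoformat())  # 1958-01-01T03:00:00+00:00
```

The two results have different types: one is a length of time, the other a point in time, even though the stored number is the same.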

Not that this won't be very confusing to CF users ....

larsbarring commented 7 months ago

Yes, I fully agree with @ChrisBarker-NOAA.

@davidhassell : Compare the two standard names forecast_reference_time and forecast_period. Usually, the former is a timestamp (i.e. a DateTime), and the latter is an elapsed time. E.g. for each forecast_reference_time there is a set of forecast_periods (3h, 6h, 12h, ....). Of course, it is possible to switch context and focus on the forecast_valid_time (if there was such a standard name, which would have canonical units unit since Timestamp). And I think that this is what has been -- and is -- confusing to the CF. But this is a different context, as is indicated by the hypothetical standard name forecast_valid_time. In this context one would then use the same forecast period that now is the elapsed time back to when the forecast was issued.
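The relationship between the two can be sketched in Python as follows. The reference time is a made-up value, and "valid time" here stands for the hypothetical forecast_valid_time mentioned above, not an existing standard name.

```python
from datetime import datetime, timedelta, timezone

# forecast_reference_time: an instant (its units would contain "since").
reference_time = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)  # made-up value

# forecast_period: elapsed times (units such as "hours", no "since").
forecast_periods = [timedelta(hours=h) for h in (3, 6, 12)]

# Hypothetical valid time: instant + elapsed time gives another instant.
valid_times = [reference_time + period for period in forecast_periods]

for period, valid in zip(forecast_periods, valid_times):
    print(period, "->", valid.isoformat())

# Switching context: the same period is recovered as the elapsed time
# back from the valid time to when the forecast was issued.
recovered = [valid - reference_time for valid in valid_times]
```

Adding a period to an instant yields an instant, and subtracting two instants yields a period, which is why the two kinds of units cannot be interchanged.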

davidhassell commented 7 months ago

Hi Chris and Lars,

Not that this won't be very confusing to CF users ....

As I've found out! I do follow and subscribe to the "timestamp" ideas of https://github.com/orgs/cf-convention/discussions/304, but have caught myself out of context and need to think about this some more! Thanks for your patience,

David

JonathanGregory commented 3 months ago

In April I proposed some changes to the convention to reflect the conclusions of previous discussion in this issue. Some further discussion followed by @larsbarring, @davidhassell and @ChrisBarker-NOAA. Here is the proposal again, revised on the basis of those comments.

In the description for canonical units in Sect 3.3 "Standard name", add a second paragraph:

Units of time coordinates (Section 4.4, "Time Coordinate"), whose units attribute includes the word since, are not physically equivalent to time units that do not include since in the units. To mark this distinction, the canonical unit given for quantities used for time coordinates is s since 1958-1-1. The choice of 1958 is arbitrary and not restrictive; the time coordinate variable's own units may contain any reference time and date (after since) that is valid in its calendar. In both kinds of time units attribute (with or without since), any unit for measuring time can be used i.e. any unit which is physically equivalent to the SI base unit of time, namely the second.

Change the canonical unit from s to s since 1958-1-1 for the following standard names: forecast_reference_time, reference_epoch, time. Add to the description of these standard names that they are to be used for time coordinate variables (Section 4.4).
Modify the start of Sect 4.4, "Time coordinate", from

Variables representing reference time must always explicitly include the units attribute; there is no default value.

to

A time coordinate identifies an instant along the continuous physical dimension of time, whether in reality or a model. Variables containing time coordinates must always explicitly include the units attribute, with a unit of measure that is physically equivalent to the SI base unit of time, followed by the word since and a reference date-time. There is no default value for the units.
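As a rough illustration of the rule this text states, the Python sketch below classifies a units string by whether it has the form "unit since reference". The helper name, the regular expression and the short list of unit spellings are all assumptions made for this example; it is a loose approximation, not the UDUNITS grammar and not the logic of the CF checker.

```python
import re

# Hypothetical, deliberately incomplete list of time-unit spellings.
TIME_UNITS = {"s", "second", "seconds", "min", "minute", "minutes",
              "h", "hour", "hours", "d", "day", "days"}

# "<unit> since <reference date-time>", with a non-empty reference.
_SINCE = re.compile(r"^\s*(?P<unit>\S+)\s+since\s+(?P<ref>\S.*?)\s*$")


def classify_time_units(units: str) -> str:
    """Return 'time coordinate', 'elapsed time' or 'not a time unit'."""
    match = _SINCE.match(units)
    if match:
        # The word "since" followed by a reference marks a time coordinate.
        if match.group("unit") in TIME_UNITS:
            return "time coordinate"
        return "not a time unit"
    # No "since": a bare time unit measures elapsed time.
    if units.strip() in TIME_UNITS:
        return "elapsed time"
    return "not a time unit"


print(classify_time_units("s since 1958-1-1"))  # time coordinate
print(classify_time_units("hours"))             # elapsed time
print(classify_time_units("s since"))           # not a time unit
```

The last case reflects the earlier point that the word since must be followed by a reference date-time. The sketch does not check that the reference is a valid date in any calendar.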

Is this acceptable? If there are no further concerns, we can accept this three weeks from now, on 14th September.

Best wishes

Jonathan

larsbarring commented 3 months ago

Dear Jonathan,

Thank you for this suggestion, I think the changes look good. Still, I would like to give it a more careful consideration in the coming week or so, but I do not expect to have anything material to add. From my side the clock can start.

Kind regards, Lars

ChrisBarker-NOAA commented 3 months ago

I like it! Thanks!

larsbarring commented 3 months ago

As anticipated I have no further comments(*) and this is a valuable clarification. Thanks Jonathan!

(*) A really minor suggestion, if you think this might be an improvement:

... for quantities used for time coordinates is s since 1958-1-1. The choice of ~~1958~~ reference time (1958-1-1) is arbitrary and not restrictive; the time ...

JonathanGregory commented 2 months ago

I have created PR #538 to implement these changes.

Thanks for your suggestion, Lars, which I have followed in a more explicit form: "The choice of reference time and date (midnight on 1st January 1958) is arbitrary and not restrictive." I hope that's OK.

When I came to modify sect 4.4, it seemed to me that a different order for the sentences would be preferable from what was agreed above. In the PR, I have made it:

A time coordinate identifies an instant along the continuous physical dimension of time, whether in reality or a model. Variables containing time coordinates must always explicitly include the units attribute. The units attribute takes a string value that follows the formatting requirements of the [UDUNITS] package. It must comprise a unit of measure that is physically equivalent to the SI base unit of time (i.e. the second), followed by the word since and a reference date-time. There is no default value for the units. These requirements can best be described by an example with explanatory comments:

If there are no concerns expressed, this PR can be merged on 14th September.

ChrisBarker-NOAA commented 2 months ago

LGTM

JonathanGregory commented 2 months ago

Thanks for your comment on the example, @ChrisBarker-NOAA. You're right that Mountain Daylight Time might not be the only name for that time zone! I suppose that Canada, Mexico, Antarctica and some Pacific islands are in the same longitude range. That text is a quotation from the man page of the udunits lib. Since it's not affected by the change we're discussing in this issue, I'd prefer that we leave it alone for the moment, because I think we will want to revise that part of the text as a consequence of discussion 304. Jonathan

ChrisBarker-NOAA commented 2 months ago

Can we just change "i.e." to "e.g." now -- and do more later, maybe ...

larsbarring commented 2 months ago

@JonathanGregory, @ChrisBarker-NOAA :

I agree that there is merit in keeping with the original wording directly taken from the UDUNITS documentation. But it is not a direct citation in a very formalistic sense, and the documentation was probably written without specific consideration of an international audience, which CF should take into account. Moreover, again being formalistic, I think that the particular sentence is not included in the proposed changes. I think we have three alternatives:

Keep the UDUNITS sentence as is, but keep this part of the discussion in mind when considering [discussion #304](https://github.com/orgs/cf-convention/discussions/304).
Include the small and non-intrusive change from "i.e." to "e.g.".
Delete the text "(i.e. Mountain Daylight Time)" as it carries little [scientific] information because the "-6:00" part already carries the necessary information.

I do not have a preference here, but I would really like to avoid turning this into something that delays the acceptance of the really valuable clarification otherwise made by this proposal.

ChrisBarker-NOAA commented 2 months ago

I agree -- do not delay over this! Make the very small change, or don't -- either way, time to merge. I just saw a bit of copy editing that could be done -- I did not mean to delay anything!

Honestly, a lot of folks aren't quite clear on the difference between "i.e." and "e.g." -- I wasn't that clear until fairly recently. So I think this is similar to a typo.

But again -- don't delay over this!

davidhassell commented 2 months ago

I also agree with the new text in the PR. Thanks, Jonathan et al.

JonathanGregory commented 2 months ago

Thanks for your comments, all. In PR #538, I have replaced the offending sentence with "The time unit specification seconds since 1992-10-8 15:15:42.5 -6:00 indicates seconds since October 8th, 1992 at 3 hours, 15 minutes and 42.5 seconds in the afternoon in the time zone which is six hours to the west of UTC." That is, I've also replaced "Coordinated Universal Time", because we always call it "UTC"; just a few sentences later, the CF text calls it "UTC" as well, without explaining the acronym. However, I'm sure we will revisit this text quite soon (see https://github.com/orgs/cf-convention/discussions/304). One reason to do so is that it's not really "UTC" in most of the CF calendars!
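For anyone wanting to check the arithmetic in that sentence, here is a small Python sketch. The reference date-time is written out by hand rather than parsed from the units string, the coordinate value 3600.0 is made up, and Python's datetime assumes the proleptic Gregorian calendar, so this only illustrates the standard-calendar case.

```python
from datetime import datetime, timedelta, timezone

# Reference from "seconds since 1992-10-8 15:15:42.5 -6:00":
# 15:15:42.5 local time, in a zone six hours behind UTC.
zone = timezone(timedelta(hours=-6))
reference = datetime(1992, 10, 8, 15, 15, 42, 500000, tzinfo=zone)

# The same instant expressed in UTC is six hours later on the clock.
print(reference.astimezone(timezone.utc).isoformat())
# 1992-10-08T21:15:42.500000+00:00

# A coordinate value of 3600.0 in these units is one hour after the reference.
decoded = reference + timedelta(seconds=3600.0)
print(decoded.isoformat())
# 1992-10-08T16:15:42.500000-06:00
```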

I assume we're all still happy with merging this on 14th, in the absence of any further concerns.

ChrisBarker-NOAA commented 2 months ago

LGTM -- thanks!

cf-convention / cf-conventions

Clarification of time coordinate requirements #166