cf-convention / discuss

A forum for any discussion about interpretation, clarification, and proposals for changes or extensions to the CF conventions.

Clarification of time stamps #297

Closed claashk closed 1 week ago

claashk commented 7 months ago

I am a little confused regarding the specification of times (mostly UTC vs TAI) in CF. Especially the following sentence is not clear to me after reading it multiple times (no pun intended):

It is important to realise that a time coordinate value does not necessarily exactly equal the actual length of the interval of time between the reference date/time and the date/time it represents.

In my understanding a CF time coordinate value 1 with units seconds since t_ref, would represent the date/time t=1 second since t_ref, where 1 s is the actual length of the interval and t_ref is the reference date/time. Do I understand the standard correctly, when I assume that in this notation t - t_ref is not guaranteed to equal one second?

More specifically, the standard seems to suggest that reference time specifications are UTC by default. As UTC introduced a leap second after 2016-12-31, does CF guarantee, that 10 seconds since 2016-12-31 23:59:59 equals 8 seconds since 2017-01-01 00:00:00 or is this not the case? This question is e.g. relevant when comparing times specified relative to different epochs (e.g. GNSS time vs Unix time stamps) with sub-second accuracy.
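For illustration, a minimal sketch of what leap-second-unaware software does with this example. It uses Python's standard `datetime`, which has no notion of 23:59:60. The hard-coded leap-second count is an assumption made for this one interval; this shows typical tool behaviour, not what CF prescribes.

```python
from datetime import datetime, timedelta

ref_a = datetime(2016, 12, 31, 23, 59, 59)
ref_b = datetime(2017, 1, 1, 0, 0, 0)

# datetime cannot represent 23:59:60, so adding 10 s lands on 00:00:09.
t = ref_a + timedelta(seconds=10)
naive_offset_from_b = (t - ref_b).total_seconds()

# Assumption: exactly one leap second (2016-12-31 23:59:60) lies between the
# two reference times, so 10 elapsed SI seconds after ref_a is only 8 elapsed
# seconds after ref_b.
LEAP_SECONDS_BETWEEN = 1
elapsed_offset_from_b = naive_offset_from_b - LEAP_SECONDS_BETWEEN

print(naive_offset_from_b, elapsed_offset_from_b)
```

A leap-second-unaware tool therefore equates "10 seconds since 2016-12-31 23:59:59" with 9, not 8, seconds since 2017-01-01 00:00:00.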

I am aware of the lengthy discussion regarding CF-Issue cf-convention/vocabularies#62 but as a user I find the current wording hard to understand. It appears to me, that the current intentions are to specify 1) a uniform time axis without leap seconds (as provided by TAI) 2) epoch / reference time in UTC (including leap seconds).

I am not sure, whether both requirements can be combined in a consistent fashion.

ChrisBarker-NOAA commented 7 months ago
  • a uniform time axis without leap seconds (as provided by TAI)
  • epoch / reference time in UTC (including leap seconds).

I am not sure, whether both requirements can be combined in a consistent fashion.

Well, yes and no (no and yes?) -- the problem is that the tools are limited -- I'm not aware of any (commonly used) tools that do UTC with leap seconds [*]. Without the tools, you cannot correctly calculate a time delta that crosses leap-second boundaries.

(note that there was a proposal not too long ago to define a UTC-pretending-leap-seconds-don't-exist calendar for CF which didn't pass.)

Anyway, I agree that we could find some more clear language, but this is my take:

We have gotten away with this for this long because the vast majority of use cases for CF don't require second-level precision -- so it just doesn't matter.

For those applications that DO require seconds level precision -- I think the choices are:

1) Use TAI for both the epoch and calculations -- unfortunately, that doesn't appear to be an option for CF -- there is no TAI calendar.

A TAI calendar was proposed in cf-convention/vocabularies#62 -- but that was closed due to inactivity, and the fact that the issue was partly addressed by other changes -- though it still seems to me to be a gap -- particularly for folks that may need to store data that is already in TAI.

2) use an epoch that is close to the times you are trying to represent. I'm having trouble thinking of an application where second precision is required, but you need to cover a timespan of many years (maybe I'm missing a use case). If you use an epoch that is close to your time of interest, then you only have a problem if your timespan of interest crosses a leap second -- which admittedly is still a problem if second precision counts, but it would likely be one second, not the 37 that currently is the difference between TAI and UTC.

NOTE: I think it would be good to add a recommendation to use epochs close to your time as general advice anyway -- and maybe something about data type. For example, the FVCOM model defaults to using single precision hours since 1970 -- which loses second precision before you get to 2020 :-( -- you'd think that would be an obvious
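To put a number on the single-precision point, here is a rough sketch that measures the gap between adjacent float32 values for "hours since 1970". The helper name and the approximate hour count for 2020 are assumptions for illustration only; nothing here is taken from FVCOM itself.

```python
import struct

def float32_spacing_seconds(hours):
    """Gap between `hours` (rounded to float32) and the next float32 above it, in seconds."""
    bits = struct.unpack("<I", struct.pack("<f", hours))[0]
    here = struct.unpack("<f", struct.pack("<I", bits))[0]
    next_up = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return (next_up - here) * 3600.0

# Roughly the number of hours from 1970 to 2020 (assumption: 365.25-day years).
hours_2020 = 50 * 365.25 * 24

spacing_2020 = float32_spacing_seconds(hours_2020)
spacing_4096h = float32_spacing_seconds(4096.0)  # about 171 days after the epoch

print(spacing_2020, spacing_4096h)
```

Under these assumptions the resolution near 2020 is on the order of minutes, and one-second resolution is already gone within the first half year after the epoch.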

3) I think one could post process the time coordinate, using leap-second-unaware tools, and then convert to/from UTC to TAI, but again, that would require another tool that was leap-second aware. That would assume that the epoch is correct UTC -- and the time-deltas are exact seconds. -- now that I think about it, maybe that is what CF should specify:

Which I think is the opposite of what it says now, which is:

[I don't see any reason for this restriction -- if it's a valid UTC time, why shouldn't it be representable?]

[again -- why not??]

[This one is important -- as a leap second does not exist in some (most) calendars, and much software would choke in it, excluding it would make sense]

[This seems the opposite of what we should want -- I'm trying to wrap my head around why we would choose, essentially, "undetermined" over "precise" ?]

As a way to talk about it (and I know this has been hashed out many times in the past, but for clarity right now):

So: it seems we could define: 1) The epoch timestamp exactly corresponds to a particular point in the time continuum, as defined by the calendar. 2) The individual values in the variable correspond to points in the time continuum that are exactly what they say -- e.g. so many seconds since ...

And: Converting from the numerical values to a human-readable timestamp in a particular calendar is up to post-processing software that may or may not correctly capture leap seconds, etc.

Which leads to: the actual time coordinate should be considered monotonically continuous and correct.

In practice: if someone uses software that converts from, e.g. a UTC timestamp to the "time-delta since epoch" incorrectly (or imprecisely), then my point (2) may not hold -- but I think that should be considered an error (imprecision) in the data, not an expected result.

[* which confused me a bit -- yes, leap seconds are intractable, as we don't know when they might occur in the future, but we DO know when they occurred in the past, so a library that would raise a warning when used for future times would be quite doable, I would think .. but I digress]

claashk commented 7 months ago

Thanks for the quick and quite extensive reply.

Let me start with our use case: We are (mis)using CF to store sensor level 1 data (e.g. airborne spectrometer at-sensor radiance and navigation data e.g. lat, lon, roll, ...). To synchronize these data, microsecond accuracy is required. We have the same issue when comparing satellite or airborne observations with ground-based measurements, where a deviation of >10 seconds can be an issue.

Currently we are using the GPS epoch (in UTC), which is fine if used consistently (at nanosecond resolution the first int64 overflow will occur in more than 200 years and is thus someone else's problem™). As you mentioned correctly, though, a reference time stamp closer to the data is often preferable, e.g. to reduce a microsecond counter from int64 to int32. However, this is where my problems start, because if I have to specify the new epoch in UTC, then I have to take the leap second difference between GPS epoch and the new epoch into account to be CF compliant. This increases complexity without adding any benefit (beyond CF compliance) in my opinion.
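The ranges mentioned here can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch; the 365.25-day year is an assumption used to convert seconds to years.

```python
INT64_MAX = 2**63 - 1
INT32_MAX = 2**31 - 1
SECONDS_PER_YEAR = 365.25 * 86400  # assumption: average Julian year

# How far a signed 64-bit nanosecond counter reaches from its epoch.
years_int64_ns = INT64_MAX / 1e9 / SECONDS_PER_YEAR

# How far a signed 32-bit microsecond counter reaches from its epoch.
minutes_int32_us = INT32_MAX / 1e6 / 60

print(round(years_int64_ns, 1), round(minutes_int32_us, 1))
```

An int64 nanosecond counter covers roughly 292 years from its epoch, while an int32 microsecond counter covers only about 36 minutes, which is why the epoch has to sit very close to the data for the smaller type to work.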

As a solution I consequently support the introduction of a TAI calendar, which would allow me to specify all times in TAI and thus free me from ever having to think about leap seconds again (until someone requests a data product in UTC). This would also seamlessly integrate with most date/time libraries (e.g. numpy.datetime64), because they mostly do not support leap seconds and are thus optimally suited to handle TAI.

Out of curiosity: Why is the deviation from UTC such a controversial issue for the CF community, if the difference is (has been?) of negligible practical importance for the majority of users?

ChrisBarker-NOAA commented 7 months ago

+1 on a TAI calendar -- I have no use case, but

Why is the deviation from UTC such a controversial issue for the CF community, if the difference is (has been?) of negligible practical importance for the majority of users?

Good question -- IMHO the issue is that UTC is, well, "universal". Computers keep time in UTC, and thus instruments report it, so it's the default, unavoidable, familiar, and everyone (thinks) they know what it is. So any deviation from using UTC is scary :-)

sethmcg commented 7 months ago

I support adding a TAI calendar to CF.

@claashk presents a clear use-case where the distinction between UTC and TAI is relevant, and the simplest and best way to represent it would be with the standard epoch+offset form using a real-world calendar that doesn't have leap-seconds.

The addition of the calendar is very straightforward, though it sounds like we'll need to do some work to update the language quoted in the first comment. My thought is that it would make sense to move it to a discussion of the UTC calendar specifically, since as far as I'm aware, that's the only calendar that has leap-seconds.

(I will also say that personally, I think leap-seconds were a terrible idea that should never have been implemented, so anything that supports and increases awareness of alternatives like TAI that don't include leap-seconds is a benefit to the community.)

JonathanGregory commented 7 months ago

Thanks for raising the issue, @claashk. Since you have a clear use-case for TAI, I too support its introduction to CF. We have discussed it before, but we didn't have a definite use-case.

As you say, issue 148 was extremely long and eventually inconclusive. Out of curiosity, you ask, "Why is the deviation from UTC such a controversial issue for the CF community?" It's a reasonable question, but I fear that, if I try to answer it, for some reason that no-one can understand it will lead instantly to an immensely long debate in which we all get confused! :smile: However, it's worth trying, because we haven't revisited this since we made quite a lot of changes in version 1.9 to clarify this part of the document.

I believe the key thing is that a number with a units attribute of the form "time-unit since reference-time" is an encoded form of a date-time, which is a set of numbers (year, month, day, hours, minutes, seconds), all integers except seconds. That's what we say in the first two sentences of Sect 4.4.1. Perhaps it would be better if those sentences came at the start of Sect 4.4.

Of course, the encoding is the obvious and convenient one: it's the elapsed time since the reference time, in all cases except for leap-seconds in the real world. (Luckily, model calendars don't have leap-seconds.) In Sect 4.4 we say (simplified slightly for the sake of argument):

The time unit specification seconds since 1992-10-8 15:15:42 indicates seconds since October 8th, 1992 at 3 hours, 15 minutes and 42 seconds in the afternoon

The example comes from the UDUNITS manual. With this units attribute, a value of 1.0 means 1992-10-8 15:15:43 i.e. 1 second later than the reference time. A value of 22927398 (about 23 million seconds, or 265.3634 days) means 1993-6-30 23:59:00. Both of those values are the elapsed time in seconds since the reference time, as you'd expect, and as the quoted text states. At the end of 30th June 1993, there was a leap second. Hence 1993-7-1 0:0:00 was 61 seconds later than 1993-6-30 23:59:00, and 22927398 + 61 seconds later than the reference time. Despite that, in the CF default or standard calendar it is encoded as 22927398 + 60 = 22927458 seconds since the reference time. This choice was made because it's what almost everyone and almost all (or all) software does, such as UDUNITS:

You have: 22927458 seconds since 1992-10-8 15:15:42
You want: seconds since 1993-7-1 0:0:00
    2.29275e+07 seconds since 1992-10-8 15:15:42 = 0 (seconds since 1993-7-1 0:0:00)
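The same arithmetic can be reproduced with Python's standard `datetime`, which, like UDUNITS, does not count leap seconds. This is just a cross-check sketch of the numbers quoted above, not part of the convention.

```python
from datetime import datetime, timedelta

ref = datetime(1992, 10, 8, 15, 15, 42)

# Both values decode as if every day had exactly 86400 seconds.
before_leap = ref + timedelta(seconds=22927398)
after_leap = ref + timedelta(seconds=22927458)

print(before_leap)  # one minute before midnight on 30th June 1993
print(after_leap)   # midnight on 1st July 1993, 60 (not 61) encoded seconds later
```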

You questioned this bullet point from Section 4.4.1

It is important to realise that a time coordinate value does not necessarily exactly equal the actual length of the interval of time between the reference date/time and the date/time it represents.

That is listed as one of the consequences of leap seconds not being counted in any existing CF calendar. I see it's confusing without the context. We could clarify it e.g. as

A time coordinate value does not exactly equal the actual length of the interval of time between the reference date/time and the date/time it represents if this interval includes any leap seconds (positive or negative), because they are not counted.

The TAI calendar will not have that problem, because there are no leap seconds. The time coordinate will always equal the elapsed time, like in model calendars.

I agree with @sethmcg that it will be straightforward to add TAI. I think we would avoid some complications if we did not allow the TAI calendar to represent any dates before TAI began i.e. not "proleptic". Would that make sense? Is it 1st January 1958?

sethmcg commented 7 months ago

I think prohibiting proleptic TAI makes sense, since we don't have any data from before 1958 whose accuracy is at the level that we'd need it (and I have a hard time imagining it will come into existence), and for lower-accuracy data proleptic-gregorian will suffice.

With regard to the point about time coordinate not exactly equaling the actual interval length, I think that's actually not correct. I think it would be truer to say that the length of the interval associated with a reference and a datetime depends on which calendar you're using. For the UTC calendar (only), the length of the interval cannot be calculated correctly (at an accuracy level of seconds) without reference to a list of leap-seconds that have been inserted.

So given the time coordinate 172800 seconds since 1972-12-31 00:00:00, in both the UTC and TAI calendars, the interval is exactly the same length: 172800 seconds. But the time that it refers to differs: for TAI it's 1973-01-02 00:00:00 and for UTC it's 1973-01-01 23:59:59, because in the latter case a leap second was inserted. For certain calendars, the relationships between years, days, hours, and seconds are all constant, and for others, the relationship between years and days depends on the date. For the UTC calendar, the relationship between seconds and minutes (and hence everything else, also) depends on the date as well.
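A small sketch of the two readings of this example, assuming the number is the true elapsed time in whole seconds. The function names and the one-entry leap-second table are hypothetical; a real decoder would need the full published list of leap seconds.

```python
from datetime import datetime, timedelta

# UTC midnights that immediately follow a positive leap second.
# Assumption: only the entry relevant to this example is listed.
AFTER_LEAP = [datetime(1973, 1, 1)]
FMT = "%Y-%m-%d %H:%M:%S"

def decode_without_leap_seconds(ref, seconds):
    """Every day is 86400 s long, as in a calendar with no leap seconds."""
    return (ref + timedelta(seconds=seconds)).strftime(FMT)

def decode_counting_leap_seconds(ref, seconds):
    """Treat `seconds` as elapsed whole seconds and return the UTC label."""
    for boundary in sorted(AFTER_LEAP):
        if boundary <= ref:
            continue
        # Elapsed seconds from ref until the leap second (23:59:60) begins.
        to_leap = (boundary - ref).total_seconds()
        if seconds < to_leap:
            break
        if seconds < to_leap + 1:
            # The instant falls inside the leap second itself.
            return (boundary - timedelta(seconds=1)).strftime("%Y-%m-%d %H:%M:60")
        seconds -= 1  # the leap second consumed one elapsed second
    return (ref + timedelta(seconds=seconds)).strftime(FMT)

ref = datetime(1972, 12, 31)
print(decode_without_leap_seconds(ref, 172800))
print(decode_counting_leap_seconds(ref, 172800))
```

With these assumptions the first call gives 1973-01-02 00:00:00 and the second gives 1973-01-01 23:59:59, so the same number names two different date-times depending on whether the decoder counts the leap second.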

It's not that the length of the interval is indeterminate. It's that if you're using a calendar that doesn't have a fixed relationship between interval length and time units, and you don't take that into account when calculating datetimes from time coordinates, you're wrong. And because UTC has leap-seconds, people doing that with UTC time coordinates are often wrong and don't know it.

JonathanGregory commented 7 months ago

Dear @sethmcg

I agree with both of these statements of yours:

I think it would be true[r] to say that the length of the interval associated with a reference and a datetime depends on which calendar you're using

So given the time coordinate 172800 seconds since 1972-12-31 00:00:00, in both the UTC and TAI calendars, the interval is exactly the same length: 172800 seconds. But the time that it refers to differs: for TAI it's 1973-01-02 00:00:00 and for UTC it's 1973-01-01 23:59:59, because in the latter case a leap second was inserted.

However, I disagree with your summary, "With regard to the point about time coordinate not exactly equaling the actual interval length, I think that's actually not correct." This must be an example of the strange phenomenon that this subject causes, where we all misunderstand one another! Maybe in this case I was vague about "time coordinate". By that phrase, I meant the number alone i.e. the element of the time coordinate variable. Do you agree with this more detailed version:

The numerical value in an element of a time coordinate variable, which has a units attribute of the form "time-unit since reference date-time", represents a unique date-time in the calendar of the time coordinate variable. The numerical value, multiplied by the specified time-unit, equals the length of the time interval between the reference date-time and the given date-time, except if there is a net non-zero number of leap seconds between the two date-times.

Best wishes

Jonathan

sethmcg commented 7 months ago

@JonathanGregory - I think I don't agree, although it's possible that I'm missing a subtlety of your point. (And my apologies for the length of what follows.)

In my mental representation, the elapsed duration has primacy. The time coordinate is exactly what the units string says it is: it tells you how many units of time have elapsed between the reference and the coordinate value, and then the calendar tells you how to convert that into a datetime. Consequently, it's in some sense an error to use a time coordinate in units that do not have fixed length for the calendar.

You can't say units = "years since 1900-01-01 00:00:00" for a normal gregorian calendar, because your measuring-stick isn't constant. You can do it if calendar = "noleap", because then the unit is constant.

Likewise, we can't use units of months for most calendars, because the months have different lengths (although it would be okay for a 360-day calendar, where all months are exactly 30 days long).

And following the same logic, it would be wrong to use any unit longer than seconds with the UTC calendar, because those units don't have constant length in that calendar.

Now, I say that it is in a sense an error, because there's another way to look at it: you can still meaningfully communicate time coordinates using a unit that has some variability, but your precision is limited by that variability. It is meaningful to say "years since 1900" when your calendar has leap years, but if you do, you can't use smaller units than a year. I can say that 100 years have elapsed between 1900 and 2000, but I can't say that 36525 days have elapsed in that interval - because I don't know the date to that level of precision.
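As a quick illustration of how much the "year" measuring-stick varies, this sketch counts the days in the same 100-year span under three sets of rules. Python's `date` follows proleptic Gregorian rules; the noleap and 360-day figures are simple products, included only for comparison.

```python
from datetime import date

# Proleptic Gregorian rules: 1900 is not a leap year, 1904..1996 are.
days_gregorian = (date(2000, 1, 1) - date(1900, 1, 1)).days

# Fixed-length model calendars, where a year is a constant unit.
days_noleap = 100 * 365
days_360_day = 100 * 360

print(days_gregorian, days_noleap, days_360_day)
```

Under Gregorian rules the span is 36524 days, one fewer than the 36525 you would get by assuming a leap year every fourth year, which is the kind of ambiguity that a bare "years since 1900" leaves open.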

For all calendars except UTC, there's a constant relationship between days, hours, minutes, and seconds, so you can express a time accurately down to the second using any of those units. But if your calendar is UTC and you have a time coordinate in units longer than seconds, you only know the time to the nearest minute. (And arguably not even that well, given that 27 leap seconds have been inserted and UTC now differs from TAI by 37 seconds; really you only know it to the nearest deka-minute.)

So it's not really wrong to use units that vary in length, it just implicitly limits the precision of your time coordinates. And that means that we should be truncating our reference dates to that level of precision. "years since 1900" or "months since 1900-01" would both be valid units strings for calendar = "gregorian", but "years since 1900-01-01" is lying about the precision of the time coordinate. "days since 1900-01-01 12:00" is fine for the UTC calendar (assuming we're okay with our minute values being a little fuzzy), but "days since 1900-01-01 12:00:00.0" is bad practice, because it implies a level of precision that doesn't exist.

Now, having said all that, I realize that this viewpoint doesn't quite match what the CF standard currently says in section 4.4.1. I think that that's a defect in the standard, and we should rewrite it to say that time coordinates are exactly what their unit strings say they are, and that units + calendar may limit the precision of your time coordinates. Because I think that's how everybody uses them in practice, and we should adjust the standard to match in order to minimize confusion and error.

I was going to say that probably nobody has been correctly recording times in UTC, but on a re-read, the way things are currently specified, I think the situation we have is that the standard calendar is actually TAI, not UTC, and it's not currently possible to record datetimes from the UTC calendar in a CF-compliant way, so by definition nobody has ever done it correctly.

So now I've argued myself around to saying that we don't need to define a TAI calendar after all, we just need to clarify in the spec that the standard calendar is equivalent to TAI and we should be discussing whether and how we should define a UTC calendar (which would come with a whole discussion about the limitations of using units longer than seconds with it).

davidhassell commented 7 months ago

Hello - nice to see this getting aired again.

I was going to say that probably nobody has been correctly recording times in UTC, but on a re-read, the way things are currently specified, I think the situation we have is that the standard calendar is actually TAI, not UTC, and it's not currently possible to record datetimes from the UTC calendar in a CF-compliant way, so by definition nobody has ever done it correctly.

I recall from issue 148 (haven't checked - not enough time right!) that one of the use cases was that some satellite instrument times are recorded as correct timestamp strings (2024-03-22T09:00:00.0) and then when later encoding into CF these are converted to "seconds since ..." with leap second aware software. However, when read back in from the CF-netCDF file, a conversion back to a timestamp would likely happen without leap second-awareness, because there is no indication to do otherwise.
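The round-trip problem described here can be sketched as follows. The function names, the choice of 2010-01-01 as reference time, and the short leap-second table (only the entries between the reference and the timestamp) are assumptions for illustration.

```python
from datetime import datetime, timedelta

# UTC midnights immediately following the positive leap seconds that fall
# between the (assumed) reference time and the example timestamp.
AFTER_LEAP = [datetime(2012, 7, 1), datetime(2015, 7, 1), datetime(2017, 1, 1)]

def encode_leap_aware(ref, stamp):
    """Elapsed seconds from ref to stamp, counting inserted leap seconds."""
    naive = (stamp - ref).total_seconds()
    leaps = sum(1 for b in AFTER_LEAP if ref < b <= stamp)
    return naive + leaps

def decode_leap_unaware(ref, seconds):
    """What a typical reader does: every day is 86400 s long."""
    return ref + timedelta(seconds=seconds)

ref = datetime(2010, 1, 1)
stamp = datetime(2024, 3, 22, 9, 0, 0)

value = encode_leap_aware(ref, stamp)
round_trip = decode_leap_unaware(ref, value)

print(stamp, round_trip)
```

With these assumptions the timestamp comes back three seconds late, and nothing in the file tells the reader that a leap-second-aware decoder was needed.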

JonathanGregory commented 7 months ago

Dear @sethmcg

Thanks for writing out your views on this in such detail. That's helpful. We partly agree, and partly disagree, it seems.

I agree with what you say about units which don't have fixed length. That's why Sect 4.4 recommends that year and month should not be used. With those units, there's a substantial risk that the data-user will not decode the time coordinate value into the date-time which the data-writer intended, because in general these units have variable length. They have a precise and fixed length in UDUNITS, but not the meaning usually expected, and CF doesn't "endorse" the definitions that UDUNITS gives them, or any other fixed definition.

I agree that day, hour and minute are also units of variable length, when referring to UTC. Most people would understand "exactly one day after 2300 on 30th June 1992" to mean "2300 on 1st July 1992". That's what UDUNITS thinks:

You have: 0 days since 1992-7-1 2300
You want: days since 1992-6-30 2300
    0 days since 1992-7-1 2300 = 1 (days since 1992-6-30 2300)

and also what Linux date thinks:

$ date -d "1992-6-30 23:00 +1 day"
Wed  1 Jul 23:00:00 BST 1992

Both of those tools define a day with a fixed length of 24×60×60=86400 seconds, but actually in UTC there were 86401 seconds between those two dates. I believe that the overwhelming majority of existing CF-netCDF datasets which refer to events in the real world have nonetheless encoded date-times without taking account of leap seconds. The intention is that 1 day since 1992-6-30 2300 will mean 1992-7-1 2300.

You write

In my mental representation, the elapsed duration has primacy. The time coordinate is exactly what the units string says it is: it tells you how many units of time have elapsed between the reference and the coordinate value

Of course that is a legitimate and reasonable view, but it's not the CF convention for the standard real-world calendar. As you say, "This viewpoint doesn't quite match what the CF standard currently says in section 4.4.1." That's not a defect in the convention. The text in that section is very explicit about the consequences for UTC and leap seconds, not by accident, but because it's the intention. The elapsed duration doesn't have primacy in this convention. Instead, as stated at the start of Sect 4.4.1:

A date/time is the set of numbers which together identify an instant of time, namely its year, month, day, hour, minute and second, where the second may have a fraction but the others are all integer. A time coordinate value represents a date/time.

The time coordinate value is primarily an encoded date-time, not an elapsed duration. These two things are the same except when a leap second intervenes. The convention has some awkward consequences (listed in Sect 4.4.1), but I think it's the right choice because in practice this is what people and software assume the convention means.

Therefore I think we do need to define the TAI calendar in CF. In practice the CF standard calendar is UTC encoded as if it were TAI, similar to what you say at the end. We have to distinguish this from TAI encoded as TAI should be, as illustrated by @davidhassell's comment, because you need to know which it is in order to decode the time coordinate into the date-time that the data-writer intended.

If there is a use-case for it, we could also define a UTC calendar, which would encode UTC as it should be (as you advocate), taking account of leap seconds. In that calendar, the bullet-point list in Sect 4.4.1 would not apply.

Best wishes

Jonathan

ChrisBarker-NOAA commented 7 months ago

As we all know, these discussions can get VERY long -- so I suggest:

1) It seems there is consensus (among the few that have participated in this discussion) that adding the TAI calendar would be a good idea. And it is already well specified, so I think pretty straightforward to do -- so start a new issue with that (and only that) as the topic? Or can we go straight to a PR?

2) The whole discussion of what to do about leap-seconds in UTC is still a mess :-( -- apparently there is still a need for clarification / consensus about some of this in CF.

Maybe that should be moved to a discussion -- I'm working on starting one now.

ChrisBarker-NOAA commented 7 months ago

OK -- started a discussion here:

https://github.com/orgs/cf-convention/discussions/304

sethmcg commented 7 months ago

@ChrisBarker-NOAA I'll argue this in a longer response, but I think that the TAI calendar already exists and is named "standard".

If that's the case, do we want to have TAI as a second name for it (like the 365_day and noleap calendars)? And if we do, does that change how we'd specify it?

ChrisBarker-NOAA commented 7 months ago

I think that the TAI calendar already exists and is named "standard".

I'm no expert, but yes, that appears to be the case -- though it would be better to restrict TAI to post-1972 (or thereabouts) SO yes, this may be a clarification, rather than an addition.

But I do think, one way or another, that "TAI" should be specifically referenced!

ChrisBarker-NOAA commented 7 months ago

now that I'm thinking about it -- the "standard" calendar has a LONG definition, so even if it's pretty much the same thing, adding something like this to the calendar list:

tai International Atomic Time: TAI is used for precise time applications such as GPS systems. It is a calendar with the Gregorian rules for leap-years. It is different from UTC time by not including leap seconds. It was established in 1972. It is essentially the same as the "standard" calendar, but should not be used for times before 1972.

Alternatively, we could tack on a shorter version of that to the "standard" calendar definition.

I think it's time for a PR or new issue if someone wants to get that going ...

sethmcg commented 7 months ago

You've convinced me that TAI and standard aren't quite the same over in the new discussion topic (#304); I'll keep any discussion of TAI vs standard vs UTC over there.

I think we should create an independent TAI definition that doesn't rely on the standard definition, and I agree that we're now at the point where we have sufficient consensus to start an issue for it. I don't think it needs to be much longer than what you've got above; we just add something like "Under TAI, minutes are always exactly 60 seconds long, hours are always exactly 60 minutes long, and days are always exactly 24 hours long. Months and years follow the proleptic-Gregorian calendar."

And although it was established in 1972, is there any reason it couldn't be extended backwards following proleptic-Gregorian rules?

ChrisBarker-NOAA commented 7 months ago

And although it was established in 1972, is there any reason it couldn't be extended backwards following proleptic-Gregorian rules?

It certainly could -- but does anyone do that, or need that? If so, it would be proleptic-tai, I suppose.

I can't imagine there's a use-case -- my tendency is not to introduce something that isn't standard and no one has a use case, but it also wouldn't hurt.

JonathanGregory commented 7 months ago

Thanks for asking your question, @claashk. Has the question been answered, as far as it can be for the moment? As you see, you have reignited the discussion, now carrying on in #304. :smile: Feel free to join in! It's likely that it will lead to a proposal to add the TAI calendar. I will add the FAQ label to this question, to remind us at some point to make sure that TAI is discussed in the FAQ.

claashk commented 1 week ago

Thank you very much for the comprehensive and constructive discussion. The proposal of a tai calendar sounds good to me and would solve our problems regarding this issue.

JonathanGregory commented 1 week ago

Thank you, @claashk. I will close this issue now, because we are discussing a definite proposal (in conventions issue 542) to include the TAI calendar, among other calendar issues.