Add calendars gregorian_tai and gregorian_utc

cf-convention / cf-conventions

AsciiDoc Source

http://cfconventions.org/cf-conventions/cf-conventions

Creative Commons Zero v1.0 Universal

89 stars 46 forks source link

Add calendars gregorian_tai and gregorian_utc #148

Closed JimBiardCics closed 2 years ago

JimBiardCics commented 6 years ago

Introduction

The current CF time system does not address the presence or absence of leap seconds in data with a standard name of time. This is not an issue for model runs or data with time resolutions on the order of hours, days, etc, but it can be an issue for modern satellite swath data and other systems with time resolutions of tens of seconds or finer.

I have written a background section for this proposal, but I have put it at the end so that people don't have to scroll through it in order to get to proposal itself. If something about the proposal seems unclear, I hope the background will help resolve your question.

Proposal

After past discussions with @JonathanGregory and again with he and @marqh at the 2018 CF Workshop, I propose the new calendars listed below and a change to existing calendar definitions.

gregorian_tai - When this calendar is called out, the epoch date and time stated in the units attribute are required to be Coordinated Universal Time (UTC) and the time values in the variable are required to be fully metric, representing the the advance in International Atomic Time (TAI) since that epoch. Conversion of a time value in the variable to a UTC date and time must account for any leap seconds between the epoch date and the time being converted.
- gregorian_utc - When this calendar is called out, the epoch date and time stated in the units attribute are required to be in UTC and the time values in the variable are assumed to be conversions from UTC dates and times that did not account for leap seconds. As a consequence, the time values may not be fully metric. Conversion of a time value in the variable to a UTC date and time must not use leap seconds.
gregorian - When this calendar is called out, the epoch date stated in the units attribute is required to be in mixed Gregorian/Julian form. The epoch date and time have an unknown relationship to UTC. The time values in the variable may not be fully metric, and conversion of a time value in the variable to a date and time produces results of unknown precision.
the others - The other calendars all have an unknown relationship to UTC, similar to the gregorian calendar above.

The large majority of existing files (past and future) are based on artificial model time or don't need to record time precisely enough to require either of the new calendars (gregorian_tai or gregorian_utc). The modified definition of the gregorian calendar won't pose any problem for them. For users that know exactly how they obtained their times and how they processed them to get time values in a variable, the two new calendars allow them to tell users how to handle (and not handle) those time values.

Once we come to an agreement on the proposal, we can work out wording Section 4.4 to reflect these new/changed calendar definitions.

Background

There are three parts to the way people deal with time. The first part is the counting of the passing of time, the second part is the representation of time for human consumption, and the third is the relationship between the representation of time and the orbital and rotational cycles of the earth. This won't be a deep discussion, but I want to define a few terms here in the hopes that it will help make things clearer. For gory details, please feel free to consult Google and visit places such as the NIST and US Naval Observatory websites. I'm glossing over some things here, and many of my definitions are not precise. My goal is to provide a common framework for thinking about the proposal, as opposed to writing a textbook on the topic.

The first part is the simplest. This is time as a scalar quantity that grows at a fixed rate. This, precisely measured, is what people refer to as 'atomic time' - a count of cycles of an oscillator tuned to resonate with an electron level transition in a sample of super-cooled atoms. The international standard atomic time is known as International Atomic Time (TAI). So time in this sense is a counter that advances by one every SI second. (For simplicity, I am going to speak in terms of counts of seconds throughout this proposal.) No matter how you may represent time, whether with or without leap days or seconds, this time marches on at a fixed pace. This time is metric. You can do math operations on pairs or other groups of these times and get consistently correct results. In the rest of this proposal I'm going to refer to this kind of time as 'metric time'.

The second part, the representation of time, is all about how we break time up into minutes, hours, days, months, and years. Astronomy, culture, and history have all affected the way we represent time. When we display a time as YYYY-MM-DD HH:MM:SS, we are representing a point in time with a label. In the rest of this proposal I'm going to refer to this labeling of a point in time as a time stamp.

The third part, the synchronization of time stamps with the cycles of the planet, is where calendars come into play, and this is where things get ugly. Reaching way back in time, there were three basic units for time - the solar year, the lunar month, and the solar day. Unfortunately, these three units of time are not compatible with each other or with counts of seconds. A solar day is not (despite our definitions) an integer number of seconds in length, a lunar month is not an integer number of solar days (and we pretty much abandoned them in Western culture), and a solar year is not an integer number of solar days or lunar months in length. If you attempt to count time by incrementing a time stamp like an odometer - having a given element increment once each time the element below it has 'rolled over', you find that the time stamps pretty quickly get out of synchronization with the sun and the seasons.

The first attempts to address this asynchrony were leap days. The Julian calendar specified that every four years February would wait an extra day to roll over to March. The Gregorian calendar addressed a remaining asynchrony by specifying that this only happens on the last year of a century (when it normally would) every fourth century. That was close enough for the technology of those days. Clocks weren't accurate enough at counting seconds to worry about anything else. But the addition of leap days (as well as months with random lengths) means that time stamps aren't metric. You can't do straightforward math with them.

In more recent times technology and science have advanced to the point that we can count seconds quite accurately, and we found that keeping the time stamp hours, minutes, and seconds sufficiently aligned with the rising of the sun each day requires the addition (or subtraction) of leap seconds. On an irregular, potentially bi-yearly, basis, the last minute of a day is allowed to run to 60 before rolling over instead of 59 (or rolls over after 58, though it's lately been only additions). Coordinated Universal Time (UTC) is the standard for time stamps that include both leap days and leap seconds.

UTC time stamps represent the time in a human-readable form that is precise and synchronized with the cycles of the earth. But they aren't metric. It's not hard to deal with the leap days part because they follow a fixed pattern. But the leap seconds don't. If you try to calculate the interval between 2018-01-01 00:00:00 and 1972-01-01 00:00:00 without consulting a table of leap seconds and when they were applied, you will have a difference of 27 seconds between the time you get from your calculation and the time has actually elapsed between those two time stamps. This isn't enough of a discrepancy to worry about for readings from rain gauges or measurements of daily average temperature, but an error of even one second can make a big difference for data from a polar-orbiting satellite moving at a rate of 7 km/second.

The clocks in our computers can add further complexity to measuring time. The vast majority of computers don't handle leap seconds. We typically attempt to address this by using time servers to keep our computer clocks synchronized, but this is done by altering the metric time count in the computer rather than modifying the time stamps by updating a table of leap seconds.

Furthermore, most computer software doesn't have 'leap second aware' libraries. When you take a perfectly exact UTC time stamp (perhaps taken from a GPS unit) and convert it to a count of seconds since an epoch using a time calculation function in your software, you are highly likely to have introduced an error of however many leap seconds that have been added between your epoch and the time represented by the time stamp.

As a result of all this, many of the times written in netCDF files are not metric times, and there is no good way to know how to produce accurate time stamps from them. They may be perfectly metric within a given file or dataset, they may include skips or repeats, or they may harbor non-linearities where there are one or more leap seconds between two time values.

We have another minor issue for times prior to 1972-01-01. There's not much way to relate times prior to that epoch to times since - not to the tens of seconds or better level. I'd be surprised if this would ever be a significant problem in our domain.

To summarize, we have TAI, which is precise metric time. We have UTC, which is a precise, non-metric sequence of time stamps that are tied to TAI, and we have a whole host of ways that counts time since epoch stored in netCDF files can be inaccurate to a level as high as 37 seconds (the current leap seconds offset between TAI and UTC).

Most uses of time in netCDF aren't concerned with this level of accuracy, but for those that are, it can be critical.

cameronsmith1 commented 6 years ago

I remember when CF discussed time a few years ago. It was the longest discussion I ever followed on CF. You have addressed the main points that I remember.

hrajagers commented 6 years ago

The proposal looks sensible to me. One item that might be worthwhile to discuss or mention is the way in which times should be specified when the calendar is "gregorian_tai". It seems to me that implicit in that definition is included the requirement to store time as "seconds since ..." a reference date/time. For any time unit larger than seconds it will be difficult to do conversions to seconds consistently ... unless we define "minutes" as strictly 60 seconds and so on.

JimBiardCics commented 6 years ago

All units in time variables follow their UDUNITS definitions. According to UDUNITS, seconds are SI seconds, minutes are 60 seconds, hours are 3600 seconds, days are 86400 seconds, weeks are 604800 seconds (7 days), and fortnights are 1209600 seconds (14 days). Years are 31556925.9747 seconds (365.242198781 days). Months are a mess, with 1 month defined as being 1/12 of a year - 30.436849898 days. As CF says in section 4.4, time should not use units of months or years, as those quantities will produce unexpected results if you are not very careful with them.

I see no problem in storing times in any units from yoctoseconds (1e-24 seconds) up to and including fortnights, as they are all clean and consistent multiples of SI seconds.

cameronsmith1 commented 6 years ago

If you are going to specify seconds (or yocto seconds), then is it necessary to specify what type of number (integer, real) it is, to make sure that the specified number can be large enough and precise enough to be useful? Specifically, if using some sort of integer and the number of seconds you need could exceed the maximum value for some types of integer, and if you are using real numbers there may not be enough precision to distinguish between one second and the next (when the number of seconds gets large).

ChrisBarker-NOAA commented 6 years ago

@cameronsmith1:

a given data file is definition is using a particular type for a variable -- so yes, the file creator needs to be thoughtful about it, but I don't think CF has to say anything about it.

martinjuckes commented 6 years ago

Hello Jim, thanks for directing me here from the mail list discussion on 360 day calendars.

As you might guess from my latest contribution to that discussion, I have reservations about relaxing the specification of time to allow a non-metric interpretation. Introducing a system which makes the interpretation of the units dependant on the value of an additional attribute looks like a substantial step to me, and I can't see any justification for it.

I'm not sure if I understand the comments in your proposal about non-metric time values. Take, for example, a short period spanning the last leap second, which occurred at 2016-12-31 23:59:60. As a result of this leap second being inserted 2 minutes since 2016-12-31 23:59:00 should be 0 minutes since 2017-01-01 00:00:59 rather than 0 minutes since 2017-01-01 00:01:00. This may be counter-intuitive, but the measure of time in minutes is still metric. The minutes counter in the time stamp is not enumerating constant intervals of 1 minute, just as the year counter is not enumerating years of constant length (the Gregorian year, neglecting leap seconds, is 365.2425 days, while the counter in the time stamp sometimes represents an increment of 366, other times 365).

Software often adopts a simplified relationship between the counters in the time stamp and the metric time. An extreme case of this is the 360 day calendar we have been discussing on the link I mention above .. in which we have 30 days to the months and 12 months to the year, so that all counter increments relate directly to specific time intervals.

My understanding is that by default the time stamp (the date following since in the units string) follows ISO 8601, which does include leap seconds. However, the leap seconds are not in the UDUNITS software, so we don't have an easy way of making use of this. The current CF convention implies that the interpretation of the units string follows UDUNITS, and UDUNITS always treats the time stamp as being in the standard Gregorian/Julian calendar. I have the impression that this is not always the intended meaning ... but would be a diversion from this thread.

All the references I've found indicate that the time elapsed in the UTC system is exactly, by definition, time elapsed as measured by the atomic clock. The only difference is that UTC includes a concept of days, hours, and minutes, and the UTC minute does not have a constant length.

It seems to me that the distinction is between the Julian/Gregorian calendar, in which the interval between 2016-12-31 23:59:00 and 2017-01-01 00:01:00 is 120 seconds, and the standard UTC calendar, in which this interval is 121 seconds.

Wouldn't it be sufficient, as far as the CF standard is concerned, to recognise that the Gregorian/Julian calendar is no longer standard .. and perhaps introduce the term you suggest, gregorian_utc, as a new alias for standard?

There is a separate problem concerning the UDUNITS implementation ... but if terminology is agreed here, we could discuss that on the UDUNITS repository issues.

JimBiardCics commented 6 years ago

@martinjuckes You are exactly correct about the proper interpretation of time around a leap second.

A large number of existing observational datasets obtain time as a UTC time stamp and then convert it to an elapsed time since an epoch using "naive" software which does not take leap seconds into account. A growing number of observational datasets directly acquire precise and accurate elapsed times since an epoch, either from a GPS unit or a satellite operating system, or they acquire time stamps that don't include any leap seconds (TAI time stamps, for example) and convert them using naive software. As it currently stands, those creating the time variables have no way to indicate which way the data was acquired, and users have no way to tell how they should interpret the values.

As I mentioned, the question is often unimportant because the time resolution of the data acquired is coarse enough that it doesn't matter, or because it comes from non-physical systems such as models. When it the question is important, it can be critical.

The CF explanation of the calendar attribute is

In order to calculate a new date and time given a base date, base time and a time increment one must know what calendar to use. For this purpose we recommend that the calendar be specified by the attribute calendar which is assigned to the time coordinate variable.

We are trying to make a way for people to indicate what they have done when it matters without burdening those for whom this is not important.

Here's a table showing the impact of different processing sequences with respect to leap seconds when attempting to obtain a UTC time stamp from a time variable with an accurate UTC epoch date and time (assuming there are one or more leap seconds within the time variable range or since the epoch date and time).

Time Source	Conversion to Time Var	Accurate Time Var?	Conversion from Time Var	Result
UTC time stamp	Naive	No	Naive	Correct UTC
UTC time stamp	Naive	No	Smart	Incorrect
UTC time stamp	Smart	Yes	Naive	Incorrect
UTC time stamp	Smart	Yes	Smart	Correct UTC
Accurate elapsed time	-	Yes	Naive	Incorrect
Accurate elapsed time	-	Yes	Smart	Correct UTC

Naive - The conversion is unaware of leap seconds. Smart - The conversion is aware of leap seconds. Accurate Time Var - The values in the time variable have no unexpected internal or external offsets due to leap seconds.

The last two entries in the table are, in truth, equivalent to the middle two. It doesn't really matter whether you started with UTC time stamps, TAI time stamps, or GPS elapsed time counts - as long as you end up with accurate TAI-compatible elapsed time values, the conversion to correct UTC time stamps must be smart (properly handle leap seconds).

I'm open to other names for the two new calendars, but I think the suggested names are reasonable.

ChrisBarker-NOAA commented 6 years ago

Lots of good points here, but I think this should be very simple:

The CF explanation of the calendar attribute is

In order to calculate a new date and time given a base date, base time and a time increment one must know what calendar to use. For this purpose we recommend that the calendar be specified by the attribute calendar which is assigned to the time coordinate variable.

The fact is that UTC and TIA are slightly different calendars -- as such users absolutely need to be able to specify which one is appropriate. Done.

So this proposal is a great (maybe the only, really) compromise between the specification we need and backward compatibility.

I am a bit confused by this, though:

gregorian_utc
the time values may not be fully metric.

huh? not entirely sure what "fully metric" means, but if you have e.g.:

seconds since 1980-01-01T12:00:00

then the actual values in the array should fully represent monotonically increasing values, and time[i] - time[j-i] should always represent exactly the number of seconds that has elapsed.

what is non-metric about that???

BTW, this all is a really good reason to encode time in this way, rather than, say ISO datetime strings....

-CHB

martinjuckes commented 6 years ago

@JimBiardCics Thanks for that detailed reply. I think I understand now. I've also done a bit more background reading and learned how extensive this problem is, causing disruption to companies like Google and Amazon, who have both adopted messy (and mutually inconsistent) work-arounds to the problem.

I'd like to split this into two related problems: (1) the current CF convention does not fully explain how to deal with leap years, and does not distinguish between TAI time and UTC time; (2) users struggle with the available tools (which are not good) and we need to make things easier to avoid having repeated errors in the information encoded in our netcdf files;

(1) is reasonably easy to deal with (at first sight) ... we need two calendars which differ by the leap seconds, so that 2018-06-01 12:00:00Z[calender A] corresponds to 37 seconds since 2018-06-01 12:00:00Z[calender B]. In this case the minute counter of calendar A would be of variable length. Calendar A and B would both be based on the Gregorian calendar. The only difficulty is that the translation from Calendar A to Calendar B is undefined for dates after 2019-06-30 -- for a standard such as CF this is problematic.

There also appears to be a slight inconsistency in the convention between the introduction to section 4.4, which explains how the time stamp relates to UTC time, and subsection 4.4.1 (which you quote) which states that the time stamp will be interpreted according to the calendar. Perhaps this just needs a clarification that the introductory remarks only apply for specific calendars. There is a further complication in that Udunits neglects the leap seconds, so it is not accurate at the level you want to achieve here.

(2) introduces some complexity, and I think this is where the idea of a "non-metric" coordinate comes in. In an ideal world we might deal with (2) by improving the available tools, but we don't know when the next leap second is coming (it will be midnight on June 30th or December 31st .. and almost certainly not this year ... but beyond that we have to wait for a committee decision, so nothing can be programmed in advance).

What I think you are suggesting, to address point (2), is that we allow a construction in which 2 minutes since 2016-12-31 23:59:00 to be 0 minutes since 2017-01-01 00:01:00 and for these to be UTC times, so that the length of the 2 minutes is 121 seconds. Consequently, any coordinate we defined through this kind of construction would be, in your words, a non-metric time. I can see the logic of introducing something of this kind, but (and this takes us back to the conversation in the 360 day calendar thread) I don't think we can do it in a variable with standard name time or use units which are defined as fixed multiple of the SI second. The time concept is already very busy, and the metric nature of time plays an important role in many aspects of the physical system. If you accept that this is a new variable I'd be happy to support the proposal (e.g. you suggested abstract_time in the email thread, or datetime is a common tag in libraries dealing with calendars). Similarly, having units of measure which have a fixed meaning independent of context is a firmly established principle in the physical sciences and beyond, so if we want a unit of measure which is a bit like a minute, but sometimes 61 seconds long, I'd be happy to accept this provided it is called something other than minute (and the same goes for hour, day, month, year).

martinjuckes commented 6 years ago

Here is a CDL illustration of how I think this could work, with separate variables for the (1) months and years (with each year 12 months, months variable numbers of days), (2) days, hours, minutes (60 minutes to the hour, 24 hours to the day, variable length minutes) and (3) seconds (SI units .. only counting seconds since last minute). I've not included a true time axis, as trying to convert the times to UTC times in a single parameter would be error prone and defeat the point of the example.


dimensions:
    time = 2 ;
variables:
    float mydata(time) ;
        mydata:coordinates = "time time_ca time_cl seconds" ;
    int time_ca(time) ;
        time_ca:long_name = "Calendar months [years*12 + months]" ;
        time_ca:units = "month_ca since 1980-01-01 00:00:00" ;
        time_ca:calendar = "gregorian_utc" ;
        time_ca:standard_name = "time_ca" ;
    int time_cl(time) ;
        time_cl:long_name = "Seconds elapsed since last full UTC minute" ;
        time_cl:units = "hour_cl since 1980-01-01 00:00:00" ;
        time_cl:calendar = "gregorian_utc" ;
        time_cl:standard_name = "time_cl" ;
    float seconds(time) ;
        seconds:standard_name = "time" ;
                seconds:units = "s";

// global attributes:
        :Conventions = "CF-1.7" ;
        :title = "Sample of proposed new abstract time variables" ;
        :comments = "encodes \'1980-06-01 12:02:15\' and \'1981-06-01 12:04:35\' in UTC time" ;
data:

 mydata = 0, 1 ;

 time_ca = 6, 18 ;

 time_cl = 722, 724 ;

 seconds = 15, 35 ;
}```

ChrisBarker-NOAA commented 6 years ago

About udunits:

There is a further complication in that Udunits neglects the leap seconds, so it is not accurate at the level you want to achieve here.

I'm confused:

CF "punts" to Udunits for unit definitions. But there is nothing in CF that says you have to use udunits for processing your data, and I also don't see any datetime manipulations in Udunits anyway (maybe I've missed something). So what does Udunits have to do with this conversation?

Udunits does make the unfortunate choice of defining "year" and "month" as time units, but I thought CF already discouraged (if not banned) their use.

ChrisBarker-NOAA commented 6 years ago

The only difficulty is that the translation from Calendar A to Calendar B is undefined for dates after 2019-06-30 -- for a standard such as CF this is problematic.

Is it? It's terribly problematic for libraries -- which is why most of them don't deal with it at all. But for CF, I'm not sure it matters:

datetimes are encoded as:

a_time_unit since a_datetime_stamp.

TIA and UTC define seconds the same way. And no one should use a time unit that isn't clearly defined multiple of seconds.

The only difference between TIA and UTC here is when you want to convert that encoding to a datetime stamp -- exactly what datetime stamp you get depends on whether you are using TIA or UTC, but the seconds are still fine.

If someone encodes a datetime in the future, and specifies TIA time, then clients will not be able to properly convert it to a datetimestamp -- but that isn't a CF problem.

ChrisBarker-NOAA commented 6 years ago

Breaking these down into manageable chunks ...

... we allow a construction in which 2 minutes since 2016-12-31 23:59:00 to be 0 minutes since 2017-01-01 00:01:00 and for these to be UTC times, so that the length of the 2 minutes is 121 seconds. Consequently, any coordinate we defined through this kind of construction would be, in your words, a non-metric time.

Yeah, I can see that would be "non-metric" -- but why in the world would we want to allow that? and in fact, how could you allow that???

CF time encoding involves one datetimestamp, not two. It would be a really bad idea to somehow mix and match TIA and UTC in one time variable -- specify the timestamp in UTC, but that processing tools should use TIA from there on??

two minutes is never, ever, anything but 120 seconds. Let's not confuse what I am calling a timestamp -- i.e. the "label" for a point in time from a time duration. So:

the duration between:

2016-12-31 23:59:00 and 2017-01-01 00:01:00

is 120 seconds in the UTC Calendar, and 121 seconds in the TIA calendar.

so:

120 seconds since 2016-12-31 23:59:00 will be a different timestamp if you use the TIA calendar than if you use the UTC calendar -- just as it maybe be different if you use any other calendar....

So what is the problem here?

Granted -- if you want to accurately create or use a tiem variable with the TIA calendar, you need a library that can do that -- but that is not CF's problem.

JimBiardCics commented 6 years ago

@ChrisBarker-NOAA As @martinjuckes mentioned in his comment, I'm calling the UTC time 'non-metric' because a time variable for data sampled at a fixed rate based on UTC time stamps converted to elapsed time since an epoch without dealing with leap seconds may contain deviations from what you would expect. If you attempt to 'do math' with the contents, you may find that adding an interval to a time does not produce the time you expected and subtracting two times does not produce the interval expected.

Let's say I have acquired data at regular intervals, dt, and I have captured the time by grabbing accurate UTC time stamps for each acquisition. If I construct my time variable by naively converting my acquired time stamps to elapsed time since my epoch UTC time stamp, I might have one or more problems lurking in my time variable.

If a leap second occurred over the span of my time variable, there will be a point in the times where one of three problems appear.
1. If dt is greater than 1 second, there will be an interval between successive time values that is less than dt. The formula *t[i] = t[0] + idt** won't hold for all the times in the variable.
2. If dt is equal to 1 second, the variable also won't be monotonic. There will be a pair of time values that are identical.
3. If dt is less than 1 second, the variable won't be monotonic, but rather than a pair of time values that are identical, there may be a section where one or more time values are less than a preceding value.
If one or more leap seconds occurred in the time range between my epoch time stamp and my first acquired time stamp, my time values will be internally consistent, but the whole set of values will be smaller than expected.

The non-monotonicity problem is one that I don't even want to get into. And, again, for someone measuring things once an hour (for example) this is all pretty ignorable.

ChrisBarker-NOAA commented 6 years ago

About @martinjuckes' CDL:

I am really confused as to why anyone would ever want to do that.

I remember a thread a while back about encoding time as ISO 8601 strings, rather than "timeunit since timestamp" -- at the time, I opposed the idea, but now we have an even better reason why.

If we stick to the current CF convention, then all we need to do is specify the TIA calendar as a valid calendar (and clarify UTC vs TIA) -- that's it -- nothing else needs to change, and there is no ambiguity.

Is the goal here to be able to specify TIA in a way that users can use it without a TIA-aware library? I think that's simply a bad idea -- if you don't have a TIA aware library, you have no business working with TIA times (at least if you care about second-level precision).

ChrisBarker-NOAA commented 6 years ago

@JimBiardCics wrote:

I'm calling the UTC time 'non-metric' because a time variable for data sampled at a fixed rate based on UTC time stamps converted to elapsed time since an epoch without dealing with leap seconds may contain deviations from what you would expect. If you attempt to 'do math' with the contents, you may find that adding an interval to a time does not produce the time you expected and subtracting two times does not produce the interval expected.

Thanks, got it.

My take on this -- if you do that, you have created invalid, incorrect data. CF should not encode this as a valid thing to do. As far as I'm concerned, it's the same as if you did datetime math with a broken library that didn't do leapyears correctly.

And frankly, I'm not sure HOW we could accommodate it anyway -- I'm still a bit confused about exactly when leap seconds are applied to what (that is, I think *nix systems, for instance, will set the current time with leap seconds -- so the "UTC" timestamp is really "TIA as of the last time it was reset". Which I think is the concern here -- if you are collecting data, and are getting a timestamp from a machine, you don't really know which leap-seconds have been applied.

But again, that's broken data....

If we were to try to accommodate this kind of broken data, I have no idea how one would do it? One of the reasons that leap seconds are not used in most time libraries is that they are not predictable. So a lib released last year may compute a different result than one released today -- how could we even encode that in CF?!?!

JimBiardCics commented 6 years ago

@ChrisBarker-NOAA The first idea here is to allow data producers a way to clearly indicate that the values in their time variables are metric elapsed time with none of the unexpected discrepancies that I referred to earlier. Instead of gregorian_tia, we could call the calendar gregorian_linear or gregorian_metric or some such. We thought to reference TIA because TIA is, at base, a metric, linear count of elapsed seconds since the TIA epoch date/time.

The second idea here is to allow data producers to clearly indicate that the values in their time variables are non-metric elapsed time potentially containing one or more of the unexpected discrepancies that I referred to earlier, and to indicate that in all but one case they will get correct UTC time stamps if they convert the time values to time stamps using a method that is unaware of leap seconds. This result is not guaranteed if you add an offset to a time value before conversion, and differences between time values may not produce correct results. You may obtain one or more incorrect time stamps if your time values fall in a leap second.

Keep in mind that the potential errors are, as of today, going to be less than or equal to 37 seconds, with many of them being on the order of 1-2 seconds.

For backward compatibility, and for the vast number of datasets out there where errors of this magnitude are ignorable, the existing gregorian calendar (with a warning added in the Conventions section) will remain. It would impose a pretty severe burden to insist that all data producers use only gregorian_tai or gregorian_utc going forward.

The use of the word metric is problematic because minds inevitably go to metric system. I have used it this way so many times when thinking about this that I forget that is is confusing.

JimBiardCics commented 6 years ago

@martinjuckes The point here is that we have an existing way to represent time that has been used for quite a few years now. This was never a problem for climate model data or data acquired on hourly or longer time intervals. We may at some future point (CF 2.0?, CF 3.0?) want to consider some significantly different way of handling time. For CF 1.* we want to find a way to accommodate satellite and other high frequency data acquisition systems without imposing unneeded burdens on anyone.

CF says that the purpose of the calendar attribute is to tell you what you need to know to convert the values in a time variable into time stamps. We aren't telling them (at least not directly) how we obtained the values in the time variable. We are telling them how to use them. @JonathanGregory, @marqh, and I came to the conclusion that, while there may be cases we didn't consider, pretty much every time variable anyone might create (within reason) would fall into one of three categories:

The elapsed time values are fully metric and their relationship to the UTC epoch date and time is accurate. (The gregorian_tai case.) You must take leap seconds into account when converting these time values into UTC time stamps if you want fully accuracy.
The elapsed time values are almost certainly not fully metric and their relationship to the UTC epoch date and time is probably not accurate, but if you convert them to UTC time stamps without adding any offsets, using a method that does not take leap seconds into account, you will get UTC time stamps with full accuracy. (The gregorian_utc case.)
We don't have a clue about the metricity or accuracy of the elapsed time values or the epoch date and time. At least not to within 37 seconds. And we don't care. (The updated gregorian case.)

Time representation is a monster lurking just under the surface. Everything was fine until we looked down there. The only pure time is counts of SI seconds (or fractions thereof) since some agreed starting point. Everything else is burdened with thousands of years of history and compromise.

martinjuckes commented 6 years ago

Hi @JimBiardCics : I don't know where you get the idea that CF time is different from SI time : as far as I can see CF time is defined as SI time measured in SI units. Making a change to depart from SI units is a big deal.

ChrisBarker-NOAA commented 6 years ago

@ChrisBarker-NOAA https://github.com/ChrisBarker-NOAA The first idea here is to allow data producers a way to clearly indicate that the values in their time variables are metric elapsed time with none of the unexpected discrepancies that I referred to earlier. Instead of gregorian_tia, we could call the calendar gregorian_linear or gregorian_metric or some such. We thought to reference TIA because TIA is, at base, a metric, linear count of elapsed seconds since the TIA epoch date/time.

gregorian_tia is fine.

OK, I vote to simply not allow that in CF. Those discrepancies are errors. If second level precision is important, then don’t use a library without that precision to write your data.

and to indicate that in all but one case they will get correct UTC time stamps if they convert the time values to time stamps using a method that is unaware of leap seconds.

I’m not sure we can know that for a given dataset. As leap seconds are unpredictable, and computer clocks imprecise, the “UTC” time you get from a system clock may or may not have had the last leap second adjustment at just the right time. Granted, that’s only a potential second off, but still ...

If your application cares about second-level precision you should use TAI time — isn’t that what GPSs use, for instance?

So in a CF time variable with, e.g.

seconds since a_datetime_stamp

The only thing you should need to know is whether the time stamp is UTC or TAI. Other than that, a second is a second....

In practice, if you care about second-level precision, you really should use a time stamp that’s close to your data anyway :-)

For backward compatibility, and for the vast number of datasets out there where errors of this magnitude are ignorable, the existing gregorian calendar (with a warning added in the Conventions section) will remain.

Agreed.

The use of the word metric is problematic

Well, I think I got that part anyway :-)

-CHB

ChrisBarker-NOAA commented 6 years ago

I just noticed these:

The elapsed time values are fully metric and their relationship to the UTC epoch date and time is accurate. (The gregorian_tai case.)

If the timestamp Is UTC, you sure don’t want to call it TAI, do you?

You must take leap seconds into account when converting these time values into UTC time stamps if you want fully accuracy.

Yes, because UTC dies include leap seconds.

The elapsed time values are almost certainly not fully metric and their relationship to the UTC epoch date and time is probably not accurate, but if you convert them to UTC time stamps without adding any offsets, using a method that does not take leap seconds into account, you will get UTC time stamps with full accuracy. (The gregorian_utc case.)

Then the timestamp is NOT UTC. I think this should be simply considered incorrect, but it certainly shouldn’t be called something_utc, since it’s not UTC.

We don't have a clue about the metricity or accuracy of the elapsed time values or the epoch date and time. At least not to within 37 seconds. And we don't care. (The updated gregorian case.)

I think this should be what you call your “non-metric” case._tai

UTC and TAI are well defined (at least from now into the past) if we call it something _utc it should be UTC — and if we call it something_tai it should be TAI.

The vast majority of files in the wild are probably thought to be UTC, but processed with non-leap second aware libraries. And as you pint out, the vast majority of those it doesn’t matter.

I’m less optimistic than Jim that there are people writing files that are “UTC but processed with a non-leap second aware library” that:

Care about full precision
Understand what they have done enough to guarantee reversibility.

I would think that folks working with data for which this matters would have tools and libraries that do it right.

So I propose that “gregorian” mean — “ambiguous with regard to leap seconds”, since that’s what files in the wild are.

gregorian_utc means “truly UTC, leap seconds and all”

gregorian_tai means “truly TAI — no leap seconds”

And that it really only applies to the timestamp, as the values should have their usual definition.

-CHB

The only pure time is counts of SI seconds (or fractions thereof) since some agreed starting point.

Yup — which is why CF is the way it is :-)

martinjuckes commented 6 years ago

Hi @ChrisBarker-NOAA : can't we keep the de-fact definition of days in the Gregorian calendar being exactly 86400 seconds? Precision in the standard is a good thing. Clearly there will be less precision in the time values entered in many cases, but that is up to the users who will, hopefully, provide the level of accuracy required by their application. This would make gregorian equivalent to gregorian_tai.

You ask why it matters to the convention that the UTC calendar is not defined into the future past the next potential leap second: this is an issue of reproducibility and consistency. The changes are small, but we are only introducing this for people who care about small changes. The ambiguity only applies to the time-stamp: the interpretation of the statement x seconds since 2018-06-01 12:00:00Z will not be affected by future leap second, but the meaning of y seconds since 2020-06-01 12:00:00Z may change if a leap second is introduced next year. One solution would be to disallow the use of a time-stamp after the next potential leap second.

I agree with Chris that we can't hope to retro-fix problems which may have occurred due to people putting in information inaccurately. What we need is a clear system which allows people to enter information accurately.

cf-metadata-list commented 6 years ago

Hi All,

I would like to support Chris Barker’s approach.

A Gregorian day, and calendar, as defined by BIPM (Bureau International des Poids et Mésures), IERS (International Earth Rotation Service) and as notated in ISO8601, may have leap seconds, and must have them if they are declared, by definition. This will stay this way, even if the ITU WRC (International Telcommunications Union, World Radio-communcation Conference) does succeed in passing a motion abolishing leap seconds in the next decade or so.

A de facto definition of 86400 (SI) seconds in a Gregorian day is wrong and always has been.

The past ambiguous labelling of many NetCDF datsets as Gregorian is unfortunate and not easily fixed.

Chris B’s proposal is minimal, (and therefore to be supported!) with three labels:

Ambiguous, status quo;
Proper Gregorian calendar, with leap seconds;
Proper International Atomic Timescale, without leap seconds, leap days, months, etc.

I also support at least one further, separate, label for a 360-day year.

I do not have strong views on what the actual labels should be.

Chris

PS Apologies for not delving into GitHub.

From: owner-cf-metadata@listserv.llnl.gov owner-cf-metadata@listserv.llnl.gov On Behalf Of Martin Sent: 31 October 2018 11:05 To: cf-convention/cf-conventions cf-conventions@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: Re: [cf-convention/cf-conventions] Add calendars gregorian_tai and gregorian_utc (#148)

Hi @ChrisBarker-NOAAhttps://github.com/ChrisBarker-NOAA : can't we keep the de-fact definition of days in the Gregorian calendar being exactly 86400 seconds? Precision in the standard is a good thing. Clearly there will be less precision in the time values entered in many cases, but that is up to the users who will, hopefully, provide the level of accuracy required by their application. This would make gregorian equivalent to gregorian_tai.

You ask why it matters to the convention that the UTC calendar is not defined into the future past the next potential leap second: this is an issue of reproducibility and consistency. The changes are small, but we are only introducing this for people who care about small changes. The ambiguity only applies to the time-stamp: the interpretation of the statement x seconds since 2018-06-01 12:00:00Z will not be affected by future leap second, but the meaning of y seconds since 2020-06-01 12:00:00Z may change if a leap second is introduced next year. One solution would be to disallow the use of a time-stamp after the next potential leap second.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/cf-convention/cf-conventions/issues/148#issuecomment-434646396, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AfI2gTx-0s4udX8sbpG_XaYHfIv9OMT2ks5uqYPlgaJpZM4X8l6W.

JimBiardCics commented 6 years ago

@ChrisBarker-NOAA @martinjuckes For the moment, let's set aside the question of names for the calendars.

There is nothing at all wrong with specifying that the epoch time stamp in the units attribute always be a correct UTC time stamp. In fact, allowing the epoch time stamp to be from a TAI or UTC clock will increase the chances that the data will be handled incorrectly. If you are sophisticated enough to care about TAI, you will have no problem dealing with a UTC time stamp.

I am explicitly assuming that all UTC time stamps are correct and accurate at the times they were acquired or constructed. If you are getting your time stamp from a PC that isn't actively synced by a time server, you shouldn't bother to use either of these new calendars.

When you read time out of a GPS unit, you can get a count of seconds since the GPS epoch, and I believe you can get a GPS time stamp that doesn't contain leap seconds (like TAI time stamps, but with a fixed offset from TAI), but most people get a UTC time stamp. The GPS messages carry the current leap second count and receivers apply it by default when generating time stamps. That's something I learned about from Aaron Sweeney a year or two ago - well after the big discussion we had about all of this a few years back.

There are quite a lot of high-precision data acquisition platforms out there that start with accurate UTC time stamps obtained from GPS receivers. Many of them don't care about metricity. They just want their time stamps, but CF tells them that they must store time as elapsed time since an epoch.

There's not really such thing as an elapsed time that is UTC vs TAI. At core, true elapsed time - a count of SI seconds since an event - is the same for both. The UTC time stamp for that event may not be identical to the TAI time stamp for that same event, but they both reference the same event, and the elapsed time since that event is the same no matter which time system you are using.

The UTC time system provides a prescription in terms of leap seconds for how to keep UTC time stamps synchronized with the rotation of the earth. Just like the Gregorian calendar system provides a prescription in terms of leap days for how to keep date stamps synchronized with the orbit of the earth. The only difference - and it is an admittedly important one - is that the UTC leap second prescription does not follow a fixed formula like the Gregorian leap day prescription does. The UTC time system declares that the time stamp 1958-01-01 00:00:00 references the same time as the matching TAI time stamp. The TAI time system provides no prescription for keeping TAI time stamps synchronized with the rotation of the earth.

Time stamps - whether UTC, TAI, GPS, or something else - are, in essence, a convenience for humans. No matter what time system or calendar system you use, the number of seconds or days that has elapsed since any given date and time is the same.

In a perfect world, all data producers would use a leap-second-aware function to turn their lists of time stamps into elapsed time values and all time variables would be "perfect". That would also force all users of those time variables to use a leap-second-aware function to turn the elapsed time values into time stamps. But that's not the world we live in. Does naive conversion of UTC time stamps into elapsed times have the potential to produce non-monotonic time coordinate variables that violate the CF conventions? Yes. Does it cause any real problems (for the vast majority of cases and instances of time) if people use this "broken" method for encoding and decoding their time stamps? No.

At the end of the day, it doesn't matter much what process you used to create your elapsed time values. For the small number cases where differences of 1 - 37 seconds matter, we are trying to make a way for data producers to signal to data users how they should handle the values in their time variables while staying within the existing CF time framework and acknowledging CF and world history regarding the way we deal with time (which isn't very consistent and is composed of successive corrective overlays over centuries).

martinjuckes commented 6 years ago

Hello @JimBiardCics , thanks again for a detailed response.

I support the aim of allowing users to place either a UTC or TAI time stamp in the units statement (so that they can do whatever fits with their existing processes) and making it possible for them to declare which they are using. The suggestion of using the calendar attribute for this makes sense.

I think we are agreed now that there is a unique and well defined mapping between these time stamps, and there is a unique and well defined way of calculating an elapsed time (in the SI sense) between any two such time stamps. I don't see how the layers of complexity need to come into this. The TAI time stamp counts up with 86400 seconds per TAI day, while the UTC has a known selection of days with an extra second in the final minute.

All we can do is define these things clearly, we can't force users to adopt best practice. As you say, some people don't have accurate clocks, just as some record temperature without accurate thermometers.

I would disagree with you on one point: a non-monotonic data array is no problem, but a non-monotonic coordinate array is a problem. As Chris commented, people who need to know about this sort of problem are likely to have sorted it out before they get around to writing NetCDF files.

ChrisBarker-NOAA commented 6 years ago

@JimBiardCics wrote:

For the moment, let's set aside the question of names for the calendars.

OK, though a bit hard to talk about :-)

In a perfect world, all data producers would use a leap-second-aware function to turn their lists of time stamps into elapsed time values and all time variables would be "perfect".

almost -- I think TIA time is also a perfectly valid system for a "perfect" world :-)

But yeah, most datetime libraries do not handle leap seconds, which, kida ironically, means that folks are using TIA time even if they think they are using UTC :-)

Does naive conversion of UTC time stamps into elapsed times have the potential to produce non-monotonic time coordinate variables that violate the CF conventions? Yes. Does it cause any real problems (for the vast majority of cases and instances of time) if people use this "broken" method for encoding and decoding their time stamps? No.

I'm not so sure -- I think having a time axis that is "non-metric" as you call it can be a real problem. Yes, it could potentially be turned into a series of correct UTC timestamps by reversing the same incorrect math used to produce it, but many use cases are working the the time access in time units (seconds, hours, etc), and need it to be nifty things like monotonic and differentiable, etc.

we are trying to make a way for data producers to signal to data users how they should handle the values in their time variables while staying within the existing CF time framework and acknowledging CF and world history regarding the way we deal with time

Fair enough -- a worthy goal.

There is nothing at all wrong with specifying that the epoch time stamp in the units attribute always be a correct UTC time stamp. In fact, allowing the epoch time stamp to be from a TAI or UTC clock will increase the chances that the data will be handled incorrectly. If you are sophisticated enough to care about TAI, you will have no problem dealing with a UTC time stamp.

I disagree here -- the truth is that TAI is EASIER to deal with -- most datetime libraries handle it just fine in fact, it is all they handle correctly. So I think a calendar that is explicitly TAI is a good idea.

I think we are converging on a few decisions:

1) Due to legacy, uninformed users, poor library support, and the fact that it just doesn't matter to most use cases, we will have an "ambiguous with regard to leap seconds" calendar in CF. Probably called "gregorian", because that's what we already have, and explicit or not, that's what is means with existing datasets. So we need some better docs here.

2) Do we need an explicit "UTC" calendar, in which leap seconds ARE taken into account. The file would only be correct if the timestamp is "proper" UTC, and you would get the right (UTC) timestamps back if and only if you used a leap-second-accounting for time library. The values themselves would be "metric" (by Jim's definition)

3) Do we need an explicit "TAI" calendar. The file would only be correct if the timestamp is "proper" TAI, and you would get the right (TAI) timestamps back if and only if you did not apply leap seconds. The values themselves would be "metric" (by Jim's definition).

Note that the only actual difference between (2) and (3) is that the timestamp is in UTC or TAI, which are different since some time in 1958, but up to 37 seconds. In either case, the values themselves would be "proper", and you could compute differences, etc easily and correctly.

4) minor point -- do we disallow "days" in any of these, or be happy with 1day == 24 hours == 86400 seconds. I'm fine with days defined this way -- it is almost always correct, and always what people expect. (though it could cause issues, maybe, with some datetime libs, but only those that account for leap-seconds, so I doubt it)

5) I think this is the contentious one: Do we have a calendar (encoding, really) that is:

Elapsed time since a UTC timestamp, but with elapsed time computed from a correct-with-regard-to-leapseconds UTC time stamp with a library that does not account for leap seconds. This would mean that the values themselves may not be "metric".

I think this is what Jim is proposing.

(by the way, times into the future (when leap-seconds are an unknown) as of the creation of the file should be disallowed)

Arguments for (Jim, you can add here :-) )

people are already creating time variables like this -- it would be nice ot be able to explicitly define that that's what you've done, so folks can interpret them exactly correctly.
since a lot of instruments., computers, etc. use UTC time with leap seconds applied, and most tiem processing libraries don't support leap seconds -- folks will continue to produce such data, and, in fact have little choice but to do so.

Arguments against:

This is technically incorrect data -- it says "seconds since", but it isn't actually always seconds since. We should not allow incorrect data as CF compliant. Bad libraries are not CF's responsibility.
A time axis created this way will be non-"metric" - that is, you can't compute elapsed time correctly directly from the values -- this is likely to lead to confusion, but worse still, hard to detect hidden bugs -- that is, code that works on almost any dataset might suddenly fail if a value happens to fall near a leap-second, and you get a zero-length "second" (or even a negative one? -- is that possible).
(same as above, really) -- a time variable of this sort can only be used correctly if it is first converted to UTC timestamps.
There may be issues with processing this data with some (most?) time libraries (in particular the ones that don't account for leap-seconds). This is because if you convert to a UTC timestamp with leap-seconds, you can get a minute with 60 seconds in it, for example:

December 31, 2016 at 23:59:60 UTC

And some time libraries do not allow that.

Example python's datetime:

In [3]: from datetime import datetime

In [4]: datetime(2016, 12, 31, 23, 59, 60)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-a8e1ba1d62e5> in <module>()
----> 1 datetime(2016, 12, 31, 23, 59, 60)

ValueError: second must be in 0..59

Given these trade-offs, I think CF should not support this -- but if others feel differently, fine -- but do not call it "UTC" or "TAI"! -- and document it carefully!.

That last point is key -- this entire use-case is predicated on the idea that folks are working with full-on-proper-leap-second-aware UTC timestamps, but processing them with a non-leap-second-aware library -- and that this is a fully definable and reversible process. But at least with one commonly used datetime library (Python's built-in datetime). It simply will not work for every single case -- it will work for almost every case, so someone could process this data for years and never notice, but it's not actually correct! In fact, I suspect most computer systems can't handle: December 31, 2016 at 23:59:60 UTC, and will never give you that value -- rather, (IIUC) they accommodate leap seconds by resetting the internal clock so that "seconds since the epoch" gives the correct UTC time when computed without leap seconds. But that reset happens at best one second too late (so that you won't get that invalid timestamp).

All this leads me to believe that if anyone really cares about sub-second-level precision over a period of years, then they really, really should be using TAI, and if they THINK they are getting one-seconds precision, they probably aren't, or have hidden bugs waiting to happen. I don't think we should support that in CF.

Final point:

When you read time out of a GPS unit, you can get a count of seconds since the GPS epoch, and I believe you can get a GPS time stamp that doesn't contain leap seconds (like TAI time stamps, but with a fixed offset from TAI), but most people get a UTC time stamp. The GPS messages carry the current leap second count and receivers apply it by default when generating time stamps.

OK -- but I suspect that yes, most people get a UTC timestamp, and most people don't understand the difference, and most people don't care about second-level accuracy over years.

The over years part is because if you have, say, a GPS track you are trying to encode in CF, you should use a reference timestamp that is close to your data -- maybe the start of the day you took the track. So unless you happen to be collecting data when a leap second occurs, there will be no problem.

For those few people that really do care about utmost precision -- they should use the TAI timestamp from their GPS -- and if it's a consumer-grade GPS that doesn't provide that -- they should get a new GPS! It's probably easier to figure out how to get TAI time from a GPS than it is to find a leap-second-aware time library :-)

Side note: Is anyone aware of a proper leap-second aware time library??

Sorry for the really long note -- but I do think we are converging here, and I won't put up a stink if folks want to add the sort-of-utc calendar -- as long as it's well named and well documented.

-CHB

JimBiardCics commented 6 years ago

@martinjuckes Let me make something clear that seems to have been unclear. I am not proposing that the new calendars are declaring what sort of epoch time stamp is in the units attribute of time variables.

I think it is a bad idea to allow a TAI time stamp to be placed in the units attribute as the epoch for the elapsed times in the time variable. It adds nothing but confusion. The epoch should always be specified with a UTC time stamp. This causes no problems whatsoever. Someone going to the trouble of getting exact and correct elapsed times in their time variables will have no trouble figuring out the proper UTC time stamp for their epoch. Users, on the other hand, would be faced with figuring out which way to handle the epoch time stamp. If the user is aware of the implications of having exact and correct elapsed times in a time variable, they will also have the software on hand that is needed if they want to get time stamps from the time variable, or they will get it because it is important to them. If they aren't aware, a TAI epoch time stamp will maximize the error they will get if they perform a leap-second-naive conversion to get time stamps from the time variable contents and mistakenly think are getting a UTC time stamp.

When I was talking about monotonicity I was intending it in reference to coordinate variables. I disagree with Chris (whether that is @ChrisBarker-NOAA or the unknown Chris who commented through the CF Metadata List account). People who are converting precise and accurate UTC time stamps into values in time variables using the tools most available to software developers for handling time without sensitivity to leap seconds are creating time variables that have the potential to be non-monotonic because of leap seconds. They have done it in the past, they are doing it now, and they will very likely continue to do so. It may be that they avoid the problem by breaking their data into files on UTC day boundaries (or hourly boundaries, etc), but if you aggregate those files you will create non-monotonic time variables. They will continue to do so because the potential one second per year non-monotonicity is less of a problem than forcing all their users to start decoding time variables into time stamps with leap-second-aware functions that are not readily available.

As I said before, this is a compromise position. Telling a significant group of data producers that they must change their software in a way that will cause widespread problems for their users is a good way to get them to ignore you.

JimBiardCics commented 6 years ago

On another front, I'm perfectly happy to entertain different names for the new calendars. I'm not a particular fan of gregorian_tai or gregorian_utc**.

ChrisBarker-NOAA commented 6 years ago

People who are converting precise and accurate UTC time stamps into values in time variables using the tools most available to software developers for handling time without sensitivity to leap seconds are creating time variables that have the potential to be non-monotonic because of leap seconds. They have done it in the past, they are doing it now, and they will very likely continue to do so.

Agreed -- and we need to accommodate that -- which we already do implicitly with "gregorian", and we should make it explicit in the sense that the whole leap-second thing is ambiguous.

Telling a significant group of data producers that they must change their software in a way that will cause widespread problems for their users is a good way to get them to ignore you.

Indeed. The question at hand is do we codify as correct what is, at best, a kludgy improper process?

As we've all said, the ambiguity is a non-issue for the vast number of CF use cases.

So the question really is -- are there folks for whom:

Leap-seconds matter
They know and understand the issues of UTC vs TAI, etc.
They have no way to produce a "correct" time variable

I honestly don't know -- but it's been suggested that we can use UTC always as the timestamp, 'cause anyone that understands the distinction (and cares about it) will have access to a leap-second-aware-library. In which case, no -- there isn't a large user base.

But given the dirth of leap-second aware libs, maybe we have no choice to to codify this kludgy encoding.

martinjuckes commented 6 years ago

Hello @JimBiardCics : sorry for my confusion. I thought the idea was to make it possible for people to encode information relative to a TAI time. At the moment declaring a CF calendar as julian or gregorian changes the interpretation of the reference time and I though we were discussing something similar here for TAI, i.e. defining one calendar with fixed calendar days of 86400 seconds and a 2nd calendar in which some calendar days have an extra second.

If this is not your proposal, I'm afraid I still don't understand what it is that you are proposing.

I would like to keep the idea of having a calendar with a fixed 86400 second day in discussion, because I believe this is what the climate modelling community use and it will provide a route to reducing confusion about leap seconds make it possible to encode this information precisely.

JimBiardCics commented 6 years ago

@ChrisBarker-NOAA

People who are converting precise and accurate UTC time stamps into values in time variables using the tools most available to software developers for handling time without sensitivity to leap seconds are creating time variables that have the potential to be non-monotonic because of leap seconds. They have done it in the past, they are doing it now, and they will very likely continue to do so.

Agreed -- and we need to accommodate that -- which we already do implicitly with "gregorian", and we should make it explicit in the sense that the whole leap-second thing is ambiguous.

As I look at it, the case I am describing above is not the same as the "I don't know about all this and it doesn't matter to me" case. These are people for whom precise and accurate time matters to the level they are taking it, but they would be perfectly happy with storing the time stamps as strings instead of converting to elapsed time since an epoch. But CF mandates elapsed time since an epoch, so that is what they do. Their user base likely feels the same way. Precision is desired, but metricity is not an issue.

When I think about all of this, I find myself looking at two slightly different category lists. One is for producers, the other is for consumers.

For producers:

The distinctions are unimportant. (old gregorian is good enough)
Precision is important but metricity is not. The inputs to the netCDF creation process are accurate UTC time stamps. Using leap seconds to create fully metric and monotonic time variables imposes a burden on users to find software that handles leap seconds properly. (new calendar a)
Precision is important and metricity is important. The kind of input to the netCDF creation process doesn't matter, because they are going to do whatever it takes to generate accurate time variables. Their target users are going to do whatever it takes to use the time properly. (new calendar b)

For consumers:

The distinctions are unimportant. (old gregorian is good enough)
Precision is important but metricity is not. As long as they can get accurate UTC time stamps they are happy. They would rather not deal with leap seconds if they don't have to. (new calendar a)
Precision is important and metricity is important. They likely want to do math with the elapsed time values themselves. Either new calendar works for them, because they are going to do whatever it takes to get what they need as long as they know what is in the time variable. (new calendar a is best, but new calendar b can be made to work)

I agree wholeheartedly that the ideal scenario would be to insist that the contents of every time variable be fully metric and monotonic under all circumstances. I think the practical reality is that the vast majority of both producers and consumers fall in their respective categories 1 and 2. As kludgy as it is, I think it is a good idea to allow new calendar a (the calendar formerly known as gregorian_utc) as well as new calendar b (the calendar formerly known as gregorian_tai).

JimBiardCics commented 6 years ago

@martinjuckes My perspective is that we will continue to honor SI seconds and days, weeks, and fortnights (2 weeks) as integer multiples of SI seconds. The thing that can be quite confusing in all this is that the number of seconds that has elapsed since any given epoch is the same in both the UTC and TAI time systems. The time stamps for the epoch may be different, but the elapsed time is always exactly the same. The same is true of 360-day, lunar, 365-day, Julian, and Gregorian calendars, when used in the real world. The date stamp for a given epoch will be different in each, but the number of days that have elapsed since that epoch will be the same in all the calendars.

Leap days and leap seconds are a mechanism to keep the date and time stamps synchronized with the orbital and rotational motion of the earth relative to the sun. Adding leap days to date stamps keeps the Vernal Equinox to within one day of the same month and day. Adding leap seconds to time stamps keeps midnight at 0 degrees longitude to within one second of the same hour, minute, and second. Leap days and seconds are only about time stamps, not about how much time has passed. The Gregorian calendar says, "Let's keep the months and days lined up with the seasons." The UTC time system says, "Let's also keep the hours and minutes lined up with the solar days." The TAI time system says, "Let's let the hours and minutes do their own thing and slip relative to solar days."

When we had this discussion on the CF metadata listserv a few years ago, I started out advocating for a new calendar that would be entirely TAI, so epoch time stamps would be in TAI. This meant we would need three new calendars instead of two. I became convinced that the side effects of this would cause more problems than it would solve.

Users who didn't realize that the TAI epoch time stamps were not UTC (or didn't go to the trouble of getting leap-second-aware software to get time stamps from the time variable) would have maximal discrepancy between the times they thought they had and the actual times (different amounts in different years up to 37 seconds for data from 2017 and 2018). Data producers who want to store monotonic, metric elapsed times in their time variables won't find it hard to generate accurate UTC time stamps for their time variable epochs, so there didn't seem to be a compelling case for giving people the option of going "full TAI".

martinjuckes commented 6 years ago

However, calendar formerly known as gregorian_utc, as described at the start of this discussion, is not consistent with use of SI time units, and so could not be used with standard name time. What is described here has no relation to the concept of calendar as it currently exists in the CF convention. I can't see how such a major step in increased complexity can be justified.

Why are you not prepared to consider working with the structures we have?

JimBiardCics commented 6 years ago

@martinjuckes How is the calendar formerly known as gregorian_utc not consistent with use of SI time units?

martinjuckes commented 6 years ago

hi @JimBiardCics .. my previous comment (2 above) was a response to your post 4 above, before I saw your comment addressed to me (3 above) ...

Firstly, in response to your most recent question: If you add 120 seconds to 2016-12-31T23:59:00Z (UTC) you get 2017-01-01T00:00:59Z (UTC) because of the leap second. You appear to be suggesting that 2017-01-01T00:00:59Z could be encoded with 119 unit_formerly_known_as_seconds since 2016-12-31T23:59:00Z in the case of the calendar formerly known as gregorian_utc. The unit_formerly_known_as_seconds is not the same as SI seconds. 120 seconds has a unique meaning for all other calendars which is not modified in any way by the choice of calendar. A system in which two physical seconds map onto one unit of measure is introducing substantial new complexity.

I didn't realise that you has previously considered the introducing a TAI-timestamped calendar. I don't understand your conclusion that this would require 3 new calendars. To be, it appears that we would want two versions of gregorian, one with leap seconds (i.e. UTC) and the other without (fixed minutes, as used for TAI). Can't we get away with one new calendar, and a clarification of the definition of gregorian?

If I've understood correctly (sorry if I've been a bit slow) you are concerned about users who have a sequence of UTC time stamps such as:

2016-12-31T23:59:40Z
2016-12-31T23:59:50Z
2016-12-31T23:59:60Z
2017-01-01T00:00:10Z and would like to encode them as times in seconds since 2016-12-31T23:59:40Z. The correct answer would be [0,10,20,31]. If their software is not leap-second aware they might get [0,10,20,30], but they could also get [0,10,0,30] (60 seconds identified as 0) or [10,10,NaN,30] (software fails silently on seeing 60 seconds).

Rather than create a kludgy fix in the standard, couldn't we provide people with clearer guidance? The leap seconds are not that complicated .... though it took be a while to get the information and it be useful to make that bit easier for our users, e.g. the authorise IETF list of leap seconds is here .. and this list will be updated at least 6 months before a new leap seconds is introduced.

ChrisBarker-NOAA commented 6 years ago

@martinjuckes wrote:

Rather than create a kludgy fix in the standard, couldn't we provide people with clearer guidance? The leap seconds are not that complicated .... though it took be a while to get the information and it be useful to make that bit easier for our users, e.g. the authorise IETF list of leap seconds is here .. and this list will be updated at least 6 months before a new leap seconds is introduced.

I have to agree with Jim here -- guidance is not the problem -- UTC is well defined, and as you say the leap seconds are published, and the math is not that hard. The problem is libraries: no datetime library that I know of handles leap seconds properly (OK -- I just googled, how hard is that :-) -- the BoostDatetime lib does handle leap seconds). So I'll rephrase, "no datetime library commonly used by the CF community" handles leap seconds.

As a community, maybe we could do something about that, but as a data standard, we're stuck with the state of practice as it is.

ChrisBarker-NOAA commented 6 years ago

As I look at it, the case I am describing above is not the same as the "I don't know about all this and it doesn't matter to me"

I agree -- THAT case is currently handled (defacto) by "gregorian", and I think we all agree that we should clarify that in the docs.

My concern is the "It matters to me, but I don't really understand exactly what the systems and libraries I'm working with do and what their limitations are" use case.

And I think those folks would be poorly served by a CF standard that appeared to support their use case, but might give incorrect results.

For the "I really know exactly what my requirements are, and I know that the tools I use satisfy them" crowd -- they can use proper TAI or a datetime library that handles leap seconds.

As for the distinction between producers and consumers -- that is a big issue -- consumers are likely to be less knowledgeable, so you do want a way to tell them what to do with the tools they have access to. So the idea of a way to tell them: "decode this with a non-leap-second-aware library and you will get what you expect makes sense.

@JimBiardCics wrote:

they would be perfectly happy with storing the time stamps as strings instead of converting to elapsed time since an epoch. But CF mandates elapsed time since an epoch, so that is what they do. Their user base likely feels the same way. Precision is desired, but metricity is not an issue.

Funny you should say this -- on the bus this morning, I was thinking about this, and realized exactly this -- the use case is:

I have a bunch of UTC timestamps (e.g. 2018-11-01T12:15:32)
I want to store them in a time variable in a CF compliant way
I want users of that file to get exactly the same timestamps back when they use the data.

And the fact is that without the existence of leap seconds in commonly available libraries, it's actually hard to do that properly with the CF time-elapsed-since-a-timestamp.

So the solution to this is to provide, in CF, a way to encode timestamps directly. ISO8601 strings are probably the easiest way to do it, but you could store them in a more compact binary format if you like (like the python datetime object does internally).

I was part of the conversation a while ago about doing this, and did not support it because it seemed like simply another way to encode the same information. But I've come to the conclusion now that it is NOT the same information with a different encoding.

Storing timestamps directly (as a string or other format) is a way to store not a time axis (with nice metric properties, etc), but to store, well, timestamps -- like "this measurement was taken at this date and time".

So my personal opinion:

1) CF should accommodate the use-case Jim is advocating for.

2) The best way to do that is to have a format for storing timestamps -- maybe we could even call it "timestamp" or something, rather than "time".

3) If (2) is not acceptable to the CF community, then JIm's proposal is probably the only practical solution.

NOTE: I have come around because I've come to a new understanding of the problem we are trying to solve. I thought it was:

"We need a way for people to store data that are working with inappropriate tools for their use-case, that will create invalid data in the CF file, but that will probably be able to be recovered with standard tools"

From that perspective, you can probably see why I thought it was not a good idea.

Now I think the problem we are trying to solve is:

We need a way for people to store precise-to-the-leap-second UTC timestamps such that they and their users will get the exact same timestamps back when processing the file"

I imagine most of you will agree that that is a reasonable use-case for CF.

Jim's proposal is a way to accommodate this use-case using the current CF approach to encoding datetime and commonly available tools. However, it is also a way that is prone to confusion, errors and hidden bugs.

Thus I propose a way to directly store timestamps, which will address the problem at hand, in a clear and much less error prone way.

I want to thank Jim for his persistence in continuing to push on this in the face of my intransigence :-)

Final thought -- we have also brought up issues in this thread about whether to properly support TAI, etc -- maybe a new issue is in order for that?

JimBiardCics commented 6 years ago

@martinjuckes Providing concrete examples like you did is quite useful. I should probably have done so myself well before this. My apologies.

Let's say I am acquiring data once every 10 seconds using a GPS receiver to get UTC time stamps. Around the last leap second I will have:

2017-12-31T23:59:40Z
2017-12-31T23:59:50Z
2017-12-31T23:59:60Z
2017-01-01T00:00:09Z
2017-01-01T00:00:19Z
2017-01-01T00:00:29Z

This represents the worst case scenario.

Here is what my time variable will contain using the two new calendars:

Time variable using new calendar a (UTC converted to elapsed time without taking leap seconds into account)
- units = "seconds since 2017-12-31 23:59:40"
- data = [ 0, 10, 20, 29, 39, 49 ]
Time variable using new calendar b (UTC time stamps converted to elapsed time taking leap seconds into account)
- units = "seconds since 2017-12-31 23:59:40"
- data = [ 0, 10, 20, 30, 40, 50 ]

In the first case the time variable encodes an error. This is what makes the contents non-metric. Comparison of the intervals before and after the leap second yields a value of 10 seconds, but the center interval is 9 seconds. It's not that the seconds aren't SI seconds. It's that we have encoded an error. In the second case, the time variable contents are fully metric.

What happens if we extract UTC time stamps from these two different time variables? We get:

Time variable using new calendar a (Converted to UTC time stamps without taking leap seconds into account)
- 2017-12-31T23:59:40Z
- 2017-12-31T23:59:50Z
- 2017-01-01T00:00:00Z - error here
- 2017-01-01T00:00:09Z
- 2017-01-01T00:00:19Z
- 2017-01-01T00:00:29Z
Time variable using new calendar a (Converted to UTC time stamps taking leap seconds into account - requires custom software)
- 2017-12-31T23:59:40Z
- 2017-12-31T23:59:50Z
- 2017-12-31T23:59:60Z
- 2017-01-01T00:00:09Z
- 2017-01-01T00:00:19Z
- 2017-01-01T00:00:29Z
Time variable using new calendar b (Converted to UTC time stamps without taking leap seconds into account)
- 2017-12-31T23:59:40Z
- 2017-12-31T23:59:50Z
- 2017-01-01T00:00:00Z - error here
- 2017-01-01T00:00:10Z - error here
- 2017-01-01T00:00:20Z - error here
- 2017-01-01T00:00:30Z - error here
Time variable using new calendar b (Converted to UTC time stamps taking leap seconds into account)
- 2017-12-31T23:59:40Z
- 2017-12-31T23:59:50Z
- 2017-12-31T23:59:60Z
- 2017-01-01T00:00:09Z
- 2017-01-01T00:00:19Z
- 2017-01-01T00:00:29Z

In the first case, we get back (except on the leap second) the time stamps that were the inputs to the variable. There is an error in the results, but notice that if the times were each one second later, you wouldn't see any error in the results.

In the second case, the savvy user understands from the calendar used that there will be an error in the time stamp on the leap second. That user captures that special case with custom software and fixes the one time stamp.

In the third case, the unfortunate user doesn't pay attention to what it means to have a variable that uses new calendar b and converts the metric elapsed times to UTC without considering leap seconds. They get a one second error that propagates. But they ignored the calendar, so we can't really help them.

In the fourth case, the savvy user understands from the calendar used that the elapsed times are metric, and that if she wants to get an accurate UTC time stamp, she must use a process that is aware of leap seconds. She does so, and gets fully correct results back.

The reason for the two different new calendars is that the first, erroneous time variable produces correct results (with one exception) if the users are only interested in getting time stamps. If they convert the values back to UTC time stamps without taking leap seconds into account, they will get back the original inputs the vast majority of the time. And that is well sufficient for a large number of users.

The second, "correct" time variable produces correct results if the user is savvy. Notice that the savvy user can also get correct results from the first time variable if they want to. The savvy user can also get exact elapsed times from the first variable by adding leap seconds as needed if they need exact elapsed time values. The only truly bad scenario here is the one in which an un-savvy user tries to get UTC time stamps from the "correct" time variable without taking leap seconds into account.

This is why I have proposed the two new calendars. They allow data producers to signal to savvy users how to use the contents. new calendar a provides maximum accommodation of the needs of less savvy users without posing a big problem for savvy users that need more. new calendar b allows data producers to be rigorous and exact, and is better from that standpoint. It makes life easier for savvy users, but less savvy users might make a mistake that will lead to erroneous time stamps. But we can't save people from themselves. We can warn them, and I think we should write enough about all of this in the CF Conventions document to make it clear, but there are plenty of people out there that don't read the manual. I deal with them on a regular basis.

ChrisBarker-NOAA commented 6 years ago

OK, another thought (sorry):

@martinjuckes wrote:

However, calendar formerly known as gregorian_utc, as described at the start of this discussion, is not consistent with use of SI time units, and so could not be used with standard name time. I can't see how such a major step in increased complexity can be justified.

I think this is being kind pedantic: in this case, a second is still a second, but the times are not (to use Jim's word) "metric". I agree that the raw values probably shouldn't be used as a usual time axis. This is what I call "prone to hidden errors" rather than "not SI units" :-)

What is described here has no relation to the concept of calendar as it currently exists in the CF convention.

au contraire: a "calendar" is a way to translate from time-passed-from-a-timestamp to another timestamp. (timestamp being year,day, month, minute, second) -- so I think it fits.

Why are you not prepared to consider working with the structures we have?

It's actually the opposite -- we have a real use-case to solve, and Jim's proposal is a way to jam it inot the existing structures. Hence my proposal to create a new structure.

JonathanGregory commented 6 years ago

Dear Jim, Martin, Chris et al.

Thanks for this discussion and specially many thanks to Jim for spending the time to make a proposal. It was indeed an epic discussion on the email list, and it's great to see an outcome.

I agree with the proposal as Jim has made it, if I understand it correctly, except that I agree with Martin that the reference time in the time units should be given in the calendar specified by the calendar attribute. If we have calendar="julian", zero "days since 1917-10-25" means 7 Nov 1917 in the gregorian calendar (which is "proleptic" UTC). It does not mean 25 Oct 1917 in the gregorian calendar.

In CF, as Chris says, a "calendar" is a way to translate from time-since-a-timestamp to another timestamp (timestamp being year, day, month, minute, second). The calendar attribute identifies the rules for converting between a time coordinate (with time units) and a timestamp (components of time). A crude way to picture these "rules" is that a calendar is defined by an ordered list of all the valid times. For the current set of CF calendars, it would be sufficient to list all the valid dates (YYYY-MM-DD) in the calendar, which have a spacing of 1 day (86400 s), but with the complication of leap seconds we have to regard the calendar as being defined by an ordered list of all the valid timestamps (YYYY-MM-DD HH:MM:SS) in the calendar, which have a spacing of 1 second. The reference time in the time-units is one of these timestamps. To work out the time coordinate for a given timestamp, you count how many valid timestamps there are between the reference and the one you want.

Martin's interpretation of the reference time would modify Jim's proposal slightly, according to my understanding, which is as follows:

gregorian_tai. The reference date in the time-units is TAI, not UTC. The calendar has all the Gregorian dates but no leap seconds.

gregorian_utc. The reference date in the time-units is UTC. The calendar has all the Gregorian dates and includes leap seconds.

gregorian. The reference date in the time-units is UTC, if it's real-world data. The calendar has all the Gregorian dates but no leap seconds, so the rules for converting between time coordinates and timestamps are the same as for gregorian_tai. The timestamps can be encoded and decoded successfully by these rules but the difference between two time coordinates which span a leap second will be incorrect. That is, the time coordinates are not "metric", in Jim's sense. As Jim has explained, most real-world data is probably like this, because the timestamp comes from a clock which is synchronised to UTC but it was converted to a time coordinate according to the no-leap-second rules. Moreover, there is a lot of climate model data which uses this calendar, and it's correct because there are no leap seconds in the model world. It's not really "UTC" but it's what the model uses. In the model world, the time coordinates are metric.

I support introducing gregorian_tai and gregorian_utc for real-world applications which require time coordinates that take account of leap seconds. Should it be prohibited to use these two new calendars for dates before 1958-1-1, or are they gregorian before that date? I assume it should be prohibited to use gregorian_utc for dates in the future, since the leap seconds are unknown.

Best wishes

Jonathan

JimBiardCics commented 6 years ago

@JonathanGregory I think the new calendars, as you have described them, are not what I am proposing, and I don't think it is the direction we should go. I think the names were causing confusion in the discussion, and I get the impression that it has influenced your thinking too. This is why I started referring to the calendars by generic names - new calendar a and new calendar b.

Are you assuming that time values in the time variables for both of the new calendars will be fully metric (no non-linearities or unexpected offsets)?

jswhit commented 6 years ago

Is gregorian_utc a proleptic gregorian calendar, or is it the same as gregorian (mixed Julian/gregorian) with a known relationship with UTC?

JimBiardCics commented 6 years ago

Here's a bit of comic relief on this topic, courtesy of XKCD. https://xkcd.com/2050/

JimBiardCics commented 6 years ago

@jswhit There are no leap seconds prior to - depending on your viewpoint - somewhere between 1958 to 1970, so gregorian_utc (new calendar a) collapses to gregorian for dates prior to that range. I have no preference over where that should be considered proleptic or not. I tend to think that a time series using the proleptic Gregorian calendar would not care about UTC as such.

What do you think?

JimBiardCics commented 6 years ago

Here's a good compendium of information about time systems.

https://www.ucolick.org/~sla/leapsecs/timescales.html

jswhit commented 6 years ago

After reading that link I can appreciate that XKCD comic even more :-)

The cftime utility that I've worked on is quite primitive and will probably never be able to serve applications that require accounting for leap seconds. I agree that it is important to have some sort of calendar designation that signifies that this precision is needed, so that tools such as mine can return an error message, instead of an incorrect answer. gregrorian_tai or proleptic_gregorian_tai seem like perfectly acceptable names to me.

ChrisBarker-NOAA commented 6 years ago

The cftime utility that I've worked on is quite primitive and will probably never be able to serve applications that require accounting for leap seconds.

😪

We really could use a lib that does this — though I understand, it’s not my itch to scratch either.

jswhit commented 6 years ago

In python, it seems like something could be built around astropy.

JimBiardCics commented 6 years ago

The code is pretty straightforward. The biggest issue is dealing with the leap second list that you need to go get periodically in order to stay current. In Python it could easily subclass from datetime.datetime.

ChrisBarker-NOAA commented 6 years ago

It seems this issue has become a bit entangled with multiple problems in search of a solution— they are all related to calendars, but also could each be independently implemented. Maybe separate issue for each?

In the meantime, I think these are what’s on the table:

calendar(s) used for climate modeling — 360 days per year.
- proposed solution: we come up with a name and definition and we’re done.
- question: do climate modelers use “time stamps” that look like the Gregorian calendar? That is: March 15, 2025? (And of course, potentially February 30, 2100)? If so then this is all pretty straightforward. If, on the other hand, they use another system, like year 2025, day 154. Then we should have a way to directly encode that.
- is there any ambiguity in what “2016-06-15” means? Does anyone care about that ambiguity? That is, saying a timestamp is UTC clearly identifies it, but what does a timestamp in 360_day_year calendar mean? Can it be converted to UTC?

OK, that’s enough for a phone...