cf-convention / cf-conventions

AsciiDoc Source
http://cfconventions.org/cf-conventions/cf-conventions
Creative Commons Zero v1.0 Universal

Clarification of time coordinates, especially leap seconds, define `utc` and `tai` calendars and `leap_seconds` in `units_metadata` #542

Open JonathanGregory opened 4 days ago

JonathanGregory commented 4 days ago

Summary

This proposal aims to reorganise and clarify the existing text, mostly in section 4.4, about time coordinates, with no change in meaning. It includes a new subsection on leap seconds and their implications for the CF standard calendar, with examples and a diagram, and defines a new use of the units_metadata attribute to remove ambiguity in the interpretation of leap seconds in the standard calendar. It introduces two new CF calendars: utc for UTC with leap seconds properly accounted for, and tai for atomic clock time, used for some satellite data.

Benefits

Several previous lengthy but inconclusive CF discussions have shown that the treatment of leap seconds is unclear and unsatisfactory. In this proposal we hope to provide an acceptable solution to these difficulties.

Moderator

None yet

Associated pull request

#541

Detailed Proposal

A huge amount of hard thought has been spent on previous long discussions about CF calendars and leap seconds (including #148, discuss issue #297, Discussion #304). The last of these went quiet in April.

Since then, we (@davidhassell and @JonathanGregory) have been working on a proposal, on which we'd now like to invite comments. If you are interested, please look at our modified text, especially section 4.4 on time coordinates. You can find this in any of the following:

The main changes are these:

Previous discussions on these matters have evoked disagreements on principle which turned out to be irreconcilable by discussion in the issue, and no conclusion was reached. To avoid that outcome, we'd like to try a different method with the present proposal. If you find something in this proposal which you feel you couldn't possibly accept, even with modification, please say so in this issue. If anyone feels like that, we will convene a group to discuss the disagreements by video meeting, like we've done with a couple of other difficult issues. The group would be charged with reaching a resolution soon enough for some version of this proposal to be accepted for the next release, probably with a deadline in November. If that can't be done, we'll have to start again when someone has a new idea in future.

On the other hand, any suggestions, comments or concerns on clarity, presentation and details of the convention can probably be resolved by discussion in this usual way on this issue. We look forward to hearing what you think!

@JonathanGregory and @davidhassell

JonathanGregory commented 4 days ago

In discussion 304, @ChrisBarker-NOAA has given his support to this proposal (thanks, Chris). He writes:

My only real concern is that the UTC calendar is an "attractive nuisance", and there is very little software that handles it properly, and many people use "UTC" imprecisely. But the text is very clear about the leap seconds, so buyer beware, I guess.

Please could anyone who wants to comment on this proposal do so here in this issue, rather than in discussion 304. Thanks.

JonathanGregory commented 4 days ago

@ChrisBarker-NOAA has also made some comments on the PR (#541). I'm copying them here, because discussion of "substantive" points in a PR is awkward to follow subsequently. It's easier to have a single record in the issue. Marking typos etc. in a PR is fine, because they don't need discussion or reply.

I've usually seen this spelled datetime or date-time, rather than date/time. I think those forms are a little better. I'm not sure why, but date/time reads to me a bit like date or time, rather than a compound word.

I agree that "date/time" isn't ideal because "/" means "or", but I don't have a strong view on what we should write. We used "date/time" because it appears like that elsewhere in the convention document, especially chapter 7. If there is a consensus on a preferred way to write it, or a different term to use, we could change it throughout the document.

Regarding the sentence, "To mark this distinction, the canonical unit given for quantities used for time coordinates is s since 1958-1-1", just curious -- why 1958? I actually saw this in a file in the wild recently, and was wondering where in the heck it came from! I guess I'd expect 1970-1-1 [as that's the most common epoch used] as canonical, but it's not vital.

UTC and TAI have a complicated history, as described by Wikipedia. My understanding is that, to summarise it simply, TAI began on 1958-1-1, with the modern definition of a second in terms of the caesium atomic clock. In 1972 UTC was rebased on TAI, in such a way that they were treated as coincident at 1958-1-1, with 10 leap seconds having been added by 1972. Hence it's convenient to regard UTC as beginning in 1958 as well as TAI. There is a sentence of explanation elsewhere in the CF text, which Chris discovered later. I will put something at the point where this remark was made as well.

[Where we discuss the definition of year and month: insert] "A day is exactly 24 hours (86400 sec). It is not a calendar day." I suggest this because in, e.g. the Python datetime library, a day is a calendar day, rather than 24 hours. I think that only makes a difference during a DST transition, which CF doesn't allow anyway (I hope!) -- but it wouldn't hurt to be extra clear here.

That's fine, thanks. I will insert it. The time zone definitions are plus/minus numbers of hours (and minutes), not names - no automatic transitions are implied by them!

[Where we discuss time zones, replace "time zone" with] "time zone offset" -- time zone is the administrative thing, and has a name, and maybe DST transitions -- the time zone offset is clear and simple.

OK, thanks.

[Concerning the new utc calendar, we have proposed "Date/times in the future are not allowed in this calendar, because it is unknown when future leap seconds will occur." Chris comments:] I think some warning is given before a leap second is introduced -- so we could go a bit in the future (Wikipedia says "leap seconds are announced only six months in advance.") -- but I can't find a formal reference for that -- so I guess ruling out the future altogether is probably wise.

In practice I'm sure it's OK if data-writers produce data for the future which they know will be correct because of advance warning. The checker will give an error if it finds a date which is in the future when the checker is run, but the future becomes the past at the rate of 1 second per second, and the same file will not give an error once that has happened! Should this be a recommendation not to write future UTC, rather than a prohibition?
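To illustrate why the outcome of such a check depends on when it is run, here is a minimal sketch. The function name `is_future_utc` and its `now` parameter are hypothetical and are not taken from any existing CF checker.

```python
from datetime import datetime, timezone
from typing import Optional


def is_future_utc(stamp: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if `stamp` lies after the moment the check is run.

    `now` is injectable so that the time dependence can be demonstrated;
    by default the current system time (zero offset) is used.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return stamp > now


# The same date/time is rejected by a check run before it and accepted afterwards.
stamp = datetime(2030, 1, 1, tzinfo=timezone.utc)
flagged_early = is_future_utc(stamp, now=datetime(2029, 7, 1, tzinfo=timezone.utc))
flagged_late = is_future_utc(stamp, now=datetime(2030, 7, 1, tzinfo=timezone.utc))
```

Here `flagged_early` is true and `flagged_late` is false, although the file contents are identical in both cases.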

Thanks for these comments, Chris. I have resolved them in the PR.

JonathanGregory commented 4 days ago

Dear Chris

I have made changes (in the PR, html and pdf) following your suggestions. Two of them were more complicated than I had expected. Here are the new versions of various paragraphs:

In 4.4.1

UDUNITS defines a minute as 60 seconds, an hour as 3600 seconds and a day as 86400 seconds. These are not calendar units. When civil clock time changes at the start and end of summer in many countries, the day according to its calendar date lasts for 23 or 25 hours, but the UDUNITS and CF day is always 24 hours. When a leap second is inserted into UTC, the minute, hour and day affected differ by one second from their usual durations according to clock time, but the UDUNITS and CF minute, hour and day do not; they are fixed units of measure.
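As a small illustration of the fixed units described above, the following sketch converts between them using only the stated durations. The names `SECONDS_PER_UNIT` and `convert` are invented for this example and are not part of UDUNITS or CF.

```python
# Fixed durations in seconds, as stated in the paragraph above.
SECONDS_PER_UNIT = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert between fixed units of measure; no calendar or clock is involved."""
    return value * SECONDS_PER_UNIT[from_unit] / SECONDS_PER_UNIT[to_unit]


hours_in_day = convert(1, "day", "hour")        # always 24, whatever the civil clock does
seconds_in_day = convert(1, "day", "second")    # always 86400, even on a leap-second day
```

Because the conversion never consults a date, a summer-time change or an inserted leap second cannot alter the result.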

The default time zone offset is zero. In a time zone with zero offset, time (approximately) equals mean solar time for 0 degrees_east of longitude. (Although this may be exact in a model, in reality the time with zero time zone offset differs by some seconds from mean solar time; see the discussion of UTC and leap seconds in <<4.4.2>>.) If both time and time zone offset are omitted the time is 00:00:00 (midnight, the start of the day). Thus, units = "days since 1990-1-1" means the same as units = "days since 1990-1-1 0:0:0".

For example, seconds since 1992-10-8 15:15:42.5 -6:00 indicates seconds since October 8th, 1992 at 3 hours, 15 minutes and 42.5 seconds in the afternoon, in a time zone where the date/time is six hours behind the default. Subtracting the time zone offset from a given date/time converts it to the equivalent date/time with zero time zone offset e.g. 1989-12-31 18:00:00 -6 identifies the same instant as 1990-1-1 0:0:0.
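The defaults and the offset rule in the two paragraphs above can be sketched as follows. This is a simplified illustration, not the UDUNITS parser: the function `parse_reference` is hypothetical, it handles only the forms shown in the examples, and it uses Python's `timezone.utc` object merely as a label for "zero time zone offset".

```python
from datetime import datetime, timedelta, timezone


def parse_reference(ref: str) -> datetime:
    """Parse 'Y-M-D [h:m:s] [+/-h[:m]]' and return the zero-offset equivalent."""
    parts = ref.split()
    year, month, day = (int(p) for p in parts[0].split("-"))

    # An omitted time defaults to 00:00:00.
    hour, minute, second = 0, 0, 0.0
    if len(parts) > 1:
        fields = (parts[1].split(":") + ["0", "0"])[:3]
        hour, minute, second = int(fields[0]), int(fields[1]), float(fields[2])

    # An omitted time zone offset defaults to zero.
    offset = timedelta(0)
    if len(parts) > 2:
        sign = -1 if parts[2].startswith("-") else 1
        off_h, _, off_m = parts[2].lstrip("+-").partition(":")
        offset = sign * timedelta(hours=int(off_h), minutes=int(off_m or 0))

    local = datetime(year, month, day, hour, minute, tzinfo=timezone.utc)
    local += timedelta(seconds=second)
    # Subtracting the offset gives the same instant with zero offset.
    return local - offset


same_default = parse_reference("1990-1-1") == parse_reference("1990-1-1 0:0:0")
same_instant = parse_reference("1989-12-31 18:00:00 -6") == parse_reference("1990-1-1 0:0:0")
```

Both comparisons are true, matching the two equivalences stated in the text.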

In 4.4.2

In the real world, the international basis of civil timekeeping is Coordinated Universal Time (UTC). Leap seconds are adjustments occasionally made in UTC, in order to keep it close to mean solar time at 0 degrees_east i.e. the time zone with the default (zero) time zone offset in UDUNITS and CF (see <<4.4.1>>).
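To show what a leap second means for elapsed time, here is a minimal sketch that counts seconds between two UTC date/times with and without leap seconds. The table is deliberately partial (an assumption of this example): it lists only the leap seconds inserted at the end of 2015-06-30 and 2016-12-31, so it is valid only for intervals after mid-2012. The function names are invented for illustration.

```python
from datetime import datetime

# Partial table (assumption: only the two most recent leap seconds are listed).
# Each entry is the UTC midnight that immediately follows an inserted 23:59:60.
LEAP_SECOND_ENDS = [datetime(2015, 7, 1), datetime(2017, 1, 1)]


def elapsed_ignoring_leap_seconds(start: datetime, end: datetime) -> float:
    """Elapsed seconds if every day is treated as exactly 86400 s long."""
    return (end - start).total_seconds()


def elapsed_with_leap_seconds(start: datetime, end: datetime) -> float:
    """Elapsed seconds in UTC, adding one for each leap second in (start, end]."""
    leaps = sum(1 for t in LEAP_SECOND_ENDS if start < t <= end)
    return (end - start).total_seconds() + leaps


start = datetime(2016, 12, 31, 23, 59, 0)
end = datetime(2017, 1, 1, 0, 0, 0)
without_leap = elapsed_ignoring_leap_seconds(start, end)
with_leap = elapsed_with_leap_seconds(start, end)
```

For this interval `without_leap` is 60 and `with_leap` is 61, because the final minute of 2016 contained 61 seconds of UTC.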

Do they look OK?

Cheers

Jonathan

ChrisBarker-NOAA commented 3 days ago

These look great -- thanks!

Where are we at with:

I agree that "date/time" isn't ideal because "/" means "or", but I don't have a strong view on what we should write. We used "date/time" because it appears like that elsewhere in the convention document, especially chapter 7. If there is a consensus on a preferred way to write it, or a different term to use, we could change it throughout the document.

I vote for either "datetime" or "date-time" -- but yes, it should be the same everywhere, so if this is too much churn, we can leave it as is.

Maybe wait to see if anyone else has a preference?

JonathanGregory commented 1 day ago

Dear @chris-little

Thanks for reviewing the PR. I am glad you found it clear. You commented

You might want to consider removing the word midnight, or replace it with midnight at 0 degrees longitude. It is a bit UK-centric. The ISO 8601 standard removed that word from its content some years ago.

Thanks for this point. I have qualified "midnight" with "at 0 degrees_east" in all the places I could find. It's updated in the PR, but I haven't updated the HTML and PDF.

Best wishes

Jonathan