JuliaLang / Juleps

Julia Enhancement Proposals

Create UTC for Dates #36

Open stephancb opened 7 years ago

stephancb commented 7 years ago

A proposal to make Dates accept UTC leap seconds

stephancb commented 7 years ago

Introduction

Technical systems usually generate absolute time stamps in UTC. Julia Base DateTime cannot represent times that occur in leap seconds, which are inserted into UTC from time to time. Several other widely used representations of time have the same problem: astronomical Julian days, time types of the C standard library, Matlab day numbers, ... In practice it is quite a minor problem. Leap seconds occur on average only about every 18 months. Most computers and mobile devices are automatically configured to use NTP and can circumvent leap seconds by halting the system clock for these times. This has greatly reduced the risk that leap second time stamps get generated.

Nevertheless the leap seconds seem to have been so disturbing for the Julia community, that the documentation for DateTime is now claiming not to handle UTC at all, but implementing UT, more precisely UT1. This is not a good idea, and I have submitted a PR to retract this claim. The conversion between UTC and UT1 is non-trivial: it involves large tables that are updated frequently, and presently no package supports it. There are many more arguments ...

Rather, the internal representation of time in Base DateTime should be changed to allow for leap seconds, which is what this proposal is about. Halting the system clock, even for just 1 second, is not always an option, for example on many satellites. Naturally, solutions for representing time including leap seconds can be found in the area of space flight. This proposal is inspired by the CCSDS Day Segmented (CDS) time code described in a CCSDS Blue Book, https://public.ccsds.org/Pubs/301x0b4e1.pdf, which the reader may want to consult.

For calendar time arithmetic Julia DateTime assumes that each day has 86400 s. With leap seconds this is not always the case, which at first glance seems disturbing. I don't think it is. Calendar arithmetic, for the purposes it is used for, does not need to agree precisely with physically elapsed seconds, and should continue to be done as it is now.

Day segmented time type

We are used to segmenting time for ourselves into years, months, days, ... Computers are typically instructed to use a single, unsegmented code for time, probably because we think that this is most efficient in terms of computation and memory consumption. However, the UTC standard says that a day has 86399, 86400, or 86401 seconds, and then a day segmented time representation is natural.

An updated Julia Base DateTime would use, instead of the millisecond counter:

struct UTCinstant
    day2000::Int32   # number of days since 2000-01-01
    hus::Int32       # number of 100-microsecond ticks since the start of the day
end

To mark that this is a new time code, I suggest using 2000-01-01 as the epoch. Day numbers before 2000-01-01 are then negative, and we can represent times about 6 million years back (and the same into the future).

With 32 bits for a (sub)seconds-of-day counter the smallest possible increment is 100 microseconds (µs), since 864009999 < 2^31 - 1. Using an unsigned counter would not allow a 10x smaller increment. So with 32-bit signed integers for the day counter and for the 100 µs counter, UTCinstant uses in total the same amount of memory as the present millisecond counter and allows for a somewhat higher precision. A computer's system clock stays synced to UTC within about 100 µs if there is a high-precision NTP server on the same LAN (often the case at larger universities).
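The capacity figures in this section can be checked with a few lines of arithmetic. The following is only a sketch in current Julia syntax; the field layout comes from the proposal, while the constant and variable names are illustrative assumptions.

```julia
# Sketch of the proposed layout; only the two fields are taken from the proposal.
struct UTCinstant
    day2000::Int32   # days since 2000-01-01 (negative before the epoch)
    hus::Int32       # 100-microsecond ticks since the start of the day
end

# Hypothetical helper constants, not part of the proposal.
const TICKS_PER_SECOND = 10_000                        # 100 µs resolution
const MAX_TICKS_PER_DAY = 86_401 * TICKS_PER_SECOND    # day with one inserted leap second

# The largest tick index on a leap-second day fits into an Int32 ...
fits_int32 = MAX_TICKS_PER_DAY - 1 <= typemax(Int32)

# ... but a 10 µs counter would overflow even a UInt32.
fits_uint32_10us = 86_401 * 100_000 - 1 <= typemax(UInt32)

# Approximate reach of the day counter in years, in each direction.
reach_years = Int64(typemax(Int32)) / 365.2425

println(fits_int32, " ", fits_uint32_10us, " ", round(reach_years / 1e6, digits = 2))
println(sizeof(UTCinstant), " bytes, same as an Int64 millisecond counter")
```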

Date module interface

Constructors

Constructors will of course have to be made to accept leap second time stamps. So far leap seconds have been inserted only on June 30 and December 31, which perhaps should be enforced in the constructors. But a lookup in a leap second table is not needed; a user might actually want to simulate (future) leap seconds that would not be in the table.
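As a rough illustration of such a constructor, the sketch below accepts second 60 only at 23:59 on June 30 or December 31 and consults no leap second table. The function name `utcinstant`, its argument list, and the error messages are assumptions made for this example, not part of the proposal.

```julia
using Dates

struct UTCinstant
    day2000::Int32   # days since 2000-01-01
    hus::Int32       # 100-microsecond ticks since the start of the day
end

const EPOCH = Date(2000, 1, 1)

# Hypothetical constructor: validates the fields and allows second 60
# only in the last minute of June 30 or December 31 (no table lookup).
function utcinstant(y, m, d, h = 0, mi = 0, s = 0, ticks = 0)
    date = Date(y, m, d)   # throws ArgumentError for an invalid calendar date
    0 <= h <= 23 || throw(ArgumentError("hour out of range"))
    0 <= mi <= 59 || throw(ArgumentError("minute out of range"))
    0 <= ticks < 10_000 || throw(ArgumentError("sub-second ticks out of range"))
    if s == 60
        leap_ok = h == 23 && mi == 59 && ((m == 6 && d == 30) || (m == 12 && d == 31))
        leap_ok || throw(ArgumentError("second 60 only accepted at 23:59 on June 30 or December 31"))
    elseif !(0 <= s <= 59)
        throw(ArgumentError("second out of range"))
    end
    days = Dates.value(date - EPOCH)   # Day period converted to an integer
    return UTCinstant(days, ((h * 60 + mi) * 60 + s) * 10_000 + ticks)
end

past = utcinstant(2016, 12, 31, 23, 59, 60)     # a leap second that did occur
future = utcinstant(2030, 6, 30, 23, 59, 60)    # a simulated one, not in any table
```

Because the check depends only on the calendar date, a simulated future leap second is accepted just like a historical one.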

Types and functions

Otherwise the existing functions and operators in Dates should still behave as they do at present, i.e. assume that every day has 86400 s. Obviously, returned time stamps could have a value of up to 60 in the seconds-of-minute field/string.

For interaction with software that cannot accept leap seconds, an iterator restamp(collectionofdatetimes, ...) would return only time stamps outside leap seconds, similar to how OS calls for system time behave on many computers. It has to be an iterator because several time stamps in the collection may fall in the same leap second. Their order should then be preserved (they cannot all be restamped to the same value). At least on December 31, restamping should stay within the same year/day, in case the times stamp, for example, financial transactions.
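The proposal does not spell out how restamp would choose the new values. One possible approach, sketched below under loud assumptions (input already sorted and strictly increasing, eager Vector in and out rather than a lazy iterator), is to walk the collection backwards, pull leap-second stamps onto the last regular tick of the same day, and push earlier stamps back by one tick where needed so that no two coincide.

```julia
struct UTCinstant
    day2000::Int32   # days since 2000-01-01
    hus::Int32       # 100-microsecond ticks since the start of the day
end

const LAST_REGULAR_TICK = Int32(86_400 * 10_000 - 1)   # 23:59:59.9999

# Hypothetical restamp: assumes `stamps` is sorted and strictly increasing.
function restamp(stamps::Vector{UTCinstant})
    out = copy(stamps)
    for i in length(out):-1:1
        t = out[i]
        # Pull a leap-second stamp back onto the same day, before 23:59:60.
        hus = min(t.hus, LAST_REGULAR_TICK)
        # Stay strictly before the following stamp if it is on the same day.
        if i < length(out) && out[i + 1].day2000 == t.day2000
            hus = min(hus, out[i + 1].hus - Int32(1))
        end
        out[i] = UTCinstant(t.day2000, hus)
    end
    return out
end

# One stamp just before and two inside the leap second of 2016-12-31 (day 6209).
stamps = [UTCinstant(6209, 863_999_999),
          UTCinstant(6209, 864_000_000),
          UTCinstant(6209, 864_005_000)]
restamped = restamp(stamps)
```

A side effect of this choice is that a regular stamp immediately before the leap second can be shifted by a few ticks; whether that is acceptable is a design decision the proposal leaves open.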

Leap second table

A table of leap seconds is not needed for this proposal.

Coding

The code for the Dates module would need to be adjusted to use UTCinstant, probably in many places. This may be tedious work, but no serious complications are expected.

Summary

We propose to change the internal representation of time in Julia Base to a day segmented one. This makes it possible to represent times in leap seconds. Leap second time stamps, potentially originating from automatic systems, could be accepted and returned to a caller. It would also make it easier to build higher precision types on top of UTCinstant, using additional segmentation.

Stephan B., Swedish Institute of Space Physics

oxinabox commented 7 years ago

For reference other libraries without leap-second support.

Libraries with leap second support

In some cases these support only inserting, not deleting. But no deletions have happened yet. Also these are often capped at inserting at most one.

Also the situation is complicated in some cases for things like C where POSIX std says you must ignore the leap second, even though the language supports it.

Time is really complex. This comment is not an opinion one way or the other, just links to other languages/libraries for reference

quinnj commented 7 years ago

I haven't quite followed all the discussion with this, so I won't comment on the specifics here, but more on the process. This is certainly a situation where all this functionality could be developed in a stand-alone package outside of Base and be made to work robustly + plenty of test coverage before needing to be considered to replace the Base implementation. The beauty of what is currently in Base is the simplicity, robustness, and test coverage, being some of the most well-tested code in Base.

Anyway, carry on, just wanted to comment on the process more than proposal.

nalimilan commented 7 years ago

That's an interesting proposal, thanks for writing it in detail. However, could you discuss the existing implementations? In particular, does any of them behave like you suggest? Why did they make these design choices (among others: old vs. recent, general-purpose vs. scientific...)? Do we have evidence that this works well in practice, is it annoying in some particular cases?

Also I'm not sure I understand the consequences of this change in terms of time arithmetic. Currently the equality Day(1) == Second(86400) allows simplifying lots of operations. For example, we have:

julia> DateTime(2017, 01, 02) - DateTime(2017, 01, 01)
86400000 milliseconds

julia> step(DateTime(2017, 01, 02):DateTime(2017, 01, 01))
1 day

Do you suggest we get rid of this equality? (Note this is already what happens with months and days, so that's not necessarily a showstopper, but...)

stephancb commented 7 years ago

Not supporting leap seconds means not accepting any time stamps in leap seconds (like present Julia Dates).

Support can mean:

  1. accepting them, returning them back as is, but ignoring them for calendar time arithmetic;

  2. accepting them, returning them back as is, and defining and performing calendar time arithmetic, such that it always agrees with physically elapsed seconds (not always clear how to do this)

This proposal would do 1), similar to POSIX.

The real issues with leap seconds have been, that data got lost, ended up in the wrong time order, etc, when software didn't accept leap second time stamps, threw errors etc. But I never heard that calendar time arithmetic ignoring leap seconds caused any real problems, on the contrary, it is rather what people expect. Therefore I think that 1) is the most sensible approach.

stephancb commented 7 years ago

To write a separate package for this, I would need to build on functionality in Dates, otherwise it is a lot of work. However, Dates does not accept leap second time stamps. So catch22....

stephancb commented 7 years ago

julia> DateTime(2017, 01, 02) - DateTime(2017, 01, 01)
86400000 milliseconds

should perhaps become

julia> DateTime(2017, 01, 02) - DateTime(2017, 01, 01)
1 day, 0 usec 

to match the internal presentation (now: milliseconds, new: day, 100 us)

But leap seconds should consistently get ignored in arithmetic:

julia> DateTime(2017, 01, 01) - DateTime(2016, 12, 31)
1 day, 0 usec 

though the physically elapsed time is 86401 sec.

A point of the proposal is that a log file entry like

"2016-12-31 23:59:60: 1,000,314.00 $ from account 123456789 to account 987654321"

should not cause hiccups when it is processed using Julia Dates.
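To make the point concrete, here is a minimal sketch that first shows the current DateTime constructor rejecting second 60 and then parses the same time stamp into the day segmented layout. The `parse_utc` helper and its regular expression are assumptions for this example; it only range-checks the seconds field and does not enforce the date on which a leap second may occur.

```julia
using Dates

struct UTCinstant
    day2000::Int32   # days since 2000-01-01
    hus::Int32       # 100-microsecond ticks since the start of the day
end

# Hypothetical parser for a leading "yyyy-mm-dd HH:MM:SS" that tolerates second 60.
function parse_utc(str::AbstractString)
    m = match(r"^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})", str)
    m === nothing && throw(ArgumentError("no time stamp found"))
    y, mo, d, h, mi, s = parse.(Int, m.captures)
    0 <= s <= 60 || throw(ArgumentError("second out of range"))
    days = Dates.value(Date(y, mo, d) - Date(2000, 1, 1))
    return UTCinstant(days, ((h * 60 + mi) * 60 + s) * 10_000)
end

line = "2016-12-31 23:59:60: 1,000,314.00 \$ from account 123456789 to account 987654321"

# The existing constructor throws for second 60.
rejected = try
    DateTime(2016, 12, 31, 23, 59, 60)
    false
catch err
    err isa ArgumentError
end

stamp = parse_utc(line)   # accepted and kept as is
```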

omus commented 7 years ago

Currently the equality Day(1) == Second(86400) allows simplifying lots of operations.

In the package TimeZones.jl this is not true as a day can be equal to Hour(23), Hour(24), or Hour(25). I've been trying to revise some of the Base code where Day(1) == Hour(24) is assumed to always be true. See the documentation for examples of how calendrical arithmetic works with TimeZones.

To write a separate package for this, I would need to build on functionality in Dates, otherwise it is a lot of work. However, Dates does not accept leap second time stamps. So catch22....

This is the same problem that TimeZones faced. Currently the DateTime implementation doesn't have the functionality to handle TimeZones. I ended up solving this problem by introducing a new type: ZonedDateTime. I feel like the best approach for supporting leap seconds would be to add this functionality into a new package.

Additionally, for accurate leap second support we would need to utilize the leap seconds data (ftp://ftp.iana.org/tz/tzdb-2017b/leapseconds) that IANA provides as part of tzdata.

stephancb commented 7 years ago

For this proposal a leap seconds table is not needed, because calendar arithmetic will ignore them.

The user supplies leap second time stamps, and gets them back, that's all.

The user can of course do calendar arithmetic with his/her leap second time stamps, but the result will be the same as if the time stamp were just before the leap second. People who are interested in things on the second to subsecond level will not use the calendar arithmetic functions/operators, but will access the UTCinstant fields directly.
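A sketch of what such leap-second-blind arithmetic could look like on the day segmented layout follows. The helper names `calendar_ticks` and `calendar_diff` are assumptions for illustration; the rule of treating a leap second stamp like the tick just before it is the one described above.

```julia
struct UTCinstant
    day2000::Int32   # days since 2000-01-01
    hus::Int32       # 100-microsecond ticks since the start of the day
end

const TICKS_PER_DAY = Int64(86_400) * 10_000   # calendar day, leap seconds ignored

# Treat a stamp inside a leap second as the last regular tick of that day.
calendar_ticks(t::UTCinstant) = min(Int64(t.hus), TICKS_PER_DAY - 1)

# Calendar difference a - b as (days, ticks), assuming every day has 86400 s.
function calendar_diff(a::UTCinstant, b::UTCinstant)
    total = (Int64(a.day2000) - Int64(b.day2000)) * TICKS_PER_DAY +
            calendar_ticks(a) - calendar_ticks(b)
    return divrem(total, TICKS_PER_DAY)
end

# 2017-01-01 minus 2016-12-31: one calendar day, although 86401 s elapsed.
one_day = calendar_diff(UTCinstant(6210, 0), UTCinstant(6209, 0))

# A stamp in the leap second behaves like 23:59:59.9999 of the same day.
same = calendar_diff(UTCinstant(6209, 864_000_000), UTCinstant(6209, 863_999_999))
```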

StefanKarpinski commented 7 years ago

I think the rhetoric in this Julep is a bit over the top:

Nevertheless the leap seconds seem to have been so disturbing for the Julia community, that the documentation for DateTime is now claiming not to handle UTC at all, but implementing UT, more precisely UT1.

The community is not "disturbed" by leap seconds, they're just a pain to deal with and if UT-based time is consistently used, they're not an issue. This was a design decision that was based on a lot of consideration and discussion with astronomers (among others), not some kind of panic response. Whether it was the right design decision or not is debatable but this verbiage is unnecessary and the attitude is not terribly constructive.

There are two core issues with the current scheme:

  1. Hardware systems on which Julia runs will usually be synced to UTC via NTP. When someone calls now(), even if they don't ever get a leap second timestamp (most OSes don't return them), although the value can be interpreted as UT1, that interpretation means the time is off by up to ±0.9 seconds in addition to error due to slippage of NTP syncing. Interpreting this timestamp as UTC, even without leap seconds, would be much closer to the true time – within a few milliseconds, typically – which is clearly much better. In practice, this is probably not a big deal since what we're doing is effectively the same as what many of the above software systems that don't handle leap seconds are doing, but the key difference is that we are calling it UT1, which means we're defining ourselves into telling time much worse than we would be if we just said that our timestamps are UTC without leap seconds. In the absence of the now() function, this wouldn't be a problem, but then again, without the now() function, there would be no reason to have dates and times in Base.

  2. Parsing textual timestamps from external sources. Currently, I believe that we just choke when parsing leap second timestamps because there's no way to represent them as DateTime values. That's a problem since a lot of timestamps are in UTC and many of them come from sources that do produce leap seconds. This could be handled by an external library that implements UTC, and arguably date/time parsing could be moved out of base since it's pretty complex and featurey. The biggest issue with this in my view is the likelihood of this potential exception going undiscovered in normal operation and then suddenly tripping someone up only after a system has been deployed for some time when a leap second actually occurs. That's a bad user experience.

This proposal may indeed be a good way to go. A mundane issue: this file should be named similarly to other Juleps in this repo and have an .md extension so that it renders correctly on GitHub.

c42f commented 7 years ago

It's good that this proposal can represent leap seconds and dodges the need to compute with them. Leap second tables are a real pain.

Functionality to get the real number of SI seconds between two Base date times can then be kept up to date in a package, and it's far easier to update a leap second table in that package than to somehow manage the UT1->TAI mapping.
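A package-side helper of that kind might look roughly like the sketch below. The table is deliberately partial (only the five most recent insertions up to 2016) and the function name `si_seconds` is an assumption; a real package would load the complete, maintained list.

```julia
using Dates

# Partial, illustrative table: days whose last minute had an inserted leap second.
const LEAP_DAYS = [Date(2005, 12, 31), Date(2008, 12, 31), Date(2012, 6, 30),
                   Date(2015, 6, 30), Date(2016, 12, 31)]

# Hypothetical helper: SI seconds from a to b, assuming a <= b and that
# neither time stamp lies inside a leap second itself.
function si_seconds(a::DateTime, b::DateTime)
    calendar = Dates.value(b - a) / 1000   # milliseconds to seconds
    # The leap second at the end of day d lies between a and b
    # when a is on or before d and b is on a later day.
    leaps = count(d -> Date(a) <= d < Date(b), LEAP_DAYS)
    return calendar + leaps
end

elapsed = si_seconds(DateTime(2016, 12, 31), DateTime(2017, 1, 1))
```

With this table, `elapsed` is one second longer than the 86400 s that the calendar difference reports for the same pair of dates.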

@oxinabox - I don't think boost datetime has much in the way of leap second functionality. Leap seconds are mentioned in the documentation, but date time arithmetic ignores them (perhaps the reasoning was exactly the same as given in this julep, but it's not explained in the boost docs).