Dual-Life / Time-Piece

Object Oriented time objects

Inconsistency in strftime method #23

Open barefootcoder opened 8 years ago

barefootcoder commented 8 years ago

I'm not 100% sure, but I think this is a bug. Demonstrating it is easy, if you happen to have a sufficiently Linux-like OS:

buddy@cibola ~ $ perl -le 'print scalar gmtime 796694400'
Sat Apr  1 00:00:00 1995
buddy@cibola ~ $ perl -MPOSIX -le 'print strftime("%m", gmtime 796694400)'
04
buddy@cibola ~ $ perl -MTime::Piece -le 'print ((gmtime 796694400)->strftime("%m"))'
04
buddy@cibola ~ $ TZ=right/$(cat /etc/timezone) perl -le 'print scalar gmtime 796694400'
Sat Apr  1 00:00:00 1995
buddy@cibola ~ $ TZ=right/$(cat /etc/timezone) perl -MPOSIX -le 'print strftime("%m", gmtime 796694400)'
04
buddy@cibola ~ $ TZ=right/$(cat /etc/timezone) perl -MTime::Piece -le 'print ((gmtime 796694400)->strftime("%m"))'
03

(You might have to substitute something else for /etc/timezone if your OS is less Linux-like. I wouldn't even begin to know how to reproduce it on Windows.)

Now, this might not be a bug at all ... if I twiddle my eyes just right, I can see how any leap seconds added prior to April 1, 1995 (of which there were 19, according to Wikipedia) might mean that 796694400 is actually a few seconds before midnight on the date in question. But that doesn't explain the inconsistency with gmtime in scalar context, or with POSIX::strftime. So I'm pretty sure it's a bug, although possibly not a bug with Time::Piece.

But do note that only $t->strftime gives the surprising result:

buddy@cibola ~ $ TZ=right/$(cat /etc/timezone) perl -MTime::Piece -le 'print ((gmtime 796694400)->mon)'
4

So I think the problem is probably with Time::Piece, since it seems to be internally inconsistent.

Or is there something deeper that I'm not understanding?

smith153 commented 8 years ago

I'm assuming the error you are pointing out is:

buddy@cibola ~ $ TZ=right/$(cat /etc/timezone) perl -MTime::Piece -le 'print ((gmtime 796694400)->strftime("%m"))'
03

Or basically setting a timezone even though using a gmt time still causes the wrong month?

barefootcoder commented 8 years ago

I'm assuming the error you are pointing out is:

Yes, exactly: Time::Piece->strftime("%m") agrees with neither Time::Piece->mon nor POSIX::strftime("%m").

Or basically setting a timezone even though using a gmt time still causes the wrong month?

Well, not just setting any timezone. It's only the right/ timezones (i.e. those that include leap seconds). Every right/ timezone will do it, and no non-right/ timezone will. So it's obviously something peculiar to the leap seconds. What, exactly, I have no idea ...

mlawren commented 8 years ago

If I squint my eyes the same as you I would actually expect to see different dates for these two

# Here we are saying that 796694400 is UNIX time (UTC without leap seconds)
$ perl -le 'print scalar gmtime 796694400'
Sat Apr  1 00:00:00 1995

# Here we are saying that 796694400 is TAI time (UTC plus leap seconds) which has *not yet reached Apr 1* (in your timezone)
$ TZ=right/$(cat /etc/timezone) perl -le 'print scalar gmtime 796694400'
Sat Apr  1 00:00:00 1995

Compare the above with the date command (dates are different because I'm in Europe/Zurich):

$ date --date='@796694400'
Saturday 1 April  02:00:00 CEST 1995

$ TZ=right/$(cat /etc/timezone) date --date='@796694400'
Saturday 1 April  01:59:41 CEST 1995

So to my mind the only correct call from your list of "right" examples is the Time::Piece->strftime("%m") version which uses the underlying C strftime function - everything else depends on Perl's idea of time. Apparently even POSIX::strftime.

So I think the bug actually comes from Perl's core time functions not dealing with the TAI ("right") timezones properly.

barefootcoder commented 8 years ago

If I squint my eyes the same as you I would actually expect to see different dates for these two ...

Right, that's exactly what I was thinking. But I think something must be wrong somewhere if

$tp->strftime("%m") != $tp->mon

Maybe the bug isn't with Time::Piece at all. Maybe it's with POSIX::strftime ... I'm assuming Time::Piece::strftime calls POSIX::strftime at some level, though I couldn't find it in the code (XS, maybe?). If we can say definitively that that's the case, then I could close this bug and open one for POSIX.

Except ...

The POSIX module has to follow the POSIX standard, even if the standard is wrong. And, from my (limited) reading on the topic, POSIX is wrong when it comes to leap seconds. And, anyway, I'm pretty sure that POSIX::strftime is just going to call the underlying strftime in the C library, and it's not the POSIX module's fault if that's not doing the right thing.

So if we do definitely decide the bug is in POSIX::strftime, I'm not actually sure what a reasonable course of action would be. :-/

mlawren commented 8 years ago

At least one of the assumptions in my first comment is wrong. According to the documentation gmtime is always UNIX time (i.e. has nothing to do with the difference between TAI/UTC). Therefore my tests with gmtime against the date command are invalid. However tests with localtime appear to be correct:

TZ=right/$(cat /etc/timezone) perl -le "warn scalar localtime 796694400";
Sat Apr  1 01:59:41 1995 at -e line 1.

TZ=$(cat /etc/timezone) perl -le "warn scalar localtime 796694400"
Sat Apr  1 02:00:00 1995 at -e line 1.

So now that I think I understand this better, we can test everything a bit more clearly. I have adjusted the epoch value below for my time zone in order to cross the day boundary.

for tz in $(cat /etc/timezone) right/$(cat /etc/timezone); do
    echo "TZ=$tz"
    TZ=$tz perl -lE \
        "say 'Perl gmtime: '.scalar gmtime 796694400-7200"
    TZ=$tz perl -lE \
        "say 'Perl localtime: '.scalar localtime 796694400-7200"
    TZ=$tz perl -MTime::Piece -lE \
        "say 'Time::Piece localtime: '.scalar localtime(796694400-7200)"
    TZ=$tz perl -MTime::Piece -lE \
        "say 'Time::Piece localtime->mon: '.scalar localtime(796694400-7200)->mon"
    TZ=$tz perl -MTime::Piece -lE \
        "say 'Time::Piece localtime->strftime: '.scalar localtime(796694400-7200)->strftime('%m')"
    TZ=$tz perl -MPOSIX=strftime -lE \
        "say 'POSIX strftime(gmtime): '. strftime('%m',gmtime(796694400-7200))"
    TZ=$tz perl -MPOSIX=strftime -lE \
        "say 'POSIX strftime(localtime): '. strftime('%m',localtime(796694400-7200))"

done

The above looks like this here:

TZ=Europe/Zurich
Perl gmtime: Fri Mar 31 22:00:00 1995
Perl localtime: Sat Apr  1 00:00:00 1995
Time::Piece localtime: Sat Apr  1 00:00:00 1995
Time::Piece localtime->mon: 4
Time::Piece localtime->strftime: 04
POSIX strftime(gmtime): 03
POSIX strftime(localtime): 04

TZ=right/Europe/Zurich
Perl gmtime: Fri Mar 31 22:00:00 1995
Perl localtime: Fri Mar 31 23:59:41 1995
Time::Piece localtime: Fri Mar 31 23:59:41 1995
Time::Piece localtime->mon: 3
Time::Piece localtime->strftime: 03
POSIX strftime(gmtime): 03
POSIX strftime(localtime): 03

So I went back and looked at the documentation for POSIX::strftime and it mentions nothing about being locale-dependent. Which tells me that if you want to compare Time::Piece->localtime->mon or Time::Piece->localtime->strftime('%m') against the POSIX module you have to also pass localtime values to the POSIX strftime function.

So I don't see a bug as such. I see some confusing similarly-named functions that expect different inputs.

smith153 commented 8 years ago

What catches my eye is:

buddy@cibola ~ $ TZ=right/$(cat /etc/timezone) perl -MPOSIX -le 'print strftime("%m", gmtime 796694400)'
04
buddy@cibola ~ $ TZ=right/$(cat /etc/timezone) perl -MTime::Piece -le 'print ((gmtime 796694400)->strftime("%m"))'
03

Time::Piece::strftime and POSIX::strftime both call the glibc/libc strftime: http://linux.die.net/man/3/strftime and so should always be the same. The problem is that native strftime expects the tm struct which is different on every platform. For Time::Piece, I changed the call to XS to pass the epoch value which in turn is passed to the native libc gmtime or localtime which guarantees to return the correct tm struct for the platform which can be passed straight into libc strftime. I do not think POSIX::strftime has the luxury of calling the native gmtime (internal perl gmtime returns its own version of the tm struct) and it therefore tries to guess the best values for the native tm struct before passing off to libc strftime.

Hard to say which one would be more "right". For the most part, nearly all function calls made by Time::Piece::strftime are native and I would assume that if there are any issues it is platform related. But this also means that the individual Time::Piece elements ($tp->mon, $tp->min, etc) may not line up exactly with native strftime.

I'll try and figure out why Time::Piece::strftime and POSIX::strftime are not lining up in this test case as a starting point.

mlawren commented 8 years ago

I'll try and figure out why Time::Piece::strftime and POSIX::strftime are not lining up in this test case as a starting point.

Here is perhaps a simpler test case to use as a starting point. It does not depend on the 1995 month switch that Buddy first identified and is pure Time::Piece.

use strict;
use warnings;
use Test::More;
use Time::Piece;

sub compare {
    my $gmtime = gmtime;
    is $gmtime->hms, $gmtime->strftime('%T'),
        $ENV{TZ} . ' hms:'
      . $gmtime->hms
      . ' eq strftime:'
      . $gmtime->strftime('%T');
}

$ENV{TZ} = 'UTC';
compare;

$ENV{TZ} = 'right/UTC';
compare;

done_testing();

On my Debian host this outputs the following:

ok 1 - UTC hms:09:16:54 eq strftime:09:16:54
not ok 2 - right/UTC hms:09:16:54 eq strftime:09:16:28
#   Failed test 'right/UTC hms:09:16:54 eq strftime:09:16:28'
#   at j line 9.
#          got: '09:16:54'
#     expected: '09:16:28'
1..2
# Looks like you failed 1 test of 2.

The question is: under the "right/UTC" timezone, which value is correct?

As I understand it, perl's gmtime is always UNIX time (i.e. no leap seconds), in which case it appears that Time::Piece->strftime is applying a timezone offset inappropriately.

Mark Lawrence

mlawren commented 8 years ago

Here is another question: is it even valid to be using "right/" timezones on a machine whose hardware clock is not set to TAI?

barefootcoder commented 8 years ago

Here is another question: is it even valid to be using "right/" timezones on a machine whose hardware clock is not set to TAI?

Oh, I have no idea about that. Basically all I was trying to do when I found it was to run a test in every possible timezone to verify that my code would work no matter where (or when, I suppose) it was being run. So I was just rifling through all my timezone files, and roughly half of those are under right/.

What I find most interesting in your example up above is that, when you use localtime instead of gmtime, it all works. So apparently there's something about UTC that Time::Piece::strftime and POSIX::strftime disagree on ...

But I'm still confused about where the bug is. I'm still pretty sure there is one, because things should agree (as @smith153 says up above). They don't, and that seems problematic. But one of them seems to be saying, "the selected timezone has leap seconds, so we have to honor that, therefore March" and the other one seems to be saying "ah, but UTC is defined as no leap seconds, so therefore April." But which one is right? No clue.

barefootcoder commented 5 years ago

Any further thoughts on this? I really think this:

... one of them seems to be saying, "the selected timezone has leap seconds, so we have to honor that, therefore March" and the other one seems to be saying "ah, but UTC is defined as no leap seconds, so therefore April." But which one is right?

is the crux of it.

smith153 commented 5 years ago

This will be mostly a brain dump but perhaps I can better articulate it at another time...

The issue is we don't know what is right from the following:

use strict;
use warnings;
use Test::More;
use Time::Piece;

sub compare {
    my $gmtime = gmtime;
    is $gmtime->hms, $gmtime->strftime('%T'),
        $ENV{TZ} . ' hms:'
      . $gmtime->hms
      . ' eq strftime:'
      . $gmtime->strftime('%T');
}

$ENV{TZ} = 'UTC';
compare;

$ENV{TZ} = 'right/UTC';
compare;

done_testing();

The next issue is that Time::Piece is somewhat deceiving. You'd think that as an object, when you call methods on it, those methods just return data based on its internal state. That would be nice, but is not always the case... At its core, TP contains an array holding a simplified version of the libc tm struct, which consists of the year, dom, month, etc (aka "the broken down time"). In some cases, calling a method on TP simply returns this data (like ->sec), in other cases a method returns data after doing a bunch of calculations (like ->tzoffset), and in yet other cases a method call returns data from outside TP (such as strptime and strftime).

Now onto where the data from $gmtime->hms and $gmtime->strftime('%T') comes from.

Calling gmtime (with TP loaded) fetches a broken down time via CORE::gmtime. Digging deeper into perl, Core gmtime calls Perl_gmtime64_r. If it is called with no args, an epoch is provided via a call to the native libc time().

But core gmtime returns a broken down time struct and not an epoch... so Perl_gmtime64_r starts at 1970, and gets busy doing some math to see how many years, days, months, it's been between 1970 and the provided epoch. There is apparently no call to the native libc gmtime. Apparently at one time there was, but that changed between 5.8 and 5.10. Now the home grown Perl_gmtime64_r is used in all cases. There are some configurable macros USE_SYSTEM_GMTIME and USE_SYSTEM_LOCALTIME, but as far as I can tell when compiling the perl source, the macros are never set... perhaps they used to be.

So setting $ENV{TZ} and then calling gmtime actually doesn't do anything as the built in perl gmtime does not look at that.

And what about $gmtime->strftime? First strftime is translated into an XS call to _strftime. _strftime takes an epoch and calls the native system localtime or gmtime with that epoch. These native calls return the correct tm struct that the native libc strftime requires (it differs on each platform hence just easier to use an epoch).

So what does the native libc gmtime do? Well that's a good question. Apparently it gets translated to a call to __tz_convert which looks scary and starts out with a variable named leap_correction.

So TP is built by a call to a home grown gmtime whereas data from strftime comes from an actual libc library call. I think at one time I looked into whether one can call core perl functions from an XS module, but I think the gist of that idea was "perl is an executable, not a library"... but I could be wrong (and would actually like clarification).

But what about the case that using localtime just seems to "work better" or lead to fewer surprises? Well, even though there are macros for USE_SYSTEM_LOCALTIME scattered throughout the perl source (that are subsequently never defined or used), if you follow the function calls, perl's localtime call is translated to Perl_localtime64_r. That checks for SHOULD_USE_SYSTEM_LOCALTIME, which is apparently always false. Eventually it calls a macro version of LOCALTIME_R, which is actually a call to S_localtime_r, which is defined as then calling the native libc localtime anyway (lol).

Hence setting $ENV{TZ} actually works for localtime because the libc time functions look for it in the current environment (perl's own code, not so much).

So ultimately I think strftime is right. In my own code, I mostly just use TP to call strftime since it mostly just uses native libc calls (that I trust more).

Also alluding to the fact that leap seconds do affect UTC is the following, using the normal date command:

user@t61:/home/CR/perl5$ TZ=UTC date --date='@796694400'
Sat Apr  1 00:00:00 UTC 1995
user@t61:/home/CR/perl5$ TZ=right/UTC date --date='@796694400'
Fri Mar 31 23:59:41 UTC 1995

So the question is where to from here? And to that, I'm not really sure. Will require more thought :)

*Note my understanding of how all this works could just be very wrong.....

smith153 commented 5 years ago

And for the next phase of this analysis...

I'm trying to find docs on what exactly the "right" timezones mean. Not having much luck. The readme for the linux tzdata package states:

Two different versions are provided:
- The "posix" version is based on the Coordinated Universal Time (UTC).
- The "right" version is based on the International Atomic Time (TAI),
  and it includes the leap seconds.

Which is weird as TAI does not have leap seconds. Leap seconds are applied to UTC so it stays within one second of UT1, correct? And we know UTC is behind TAI.

Using zdump -v to view the files /usr/share/zoneinfo/EST5EDT and /usr/share/zoneinfo/right/EST5EDT shows EST5EDT to contain only info about DST change times, yet right/EST5EDT shows not only the DST change times, but also the leap second change times. So the "right" version does include the leap seconds. But what does that mean? Leap seconds are already included in UTC, hence why it is behind TAI.

I had high hopes for this article, and though long, it did not provide much info. But the phrase "If the system clock is kept in TAI and a right/* timezone is used ..." is repeated a couple of times.

Are we perhaps reading this wrong? It seems POSIX (along with Unix time/epoch) has no support for leap seconds; it is up to something external to adjust the hardware clock to account for the second. And the unix epoch is defined as the number of seconds since 1970 minus the leap seconds (since leap seconds are not added to the epoch). Is it then perhaps that the right/ timezones are for use only when your system clock is set to TAI? That would explain why using them causes the time to be offset into the past instead of the future...

smith153 commented 5 years ago

@barefootcoder Any thoughts on any of this? Hoping your silence is due to contemplation and not because I'm way off base :smile: