Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

POSIX::timegm missing (for POSIX::mktime symmetry) #21460

Open mirabilos opened 1 year ago

mirabilos commented 1 year ago

Module: POSIX

Description There currently is a POSIX::mktime which reverts the built-in localtime, but no POSIX::timegm to revert the built-in gmtime.

timegm has been added to ISO C23, and while the upcoming (almost finished) POSIX release (Issue 8) is based on ISO C17, chances are it’ll also have timegm, and if not then Issue 9 will definitely have it. (Presence should still be tested at configure time, of course, for older systems.)

My motivation here is to get rid of the broken code in Time::Local that assumes leap seconds do not exist, making it a thin wrapper around the two aforementioned functions from the POSIX module, which do the right thing (except that mktime behaves differently from Time::Local::timelocal for ambiguous times during half of a DST switch). I’ve modelled this (POSIX.xs patch, Time/Local.pm patch) on MirBSD’s old Perl 5.8, and it works save for the one ambiguous DST switch test. All others, including core and maintainer tests, pass with suitable fixes applied: (0,0,0,1,0,90) in Europe/Vienna happens to have a leap second, so I went to February (leap seconds are always on Dec→Jan and Jun→Jul switches).

Steps to Reproduce Any attempt to convert the structure output by the built-in gmtime back has no suitable built-in or otherwise supplied function to do it. (Time::Local has one, but it does not match with what the OS does, both in the face of leap seconds, and it looks also very rough in the face of DST switches.)

Expected behavior The POSIX module should expose the timegm function as defined by ISO C23, an upcoming/future POSIX release, and Olson tzcode at least as shipped by the BSDs

Perl configuration This is probably not relevant here.

jkeenan commented 1 year ago

A number of clarifying questions:

Thank you very much.

mauke commented 1 year ago

What do you mean by "revert[ing] the built-in gmtime"?

Performing the inverse operation of gmtime. (The localtime builtin converts a unix timestamp (as returned by time) into a broken-down structure for the local time zone. POSIX::mktime does the inverse operation: It converts a broken-down date/time structure into a unix timestamp.)
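As a minimal sketch of the round trip described here, the following uses only the `localtime` builtin and `POSIX::mktime`, both of which exist today. Passing all nine fields, including `isdst`, is a choice made for this illustration so that ambiguous times around a DST switch can be resolved.

```perl
use strict;
use warnings;
use POSIX ();

# Break a Unix timestamp down into local-time fields ...
my $now = time;
my @tm  = localtime $now;   # sec, min, hour, mday, mon, year, wday, yday, isdst

# ... and convert the fields back. POSIX::mktime takes them in the same order.
my $back = POSIX::mktime(@tm);

printf "time=%d mktime(localtime)=%d\n", $now, $back;
```

The request in this issue is for the same round trip with `gmtime` in place of `localtime`.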

Grinnz commented 1 year ago

Time::Local's timegm is rather inconsistent with other instances of such functions in similar fashion to Time::Local::timelocal's inconsistency with mktime. Aside from the whole year value interpretation conundrum (use timegm_posix in a recent version of Time::Local to avoid it), its behavior primarily differs in the case of receiving out of range values - Time::Local will throw an error, whereas mktime will interpret the out of range value as an overflow and return a result appropriately.
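The difference in out-of-range handling can be sketched as follows, using the nonexistent date 31 April 2023. The helper names `mktime_normalized` and `timegm_rejects` are made up for this illustration; noon is used so that a DST switch cannot move the date.

```perl
use strict;
use warnings;
use POSIX ();
use Time::Local ();

# Fields are (sec, min, hour, mday, mon, year), with mon 0-based and
# year counted from 1900, as in the output of gmtime/localtime.

# mktime treats day 31 of April as an overflow and normalizes it to 1 May.
sub mktime_normalized {
    my $t = POSIX::mktime(0, 0, 12, 31, 3, 123);
    return [ (localtime $t)[3, 4] ];    # (mday, mon)
}

# Time::Local range-checks its arguments and dies instead.
sub timegm_rejects {
    my $ok = eval { Time::Local::timegm(0, 0, 12, 31, 3, 123); 1 };
    return $ok ? 0 : 1;
}

my $norm = mktime_normalized();
printf "mktime gives mday=%d mon=%d; Time::Local::timegm %s\n",
    $norm->[0], $norm->[1], timegm_rejects() ? 'died' : 'returned a value';
```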

mirabilos commented 1 year ago

What do you mean by

This was already answered by mauke.

where people are at in these processes right now

It’s been in the C23 draft for a while and will be in the C23 release, which is in the process of being edited, and whose release is expected RSN.

It’s not yet in the (about month-old) draft of POSIX Issue 8 I have at hand, but it’s being talked about on the mailing list. There’s a chance that adding it will get deferred to Issue 9 because Issue 8 is also just shy of being released; Issue 9 will almost certainly be based on C23, as Issue 8 is already based on C17, so it will include it. The Austin Group (those behind POSIX) have also extensively discussed wording for mktime, especially regarding corner cases. (The requested timegm is the same as mktime except for UTC instead of the local timezone, so it’d operate exactly the same.) So there’s definite interest. (I can do an explicit search of the issue tracker and mailing list archives if needed.)

Can you provide examples of specific, existing bugs in Time::Local?

Yes.

$ cat x.pl
use Time::Local ();
$timet = time();
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = gmtime($timet);
$rev_tl = Time::Local::timegm($sec,$min,$hour,$mday,$mon,$year);
print "Starting from ",$timet,"\n";
print $year+1900,"-",$mon+1,"-",$mday,"T",$hour,":",$min,":",$sec,"Z\n";
print "This results in ",$rev_tl,"\n";
$ perl x.pl
Starting from 1694033952
2023-9-6T20:58:45Z
This results in 1694033925

Note the 1694033952 - 1694033925 = 27 second difference owing to leap second support in the OS but not in Time::Local.

On the system where I had already added POSIX::timegm:

$ cat x.pl
use POSIX ();
$timet = time();
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = gmtime($timet);
$rev_p = POSIX::timegm($sec,$min,$hour,$mday,$mon,$year);
print "Starting from ",$timet,"\n";
print $year+1900,"-",$mon+1,"-",$mday,"T",$hour,":",$min,":",$sec,"Z\n";
print "This results in ",$rev_p,"\n";
$ perl x.pl 
Starting from 1694034016
2023-9-6T20:59:49Z
This results in 1694034016

Here, the resulting Unix timestamp is identical to the input, i.e. the function is reversible.

This cannot be fixed in Time::Local without parsing leap second information from the TZif files.

leonerd commented 1 year ago

Does https://metacpan.org/pod/Time::timegm help with this?

mirabilos commented 1 year ago

Paul Evans dixit:

Does https://metacpan.org/pod/Time::timegm help with this?

No. It says:

Epoch times + UTC always align day boundaries at multiples of 86400.

This isn’t true, days can have 86401 seconds (or, in theory, between 86398 and 86402 but the extremes are avoided and DJB says negative leap seconds are unlikely to occur and I hope he doesn’t have to eat these particular words later).

bye, //mirabilos -- (gnutls can also be used, but if you are compiling lynx for your own use, there is no reason to consider using that package) -- Thomas E. Dickey on the Lynx mailing list, about OpenSSL

Grinnz commented 1 year ago

That's not quite correct. The Unix epoch time is defined to ignore leap seconds, so it will have 86400 seconds every day, they just may not be the same length seconds. You need the TAI timestamp or leap second aware date math to account for leap seconds.
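A small sketch of the arithmetic behind this claim, assuming a current perl whose builtin `gmtime` does not count leap seconds. It checks that a UTC midnight lands on a multiple of 86400, then compares the wall-clock reading for the timestamp posted earlier in the thread (20:58:45 on the leap-second-aware system) with the leap-second-free breakdown. The 27-second gap matches the 27 leap seconds inserted into UTC since 1972.

```perl
use strict;
use warnings;
use Time::Local ();

# 2023-09-06T00:00:00Z. Time::Local counts every day as 86400 seconds,
# which is the POSIX definition of "seconds since the Epoch".
my $midnight = Time::Local::timegm(0, 0, 0, 6, 8, 2023);
printf "%d = %d days * 86400 + %d\n",
    $midnight, int($midnight / 86400), $midnight % 86400;

# Timestamp from the thread, broken down without leap seconds.
my ($sec, $min, $hour) = gmtime 1694033952;
my $posix_tod = $hour * 3600 + $min * 60 + $sec;
my $right_tod = 20 * 3600 + 58 * 60 + 45;    # 20:58:45, as reported above
my $offset    = $posix_tod - $right_tod;

printf "%02d:%02d:%02d vs 20:58:45, offset %d s\n", $hour, $min, $sec, $offset;
```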

Grinnz commented 1 year ago

Note the 1694033952 - 1694033925 = 27 second difference owing to leap second support in the OS but not in Time::Local.

I can't reproduce this on my system.

Starting from 1694039279
2023-9-6T22:27:59Z
This results in 1694039279

(not to say that timegm support isn't needed; just that this is not a bug in Time::Local)

mirabilos commented 1 year ago

Dan Book dixit:

The Unix epoch time is defined to ignore leap seconds

Only on some systems. This is enshrined in the POSIX standard but actually violates existing law, which mandates leap seconds to be observed, and which is relevant for timekeeping.

Some systems observe leap seconds even in “Unix time”.

I can't reproduce this on my system.

Yes, because your system doesn’t do that.

I can reproduce this…

$ TZ=right/UTC perl x.pl
Starting from 1694041562
2023-9-6T23:5:38Z
This results in 1694041538

… in a Debian etch chroot (which has Perl 5.8.8); I think later versions changed the builtin gmtime/localtime somehow? (In which case a port of these to MirBSD will have “extra fun”, sigh…)

bye, //mirabilos

leonerd commented 1 year ago

Oh. Ignore the docs in Time::timegm. It's just a simple obvious XS wrapper around whatever timegm(3) is provided by libc. Is this sufficient?

Grinnz commented 1 year ago

This seems quite incompatible with the Internet, which relies on the same moment being the same Unix epoch timestamp on every (synced) machine regardless of time zone. But regardless, this is ancillary to the issue of whether a timegm wrapper should be provided.

mirabilos commented 1 year ago

Paul Evans dixit:

Oh. Ignore the docs in Time::timegm. It's just a simple obvious XS wrapper around whatever timegm(3) is provided by libc. Is this sufficient?

There’s a nōn-XS fallback part, though, and something in core or close to core would be preferable. Adding it to the POSIX module seemed obvious, as mktime is also there, and as POSIX will, in the near future, mandate this libc function as well, and it was a quick fix for MirBSD (I’m unfortunately not much of a Perl developer, though I have recently used more and more of it, even at $dayjob, and am considering investing some learning time into it, to understand more).

My idea here being that new code could use APIs that work in more scenarios, and existing code like Time::Local which is widely used (I ran into this in CVSweb) can be changed to use the mktime/timegm APIs where available and use the old code if not (which will solve the MacOS/Win32/… problem).
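The "use the OS function where available, else the old code" idea might look roughly like the sketch below. `inverse_gmtime` is a hypothetical helper name, and `POSIX::timegm` is the function requested in this issue, so it may well be absent from a given perl; the code probes for it at run time and assumes it would take the same leading arguments as `POSIX::mktime`.

```perl
use strict;
use warnings;
use POSIX ();
use Time::Local ();

# Hypothetical helper: prefer POSIX::timegm if this perl's POSIX module
# provides it (an assumption, see above), otherwise fall back to
# Time::Local::timegm, which assumes 86400-second days.
sub inverse_gmtime {
    my @tm = @_[0 .. 5];    # sec, min, hour, mday, mon, year
    if (my $posix_timegm = POSIX->can('timegm')) {
        return $posix_timegm->(@tm);
    }
    return Time::Local::timegm(@tm);
}

my $now  = time;
my $back = inverse_gmtime(gmtime $now);
printf "time=%d round-trip=%d diff=%d\n", $now, $back, $now - $back;
```

On a system whose time_t does not count leap seconds, both branches should return the starting value.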

Thanks, //mirabilos

mirabilos commented 1 year ago

Dan Book dixit:

This seems quite incompatible with the Internet, which relies on the same moment being the same Unix epoch timestamp on every (synced) machine regardless of time zone.

This is wrong.

And even where reliance on dates exists, it’s usually the broken-down time (struct tm, in C parlance), not the Unix timestamp, as most of the Internet is independent of Unix or even predates Unix’ networking support.

There are cases where there is a POSIX time_t without leap seconds expected on the link (the rsync protocol is one); it’s been easy enough to patch the software involved to call timet2posix/posix2timet on MirBSD for these cases.

Incidentally, syncing timestamps also doesn’t use Unix timestamps but NTP timestamps.

I’ve been running this for over 19 years now… (and yes, to my shame I never discovered that particular CVSweb issue before).

bye, //mirabilos -- „Cool, /usr/share/doc/mksh/examples/uhr.gz ist ja ein Grund, mksh auf jedem System zu installieren.“ -- XTaran auf der OpenRheinRuhr, ganz begeistert (EN: “[…]uhr.gz is a reason to install mksh on every system.”)

tonycoz commented 1 year ago

(which will solve the MacOS/Win32/… problem).

Win32 UCRT has _mkgmtime():

time_t _mkgmtime(
   struct tm* timeptr
);

which "Converts a UTC time represented by a struct tm to a UTC time represented by a time_t type."

Mac OS apparently also has timegm:

http://www.manpagez.com/man/3/timegm/osx-10.4.php

though one report indicates it's not thread safe:

https://sourceforge.net/p/aolserver/bugs/215/

Configure already probes for timegm():

$ grep -C1 timegm Porting/Glossary

d_timegm (d_timegm.U):
        This variable conditionally defines the HAS_TIMEGM symbol, which
        indicates to the C program that the timegm () routine is available.

I think it's worth adding, but callers will need to deal with cases where it isn't available.
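Callers can check the result of that Configure probe at run time through the core `Config` module, as in this sketch. `have_c_timegm` is a name made up here; note that `d_timegm` only says the C library had `timegm()` when perl was built, not that the POSIX module exposes it.

```perl
use strict;
use warnings;
use Config;

# d_timegm is the Configure variable from Porting/Glossary quoted above.
# It is 'define' when timegm() was found in the C library at build time.
sub have_c_timegm {
    return ( ( $Config{d_timegm} // '' ) eq 'define' ) ? 1 : 0;
}

printf "libc timegm(): %s\n", have_c_timegm() ? 'available' : 'not available';
```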

mirabilos commented 1 year ago

Tony Cook dixit:

(which will solve the MacOS/Win32/… problem).

Win32 UCRT has _mkgmtime():

Ah, nice.

Mac OS apparently also has timegm:

http://www.manpagez.com/man/3/timegm/osx-10.4.php

That’s Mac OSX, not MacOS (which Time::Local mentions to support).

Configure already probes for timegm():

Oh, good.

I think it's worth adding, but callers will need to deal with cases where it isn't available.

That’s alright then.

Thanks, //mirabilos

tonycoz commented 1 year ago

Mac OS apparently also has timegm: http://www.manpagez.com/man/3/timegm/osx-10.4.php

That’s Mac OSX, not MacOS (which Time::Local mentions to support)

Mac OS X is dead, long live macOS

The official name, at least for new releases, is macOS.

Perl itself no longer supports pre-X macOS.