erlang / otp

Erlang/OTP
http://erlang.org
Apache License 2.0
11.43k stars 2.96k forks source link

local_to_univ_utc/1 test fails on Debian 12 #7938

Open ptome opened 11 months ago

ptome commented 11 months ago

Describe the bug When building the master branch, one test in the smoke suite is failing for me -- local_to_univ_utc in time_SUITE.erl

To Reproduce The test can be reproduced in the erl shell, from $ERL_TOP:

$ TZ=UTC bin/erl
Erlang/OTP 27 [DEVELOPMENT] [erts-14.1.1] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [jit:ns]

Eshell V14.1.1 (press Ctrl+G to abort, type help(). for help)
1> erlang:localtime_to_universaltime({{2008, 8, 1}, {0, 0, 0}}).
{{2008,8,1},{0,0,0}}
2> erlang:localtime_to_universaltime({{2008, 8, 1}, {0, 0, 0}}, true).
{{2008,7,31},{23,0,0}}

Expected behavior The test should pass, i.e., both return values should be the same in the example above. I'm not sure if the expected result is correct or not.

Affected versions I'm using the master branch (OTP 27/DEVELOPMENT) but I suspect it would be the same for other versions. It could depend on glibc version or some other configuration, like time zone data?

Additional context I created a C test program to verify mktime behavior on my system.

On my Debian machine, when tm_isdst > 0 for TZ=UTC, it returns a valid clock (not -1). However, after the mktime call, the value for tm_isdst in the struct is changed to zero. The clock returned is offset one hour from the expected time.

On my Ubuntu VM the same program returns -1 for the clock, signalling the error.

The test program I used:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(int argc, char *argv[])
{
        int check_dst;

        /* Check DST given as input */
        if (argc == 2) {
                check_dst = atoi(argv[1]);
        } else {
                printf("./check_dst <isdst value>\n");
                exit(EXIT_FAILURE);
        }

        struct tm t;
        t.tm_year = 2008 - 1900;
        t.tm_mon = 8 - 1;
        t.tm_mday = 1;
        t.tm_hour = 0;
        t.tm_min = 0;
        t.tm_sec = 0;
        t.tm_isdst = check_dst;

        time_t clock = mktime(&t);

        printf("tzname[0]: %s, tzname[1]: %s\n", tzname[0], tzname[1]);
        printf("\tdst before: %d\n\tdst after: %d\n", check_dst, t.tm_isdst);
        printf("\tclock: %ld\n", clock);

        exit(EXIT_SUCCESS);
}
$ gcc -o check_dst check_dst.c 
$ TZ=UTC ./check_dst 0
tzname[0]: UTC, tzname[1]: UTC
    dst before: 0
    dst after: 0
    clock: 1217548800
$ TZ=UTC ./check_dst 1
tzname[0]: UTC, tzname[1]: UTC
    dst before: 1
    dst after: 0
    clock: 1217545200

On my Ubuntu machine the second test check_dst 1 returns clock -1.

When the OS returns a clock, I don't know what the expected value is supposed to be. mktime is implementation dependent.

ptome commented 11 months ago

I found the reason for this behavior by debugging into glibc. In upstream glibc I didn't find differences between 2.35 and 2.36 in mktime nor timezone data.

However, the Debian patches to 2.36 have the following change, not present in Ubuntu.

File: /usr/src/glibc/debian/patches/git-updates.diff (from glib-source in Debian 12)

If these changes make it upstream, the test could fail on other Linux versions. I'm not saying this is a bug in this glibc. I'm not sure if Erlang needs to adapt to this behavior, or if I'm doing something wrong.

Ubuntu glibc seems based on Debian, so this could hit Ubuntu next. Anyone with a more recent version of Ubuntu can confirm glibc version and if this change is included.

--- a/time/mktime.c
+++ b/time/mktime.c
@@ -429,8 +429,13 @@ __mktime_internal (struct tm *tp,
         time with the right value, and use its UTC offset.

         Heuristic: probe the adjacent timestamps in both directions,
-        looking for the desired isdst.  This should work for all real
-        time zone histories in the tz database.  */
+        looking for the desired isdst.  If none is found within a
+        reasonable duration bound, assume a one-hour DST difference.
+        This should work for all real time zone histories in the tz
+        database.  */
+
+      /* +1 if we wanted standard time but got DST, -1 if the reverse.  */
+      int dst_difference = (isdst == 0) - (tm.tm_isdst == 0);

       /* Distance between probes when looking for a DST boundary.  In
         tzdata2003a, the shortest period of DST is 601200 seconds
@@ -441,12 +446,14 @@ __mktime_internal (struct tm *tp,
         periods when probing.  */
       int stride = 601200;

-      /* The longest period of DST in tzdata2003a is 536454000 seconds
-        (e.g., America/Jujuy starting 1946-10-01 01:00).  The longest
-        period of non-DST is much longer, but it makes no real sense
-        to search for more than a year of non-DST, so use the DST
-        max.  */
-      int duration_max = 536454000;
+      /* In TZDB 2021e, the longest period of DST (or of non-DST), in
+        which the DST (or adjacent DST) difference is not one hour,
+        is 457243209 seconds: e.g., America/Cambridge_Bay with leap
+        seconds, starting 1965-10-31 00:00 in a switch from
+        double-daylight time (-05) to standard time (-07), and
+        continuing to 1980-04-27 02:00 in a switch from standard time
+        (-07) to daylight time (-06).  */
+      int duration_max = 457243209;
       /* Search in both directions, so the maximum distance is half
         the duration; add the stride to avoid off-by-1 problems.  */
@@ -483,6 +490,11 @@ __mktime_internal (struct tm *tp,
              }
          }

+      /* No unusual DST offset was found nearby.  Assume one-hour DST.  */
+      t += 60 * 60 * dst_difference;
+      if (mktime_min <= t && t <= mktime_max && convert_time (convert, t, &tm))
+       goto offset_found;
+
       __set_errno (EOVERFLOW);
       return -1;
     }

By debugging my test program into glibc, I can see it scanning the timezones for a matching isdst in both directions.

Breakpoint 3, __mktime_internal (tp=<optimized out>, convert=0x7ffff7e93580 <__localtime_r>, offset=0x7ffff7fa9678 <localtime_offset>) at ./time/mktime.c:473
473         if (! isdst_differ (isdst, otm.tm_isdst))
(gdb) p otm
$25 = {tm_sec = 0, tm_min = 0, tm_hour = 7, tm_mday = 27, tm_mon = 10, tm_year = 108, tm_wday = 4, tm_yday = 331, tm_isdst = 0, tm_gmtoff = 0, tm_zone = 0x555555559540 "UTC"}
(gdb) c
Continuing.

Breakpoint 3, __mktime_internal (tp=<optimized out>, convert=0x7ffff7e93580 <__localtime_r>, offset=0x7ffff7fa9678 <localtime_offset>) at ./time/mktime.c:473
473         if (! isdst_differ (isdst, otm.tm_isdst))
(gdb) p otm
$26 = {tm_sec = 0, tm_min = 0, tm_hour = 18, tm_mday = 28, tm_mon = 2, tm_year = 108, tm_wday = 5, tm_yday = 87, tm_isdst = 0, tm_gmtoff = 0, tm_zone = 0x555555559540 "UTC"}
(gdb) c
Continuing.

Breakpoint 3, __mktime_internal (tp=<optimized out>, convert=0x7ffff7e93580 <__localtime_r>, offset=0x7ffff7fa9678 <localtime_offset>) at ./time/mktime.c:473
473         if (! isdst_differ (isdst, otm.tm_isdst))
(gdb) p otm
$27 = {tm_sec = 0, tm_min = 0, tm_hour = 6, tm_mday = 4, tm_mon = 11, tm_year = 108, tm_wday = 4, tm_yday = 338, tm_isdst = 0, tm_gmtoff = 0, tm_zone = 0x555555559540 "UTC"}
(gdb) c
Continuing.

Breakpoint 3, __mktime_internal (tp=<optimized out>, convert=0x7ffff7e93580 <__localtime_r>, offset=0x7ffff7fa9678 <localtime_offset>) at ./time/mktime.c:473
473         if (! isdst_differ (isdst, otm.tm_isdst))
(gdb) p otm
$28 = {tm_sec = 0, tm_min = 0, tm_hour = 19, tm_mday = 21, tm_mon = 2, tm_year = 108, tm_wday = 5, tm_yday = 80, tm_isdst = 0, tm_gmtoff = 0, tm_zone = 0x555555559540 "UTC"}
(gdb) 

This is the problem: after it hits the max threshold, it hits the last part and sets the "usual difference" of one hour.

Breakpoint 3, __mktime_internal (tp=<optimized out>, convert=0x7ffff7e93580 <__localtime_r>, offset=0x7ffff7fa9678 <localtime_offset>) at ./time/mktime.c:473
473         if (! isdst_differ (isdst, otm.tm_isdst))
(gdb) d 3
(gdb) c
Continuing.

Breakpoint 4, __mktime_internal (tp=<optimized out>, convert=0x7ffff7e93580 <__localtime_r>, offset=0x7ffff7fa9678 <localtime_offset>) at ./time/mktime.c:494
494       t += 60 * 60 * dst_difference;
(gdb) 
Continuing.
tzname[0]: UTC, tzname[1]: UTC
    dst before: 1
    dst after: 0
    clock: 1217545200
[Inferior 1 (process 2903) exited normally]
ptome commented 11 months ago

This change has made it upstream to glibc. I see it in the 2.38 version, for example.

I will report this upstream to Debian, since it could be a bug.

jhogberg commented 11 months ago

Thanks for your report, and for identifying it as a possible bug in libc. Please keep us posted :-)

ptome commented 11 months ago

The Debian maintainer is also not sure if this is a bug, and will report upstream to the glibc bugzilla.

The behavior was introduced in 2.37 and backported to Debian stable glibc 2.34, 2.35 and 2.36.

My report to Debian, for reference: #1057856

Edit: glibc bug report #31144

ptome commented 11 months ago

The glibc maintainers did not consider this a bug, see #31144.

okeuday commented 11 months ago

To avoid local time conversion problems in long-running Erlang source code it should be necessary to have Erlang/OTP use the IANA timezone database directly anyway (with https://github.com/HowardHinnant/date or similar source code). The test failure appears to be a reminder of that problem.