apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[Python] failures in `test_compute.py::test_round_temporal` in conda-forge for linux aarch64/ppc64le #43356

Open h-vetinari opened 4 months ago

h-vetinari commented 4 months ago

Describe the bug, including details regarding any error messages, version, and platform.

A while ago, all branches of the pyarrow builds in conda-forge started failing with the following kinds of errors. It only happens on linux-{aarch64,ppc64le}, so it may be emulation-related. Some of the failures look potentially tzdb-related (so maybe related to https://github.com/conda-forge/linux-sysroot-feedstock/pull/66), but for the differences of less than 1h, that seems unlikely to be the cause.

=========================== short test summary info ============================
FAILED pyarrow/tests/test_compute.py::test_strftime - assert False
 +  where False = <bound method Array.equals of <pyarrow.lib.StringArray object at 0x4000f0789160>\n[\n  "10",\n  "13",\n  null\n]>(<pyarrow.lib.StringArray object at 0x4000f07895e0>\n[\n  "09",\n  "12",\n  null\n])
 +    where <bound method Array.equals of <pyarrow.lib.StringArray object at 0x4000f0789160>\n[\n  "10",\n  "13",\n  null\n]> = <pyarrow.lib.StringArray object at 0x4000f0789160>\n[\n  "10",\n  "13",\n  null\n].equals
FAILED pyarrow/tests/test_compute.py::test_extract_datetime_components - assert False
 +  where False = <bound method Array.equals of <pyarrow.lib.Int64Array object at 0x40010f63f220>\n[\n  1969,\n  2000,\n  2033,\n  2019,\n  2019,\n  2019,\n  2009,\n  2010,\n  2010,\n  2010,\n  2006,\n  2005,\n  2008,\n  2008,\n  2011\n]>(<pyarrow.lib.Int64Array object at 0x40010f63fc40>\n[\n  1969,\n  2000,\n  2033,\n  2019,\n  2019,\n  2019,\n  2009,\n  2009,\n  2010,\n  2010,\n  2006,\n  2005,\n  2008,\n  2008,\n  2011\n])
 +    where <bound method Array.equals of <pyarrow.lib.Int64Array object at 0x40010f63f220>\n[\n  1969,\n  2000,\n  2033,\n  2019,\n  2019,\n  2019,\n  2009,\n  2010,\n  2010,\n  2010,\n  2006,\n  2005,\n  2008,\n  2008,\n  2011\n]> = <pyarrow.lib.Int64Array object at 0x40010f63f220>\n[\n  1969,\n  2000,\n  2033,\n  2019,\n  2019,\n  2019,\n  2009,\n  2010,\n  2010,\n  2010,\n  2006,\n  2005,\n  2008,\n  2008,\n  2011\n].equals
 +      where <pyarrow.lib.Int64Array object at 0x40010f63f220>\n[\n  1969,\n  2000,\n  2033,\n  2019,\n  2019,\n  2019,\n  2009,\n  2010,\n  2010,\n  2010,\n  2006,\n  2005,\n  2008,\n  2008,\n  2011\n] = <function year at 0x40001c88b040>(<pyarrow.lib.TimestampArray object at 0x40010f64cdc0>\n[\n  1970-01-01 00:00:59.123456789Z,\n  2000-02-29 23:23:23.999999...5:45.000000000Z,\n  2008-12-28 00:00:00.000000000Z,\n  2008-12-29 00:00:00.000000000Z,\n  2012-01-01 01:02:03.000000000Z\n])
 +        where <function year at 0x40001c88b040> = pc.year
 +    and   <pyarrow.lib.Int64Array object at 0x40010f63fc40>\n[\n  1969,\n  2000,\n  2033,\n  2019,\n  2019,\n  2019,\n  2009,\n  2009,\n  2010,\n  2010,\n  2006,\n  2005,\n  2008,\n  2008,\n  2011\n] = <cyfunction array at 0x40001a806ee0>(1969-12-31 18:00:59.123456789-06:00    1969\n2000-02-29 17:23:23.999999999-06:00    2000\n2033-05-17 22:33:20-05:00     ...              2008\n2008-12-28 18:00:00-06:00              2008\n2011-12-31 19:02:03-06:00              2011\ndtype: int64)
 +      where <cyfunction array at 0x40001a806ee0> = pa.array
FAILED pyarrow/tests/test_compute.py::test_assume_timezone - AssertionError: assert False
 +  where False = <bound method Array.equals of <pyarrow.lib.TimestampArray object at 0x400110ca4400>\n[\n  1970-01-01 06:00:59.123456789Z...:45.000000000Z,\n  2008-12-28 05:00:00.000000000Z,\n  2008-12-29 05:00:00.000000000Z,\n  2012-01-01 06:02:03.000000000Z\n]>(<pyarrow.lib.TimestampArray object at 0x400110ca4820>\n[\n  1970-01-01 06:00:59.123456789Z,\n  2000-03-01 05:23:23.999999...5:45.000000000Z,\n  2008-12-28 06:00:00.000000000Z,\n  2008-12-29 06:00:00.000000000Z,\n  2012-01-01 07:02:03.000000000Z\n])
 +    where <bound method Array.equals of <pyarrow.lib.TimestampArray object at 0x400110ca4400>\n[\n  1970-01-01 06:00:59.123456789Z...:45.000000000Z,\n  2008-12-28 05:00:00.000000000Z,\n  2008-12-29 05:00:00.000000000Z,\n  2012-01-01 06:02:03.000000000Z\n]> = <pyarrow.lib.TimestampArray object at 0x400110ca4400>\n[\n  1970-01-01 06:00:59.123456789Z,\n  2000-03-01 05:23:23.999999...5:45.000000000Z,\n  2008-12-28 05:00:00.000000000Z,\n  2008-12-29 05:00:00.000000000Z,\n  2012-01-01 06:02:03.000000000Z\n].equals
 +    and   <pyarrow.lib.TimestampArray object at 0x400110ca4820>\n[\n  1970-01-01 06:00:59.123456789Z,\n  2000-03-01 05:23:23.999999...5:45.000000000Z,\n  2008-12-28 06:00:00.000000000Z,\n  2008-12-29 06:00:00.000000000Z,\n  2012-01-01 07:02:03.000000000Z\n] = <cyfunction array at 0x40001a806ee0>(DatetimeIndex(['1970-01-01 00:00:59.123456789-06:00',\n               '2000-02-29 23:23:23.999999999-06:00',\n          ...0',\n                         '2012-01-01 01:02:03-06:00'],\n              dtype='datetime64[ns, US/Central]', freq=None))
 +      where <cyfunction array at 0x40001a806ee0> = pa.array
FAILED pyarrow/tests/test_compute.py::test_round_temporal[nanosecond] - AssertionError: 
Arrays are not equal

Mismatched elements: 2 / 13 (15.4%)
Max absolute difference: 0 days 00:00:00.000000002
 x: array([Timestamp('1923-07-07 09:52:35.203790342+0100', tz='Europe/Brussels'),
       Timestamp('1931-03-17 10:45:00.641559043+0000', tz='Europe/Brussels'),
       Timestamp('1932-06-16 02:16:42.911994368+0100', tz='Europe/Brussels'),...
 y: array([Timestamp('1923-07-07 09:52:35.203790342+0100', tz='Europe/Brussels'),
       Timestamp('1931-03-17 10:45:00.641559043+0000', tz='Europe/Brussels'),
       Timestamp('1932-06-16 02:16:42.911994368+0100', tz='Europe/Brussels'),...
FAILED pyarrow/tests/test_compute.py::test_round_temporal[microsecond] - AssertionError: 
Arrays are not equal

Mismatched elements: 2 / 13 (15.4%)
Max absolute difference: 0 days 00:00:00.000005
 x: array([Timestamp('1923-07-07 09:52:35.203791+0100', tz='Europe/Brussels'),
       Timestamp('1931-03-17 10:45:00.641565+0000', tz='Europe/Brussels'),
       Timestamp('1932-06-16 02:16:42.911997+0100', tz='Europe/Brussels'),...
 y: array([Timestamp('1923-07-07 09:52:35.203791+0100', tz='Europe/Brussels'),
       Timestamp('1931-03-17 10:45:00.641565+0000', tz='Europe/Brussels'),
       Timestamp('1932-06-16 02:16:42.911997+0100', tz='Europe/Brussels'),...
FAILED pyarrow/tests/test_compute.py::test_round_temporal[millisecond] - AssertionError: 
Arrays are not equal

Mismatched elements: 2 / 13 (15.4%)
Max absolute difference: 0 days 00:00:00.002000
 x: array([Timestamp('1923-07-07 09:52:35.210000+0100', tz='Europe/Brussels'),
       Timestamp('1931-03-17 10:45:00.643000+0000', tz='Europe/Brussels'),
       Timestamp('1932-06-16 02:16:42.915000+0100', tz='Europe/Brussels'),...
 y: array([Timestamp('1923-07-07 09:52:35.210000+0100', tz='Europe/Brussels'),
       Timestamp('1931-03-17 10:45:00.643000+0000', tz='Europe/Brussels'),
       Timestamp('1932-06-16 02:16:42.915000+0100', tz='Europe/Brussels'),...
FAILED pyarrow/tests/test_compute.py::test_round_temporal[second] - AssertionError: 
Arrays are not equal

Mismatched elements: 2 / 13 (15.4%)
Max absolute difference: 0 days 00:00:05
 x: array([Timestamp('1923-07-07 09:52:42+0100', tz='Europe/Brussels'),
       Timestamp('1931-03-17 10:45:01+0000', tz='Europe/Brussels'),
       Timestamp('1932-06-16 02:16:44+0100', tz='Europe/Brussels'),...
 y: array([Timestamp('1923-07-07 09:52:42+0100', tz='Europe/Brussels'),
       Timestamp('1931-03-17 10:45:01+0000', tz='Europe/Brussels'),
       Timestamp('1932-06-16 02:16:44+0100', tz='Europe/Brussels'),...
FAILED pyarrow/tests/test_compute.py::test_round_temporal[minute] - AssertionError: 
Arrays are not equal

Mismatched elements: 2 / 13 (15.4%)
Max absolute difference: 0 days 00:03:00
 x: array([Timestamp('1923-07-07 09:59:00+0100', tz='Europe/Brussels'),
       Timestamp('1931-03-17 10:47:00+0000', tz='Europe/Brussels'),
       Timestamp('1932-06-16 02:20:00+0100', tz='Europe/Brussels'),...
 y: array([Timestamp('1923-07-07 09:59:00+0100', tz='Europe/Brussels'),
       Timestamp('1931-03-17 10:47:00+0000', tz='Europe/Brussels'),
       Timestamp('1932-06-16 02:20:00+0100', tz='Europe/Brussels'),...
FAILED pyarrow/tests/test_compute.py::test_round_temporal[hour] - AssertionError: 
Arrays are not equal

Mismatched elements: 1 / 13 (7.69%)
Max absolute difference: 0 days 01:00:00
 x: array([Timestamp('1923-07-07 20:00:00-0400', tz='America/New_York'),
       Timestamp('1931-03-20 08:00:00-0500', tz='America/New_York'),
       Timestamp('1932-06-20 16:00:00-0400', tz='America/New_York'),...
 y: array([Timestamp('1923-07-07 20:00:00-0400', tz='America/New_York'),
       Timestamp('1931-03-20 08:00:00-0500', tz='America/New_York'),
       Timestamp('1932-06-20 16:00:00-0400', tz='America/New_York'),...
FAILED pyarrow/tests/test_compute.py::test_round_temporal[day] - AssertionError: 
Arrays are not equal

Mismatched elements: 1 / 13 (7.69%)
Max absolute difference: 0 days 01:00:00
 x: array([Timestamp('1923-07-17 00:00:00-0400', tz='America/New_York'),
       Timestamp('1931-03-27 00:00:00-0500', tz='America/New_York'),
       Timestamp('1932-06-19 00:00:00-0400', tz='America/New_York'),...
 y: array([Timestamp('1923-07-17 00:00:00-0400', tz='America/New_York'),
       Timestamp('1931-03-27 00:00:00-0500', tz='America/New_York'),
       Timestamp('1932-06-19 00:00:00-0400', tz='America/New_York'),...
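
One way to read the `test_assume_timezone` failure above is to compute the UTC offset that each side implies. The sketch below uses only values printed in the log for the last element (local time `2012-01-01 01:02:03`, expected `07:02:03Z` from the pandas `DatetimeIndex`, observed `06:02:03Z` from pyarrow). `implied_offset` is a hypothetical helper introduced for illustration; it is not part of pyarrow or pandas.

```python
from datetime import datetime, timedelta, timezone

# Naive local wall time of the last element, as shown in the DatetimeIndex in the log.
local = datetime(2012, 1, 1, 1, 2, 3)

# UTC instants printed in the log for that element.
expected_utc = datetime(2012, 1, 1, 7, 2, 3, tzinfo=timezone.utc)  # pa.array(DatetimeIndex)
observed_utc = datetime(2012, 1, 1, 6, 2, 3, tzinfo=timezone.utc)  # pyarrow result


def implied_offset(local_naive: datetime, utc_value: datetime) -> timedelta:
    """Hypothetical helper: UTC offset that maps the naive local time to utc_value."""
    return local_naive.replace(tzinfo=timezone.utc) - utc_value


for label, value in [("expected", expected_utc), ("observed", observed_utc)]:
    hours = implied_offset(local, value).total_seconds() / 3600
    print(f"{label}: UTC offset of {hours:+.1f} h")
```

Under this reading, the expected value corresponds to -06:00 (the offset printed in the `DatetimeIndex`), while the observed value corresponds to -05:00, a difference of exactly one hour. This does not by itself say where the differing offset comes from.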

Component(s)

Packaging, Python

jorisvandenbossche commented 4 months ago

Potentially related to https://github.com/apache/arrow/issues/42157? See some context about it deep in the pyodide review: https://github.com/apache/arrow/pull/37822#discussion_r1631278707

There, too, we had some tests failing at some point with timezone issues, in the test about rounding timestamps.