apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[Python][CI] Failing test_dateutil_tzinfo_to_string due to new release of python-dateutil #40337

Open AlenkaF opened 4 months ago

AlenkaF commented 4 months ago

Describe the bug, including details regarding any error messages, version, and platform.

test_dateutil_tzinfo_to_string started failing with:

         tz = dateutil.tz.gettz('Europe/Paris')
>       assert pa.lib.tzinfo_to_string(tz) == 'Europe/Paris'
E       AssertionError: assert 'Europe/Monaco' == 'Europe/Paris'
E         - Europe/Paris
E         + Europe/Monaco

see: https://github.com/ursacomputing/crossbow/actions/runs/8103088008/job/22164107523#step:6:4305.

This is most probably due to the new release of the python-dateutil package.

The test is now also failing on CI, with dateutil 2.9.0: https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/49322144?fullLog=true

Component(s)

Continuous Integration, Python

AlenkaF commented 4 months ago

cc @raulcd

jorisvandenbossche commented 4 months ago

I can't reproduce this on Linux, so this is most probably a Windows issue (or related to the exact tzdata being used there, since locally I'm using the system-provided data).

Might be good to report upstream?

jorisvandenbossche commented 4 months ago

The logic we use to convert a dateutil tz to a string:

https://github.com/apache/arrow/blob/2b194ad222f4dc8ecf2eb73539ab8cab5b1fc5e7/python/pyarrow/src/arrow/python/datetime.cc#L554-L568

(so it might also be that something changed under the hood in dateutil that messes that up. It would be good to verify whether the actual tz object in the snippet above, before conversion to a string by pyarrow, also shows "Europe/Monaco")
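A minimal sketch of that check, for anyone with access to an affected Windows environment. It inspects the tz object without involving pyarrow at all. The helper name `describe_tz` is hypothetical, and reading the private `_filename` attribute is an assumption about what the linked conversion code relies on, so treat the output as a hint rather than proof.

```python
import datetime


def describe_tz(tz):
    """Collect what a tzinfo object reports about itself, without pyarrow."""
    return {
        "repr": repr(tz),
        # Private dateutil attribute; assumed (not confirmed here) to be
        # what the pyarrow conversion reads. None if the object lacks it.
        "filename": getattr(tz, "_filename", None),
        "tzname": tz.tzname(datetime.datetime(2024, 3, 1)),
    }


try:
    import dateutil
    import dateutil.tz
except ImportError:
    dateutil = None

if dateutil is not None:
    print("dateutil", getattr(dateutil, "__version__", "unknown"))
    paris = dateutil.tz.gettz("Europe/Paris")
    # gettz returns None when the zone cannot be found.
    if paris is not None:
        print(describe_tz(paris))
```

If the printed `repr` or `filename` already mentions Europe/Monaco, the change is on the dateutil side rather than in the pyarrow conversion.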

AlenkaF commented 4 months ago

Agree with you, Joris. I would first like to check the behaviour of dateutil, as you mention, but I can't do that on my M1 either. In any case, I could open an issue upstream first, if we think that makes sense.

raulcd commented 4 months ago

The same issue is being reproduced on my PR on the conda feedstock on Windows: https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=888527&view=logs&j=ab98fee7-5bc6-5c2f-a410-3ab9b2f2e8ca&t=472408ae-fc68-5791-981c-69ea41d2d692&l=38399

    def test_dateutil_tzinfo_to_string():
        pytest.importorskip("dateutil")
        import dateutil.tz

        tz = dateutil.tz.UTC
        assert pa.lib.tzinfo_to_string(tz) == 'UTC'
        tz = dateutil.tz.gettz('Europe/Paris')
>       assert pa.lib.tzinfo_to_string(tz) == 'Europe/Paris'
E       AssertionError: assert 'Europe/Monaco' == 'Europe/Paris'
E         - Europe/Paris
E         + Europe/Monaco

Edit: added on Windows

jorisvandenbossche commented 3 months ago

A temporary skip was added in https://github.com/apache/arrow/pull/40486