Open potiuk opened 10 months ago
cc: @jens-scheffler-bosch, @ashb @uranusjr @Taragolis @bolkedebruin -> this one needs someone who knows their way around dill and serialization :). It's Python 3.11 only, and we already had a weird issue where Pytest assert rewriting was breaking these tests on other Python versions, so we run them as a separate test type with --assert=plain and PYTEST_PLAIN_ASSERTS="true"...
A rather arcane thing, but I believe context serialization is currently broken for dill and Python 3.11.
Detected it while improving our test suite: I found out that we had not run the PlainAssert tests for a while and, of course, it turned out that Python 3.11 has a problem with them.
Maybe it happens because we use dill==0.3.1.1, which was released on Sep 28, 2019, while Python 3.11 was released on Oct 24, 2022?
root@567a6cdeef08:/opt/airflow# pipdeptree --packages dill -r
Warning!! Cyclic dependencies found:
* sphinxcontrib-applehelp => sphinx => sphinxcontrib-applehelp
* sphinxcontrib-devhelp => sphinx => sphinxcontrib-devhelp
* sphinxcontrib-htmlhelp => sphinx => sphinxcontrib-htmlhelp
* sphinxcontrib-qthelp => sphinx => sphinxcontrib-qthelp
* sphinxcontrib-serializinghtml => sphinx => sphinxcontrib-serializinghtml
* sphinx => sphinxcontrib-applehelp => sphinx
------------------------------------------------------------------------
dill==0.3.1.1
├── apache-airflow==2.8.0.dev0 [requires: dill>=0.2.2]
└── apache-beam==2.51.0 [requires: dill>=0.3.1.1,<0.3.2]
OK, let's break the dependency constraints, install dill==0.3.7, and have a look at what happens:
warnings summary:
============================
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> connexion/decorators/validation.py:16 DeprecationWarning('Accessing jsonschema.draft4_format_checker is deprecated and will be removed in a future release. Instead, use the FORMAT_CHECKER attribute on the corresponding Validator.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> connexion/json_schema.py:16 DeprecationWarning('jsonschema.RefResolver is deprecated as of v4.18.0, in favor of the https://github.com/python-jsonschema/referencing library, which provides more compliant referencing behavior as well as more flexible APIs for customization. A future release will remove RefResolver. Please file a feature request (on referencing) if you are missing an API for the kind of customization you need.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> connexion/json_schema.py:17 DeprecationWarning('jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> dill/_dill.py:412 PicklingWarning('Cannot locate reference to <class 'pendulum.duration.Duration'>.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> dill/_dill.py:412 PicklingWarning('Cannot pickle <class 'pendulum.duration.Duration'>: pendulum.duration.Duration has recursive self-references that trigger a RecursionError.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> dill/_dill.py:412 PicklingWarning('Cannot locate reference to <class 'pendulum.datetime.DateTime'>.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> dill/_dill.py:412 PicklingWarning('Cannot pickle <class 'pendulum.datetime.DateTime'>: pendulum.datetime.DateTime has recursive self-references that trigger a RecursionError.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> dill/_dill.py:412 PicklingWarning('Cannot locate reference to <class 'pendulum.time.Time'>.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> dill/_dill.py:412 PicklingWarning('Cannot pickle <class 'pendulum.time.Time'>: pendulum.time.Time has recursive self-references that trigger a RecursionError.')
All Warning errors can be found in the warnings.txt file.
============================================================================== short test summary info ==============================================================================
FAILED tests/operators/test_python.py::TestPythonVirtualenvOperator::test_airflow_context - subprocess.CalledProcessError: Command '['/tmp/venvngz2drh_/bin/python', '/tmp/venvngz2drh_/script.py', '/tmp/venvngz2drh_/script.in', '/tmp/venvngz2drh_/script.out', '/tmp/venvngz2drh_/string_args.txt', '/tmp/venvngz2drh_/termination.log']' returned non-zero exit status 1.
================================================================================= 1 failed in 6.54s =================================================================================
root@6c8587e657c9:/opt/airflow#
Hmm, in the new version I could clearly see the warnings about pendulum.
Let's try pendulum 3.0.0b1. To allow Airflow to run with this version, I switched to https://github.com/apache/airflow/pull/34744
With the combination of pendulum==3.0.0b1 and dill==0.3.7, the test passes:
warnings summary:
============================
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> connexion/decorators/validation.py:16 DeprecationWarning('Accessing jsonschema.draft4_format_checker is deprecated and will be removed in a future release. Instead, use the FORMAT_CHECKER attribute on the corresponding Validator.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> connexion/json_schema.py:16 DeprecationWarning('jsonschema.RefResolver is deprecated as of v4.18.0, in favor of the https://github.com/python-jsonschema/referencing library, which provides more compliant referencing behavior as well as more flexible APIs for customization. A future release will remove RefResolver. Please file a feature request (on referencing) if you are missing an API for the kind of customization you need.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> connexion/json_schema.py:17 DeprecationWarning('jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> dill/_dill.py:412 PicklingWarning('Cannot locate reference to <class 'pendulum.duration.Duration'>.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> dill/_dill.py:412 PicklingWarning('Cannot pickle <class 'pendulum.duration.Duration'>: pendulum.duration.Duration has recursive self-references that trigger a RecursionError.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> dill/_dill.py:412 PicklingWarning('Cannot locate reference to <class 'pendulum.datetime.DateTime'>.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> dill/_dill.py:412 PicklingWarning('Cannot pickle <class 'pendulum.datetime.DateTime'>: pendulum.datetime.DateTime has recursive self-references that trigger a RecursionError.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> dill/_dill.py:412 PicklingWarning('Cannot locate reference to <class 'pendulum.time.Time'>.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> dill/_dill.py:412 PicklingWarning('Cannot pickle <class 'pendulum.time.Time'>: pendulum.time.Time has recursive self-references that trigger a RecursionError.')
All Warning errors can be found in the warnings.txt file.
================================================================================= 1 passed in 6.39s =================================================================================
If either of these packages is reverted to its previous version, the test fails again.
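Because the outcome depends on the exact dill and pendulum combination, it can help to record which versions are actually installed before re-running the test. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of Airflow.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


# Report the two packages discussed in this thread.
print(installed_versions(["dill", "pendulum"]))
```

Running this inside the environment that executes the test shows whether it sees the same versions as the ones being compared here.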
Yes. It's very likely about the old dill version and pendulum. Unfortunately, Beam pins dill rather tightly:
# Dill doesn't have forwards-compatibility guarantees within minor
# version. Pickles created with a new version of dill may not unpickle
# using older version of dill. It is best to use the same version of
# dill on client and server, therefore list of allowed versions is
# very narrow. See: https://github.com/uqfoundation/dill/issues/341.
'dill>=0.3.1.1,<0.3.2',
And yes 0.3.7 is the first one that has 3.11 support.
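The conflict can be made concrete with a small sketch. It is only an illustration: the bounds are copied from the Beam requirement quoted above, the 0.3.7 threshold is the claim made in this thread rather than something verified here, and the helper names (`parse_version`, `allowed_by_beam`, `supports_py311`) are hypothetical. Versions are compared as integer tuples, which is a simplification of full PEP 440 ordering and only works for purely numeric dotted versions.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a purely numeric dotted version such as '0.3.1.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Bounds copied from the Beam requirement: 'dill>=0.3.1.1,<0.3.2'.
BEAM_LOWER = parse_version("0.3.1.1")
BEAM_UPPER = parse_version("0.3.2")
# First dill release with Python 3.11 support, according to this thread (assumption).
FIRST_PY311_DILL = parse_version("0.3.7")


def allowed_by_beam(version: str) -> bool:
    """True if the version falls inside Beam's pinned range."""
    return BEAM_LOWER <= parse_version(version) < BEAM_UPPER


def supports_py311(version: str) -> bool:
    """True if the version is at or above the assumed first 3.11-capable release."""
    return parse_version(version) >= FIRST_PY311_DILL


for candidate in ("0.3.1.1", "0.3.7"):
    print(candidate, allowed_by_beam(candidate), supports_py311(candidate))
```

Under these assumptions no version can satisfy both checks, because anything at or above 0.3.7 is already past Beam's upper bound of 0.3.2.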
Bad news: 0.3.7 does not solve it either :(
Maybe it is a good time to think about the old issue https://github.com/apache/airflow/issues/7870
Yeah. Why not :)
If you don't need the Airflow context, you can try passing system_site_packages=False to the operator as a workaround:
PythonVirtualenvOperator(
    ...
    system_site_packages=False,
    use_dill=True,
    ...
)
Body
There is a regression that needs investigation. One of our tests is failing only on Python 3.11
tests/operators/test_python.py::TestPythonVirtualenvOperator::test_airflow_context
This test is part of the PlainAssert suite and had been skipped so far, but after https://github.com/apache/airflow/pull/35160 the PlainAssert tests (consisting of only that test) were brought back into the regular tests, and it turned out that the test fails on Python 3.11.
The error is about dill serializing the context:
It is likely connected with similar issues reported by others.
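One way to narrow down which context entries fail to serialize is to try each value separately. The sketch below is an assumption-laden illustration, not how Airflow does it: it uses the standard library `pickle` as a stand-in for dill, the helper name `find_unpicklable` is hypothetical, and `sample_context` is made-up data rather than a real task context.

```python
import pickle
from typing import Any, Callable, Dict, Mapping


def find_unpicklable(
    context: Mapping[str, Any],
    dumps: Callable[[Any], bytes] = pickle.dumps,
) -> Dict[str, str]:
    """Return {key: error summary} for every value that `dumps` cannot serialize."""
    failures: Dict[str, str] = {}
    for key, value in context.items():
        try:
            dumps(value)
        except Exception as exc:
            # Serializers raise several unrelated exception types
            # (PicklingError, TypeError, RecursionError, ...), so catch broadly.
            failures[key] = f"{type(exc).__name__}: {exc}"
    return failures


# Hypothetical stand-in for a task context.
sample_context = {
    "run_id": "manual__2023-01-01",
    "params": {"retries": 3},
    "callback": lambda: None,  # the standard pickle module cannot serialize lambdas
}

print(find_unpicklable(sample_context))
```

If dill is installed, passing `dumps=dill.dumps` runs the same probe with dill itself. The results will differ from the `pickle` run, since dill can serialize lambdas, for example.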
How to reproduce it:
1) Run
breeze --python 3.11
2) In the test, remove the `or PY311` condition
3) Run this command:
(Note: for this test, assert rewriting in Pytest must be disabled; that's why we have the ENV variable and `--assert=plain`.)
Example failure: https://github.com/apache/airflow/actions/runs/6710410150/job/18236357337 (and you can see that Python 3.8 - 3.10 are all green, only Python 3.11 is affected).