apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Dill fails to serialize context in Python 3.11 correctly #35307

Open potiuk opened 10 months ago

potiuk commented 10 months ago


There is a regression that needs investigation. One of our tests fails only on Python 3.11: tests/operators/test_python.py::TestPythonVirtualenvOperator::test_airflow_context

This test is part of the PlainAssert suite and had been skipped so far, but after https://github.com/apache/airflow/pull/35160 the PlainAssert suite (which contains only that test) was brought back into the regular tests, and it turned out that it fails on Python 3.11.

The error is about dill serializing the context:

INFO     airflow.utils.process_utils:process_utils.py:186 Output:
INFO     airflow.utils.process_utils:process_utils.py:190 Traceback (most recent call last):
INFO     airflow.utils.process_utils:process_utils.py:190   File "/tmp/venv-call7xpd4uip/script.py", line 17, in <module>
INFO     airflow.utils.process_utils:process_utils.py:190     arg_dict = dill.load(file)
INFO     airflow.utils.process_utils:process_utils.py:190                ^^^^^^^^^^^^^^^
INFO     airflow.utils.process_utils:process_utils.py:190   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 270, in load
INFO     airflow.utils.process_utils:process_utils.py:190     return Unpickler(file, ignore=ignore, **kwds).load()
INFO     airflow.utils.process_utils:process_utils.py:190            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INFO     airflow.utils.process_utils:process_utils.py:190   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 472, in load
INFO     airflow.utils.process_utils:process_utils.py:190     obj = StockUnpickler.load(self)
INFO     airflow.utils.process_utils:process_utils.py:190           ^^^^^^^^^^^^^^^^^^^^^^^^^
INFO     airflow.utils.process_utils:process_utils.py:190 TypeError: code() argument 13 must be str, not int
ERROR    airflow.models.taskinstance.TaskInstance:taskinstance.py:2612 Task failed with exception

It is likely connected with similar issues reported by others.
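
For context on the TypeError above: Python 3.11 added a qualname string to the positional arguments of the code object constructor, in the slot where Python 3.8 to 3.10 expect the integer firstlineno. A pickle that rebuilds a code object using the older argument order would therefore pass an int where a str is now required, which matches the message in the traceback. The sketch below does not use dill and only inspects a code object; the sample function and the legacy_layout list are illustrative names, not anything taken from Airflow or dill.

```python
import sys


def sample():
    return 42


code = sample.__code__

# Leading constructor arguments in the order Python 3.8-3.10 expect them.
# (Illustrative list only; it is never passed to types.CodeType here.)
legacy_layout = [
    ("argcount", code.co_argcount),
    ("posonlyargcount", code.co_posonlyargcount),
    ("kwonlyargcount", code.co_kwonlyargcount),
    ("nlocals", code.co_nlocals),
    ("stacksize", code.co_stacksize),
    ("flags", code.co_flags),
    ("codestring", code.co_code),
    ("constants", code.co_consts),
    ("names", code.co_names),
    ("varnames", code.co_varnames),
    ("filename", code.co_filename),
    ("name", code.co_name),
    ("firstlineno", code.co_firstlineno),
]

# The 13th positional argument in the legacy order is an integer line number.
name_13, value_13 = legacy_layout[12]
print(f"legacy argument 13: {name_13} ({type(value_13).__name__})")

# Python 3.11+ code objects carry co_qualname, which occupies that slot instead.
has_qualname = hasattr(code, "co_qualname")
print(f"running Python {sys.version_info.major}.{sys.version_info.minor}, "
      f"co_qualname present: {has_qualname}")

# code.replace() (Python 3.8+) copies a code object by keyword, so it does
# not depend on the positional order of the constructor.
renamed = code.replace(co_name="renamed_sample")
print(renamed.co_name)
```

Copying by keyword with code.replace() is shown only to contrast with positional reconstruction; whether dill 0.3.1.1 fails for exactly this reason in the test is an assumption, not something confirmed in this thread.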

How to reproduce it:

1) Run breeze --python 3.11

2) In the test, remove the or PY311 from the skipif condition in

    @pytest.mark.skipif(
        os.environ.get("PYTEST_PLAIN_ASSERTS") != "true" or PY311,
        reason="assertion rewriting breaks this test because dill will try to serialize "
        ...
    )
    def test_airflow_context(self):

3) Run this command:

PYTEST_PLAIN_ASSERTS="true" pytest tests/operators/test_python.py::TestPythonVirtualenvOperator::test_airflow_context --assert=plain

(Note: for this test, assert rewriting in pytest must be disabled; that is why we have the env variable and --assert=plain.)

Example failure: https://github.com/apache/airflow/actions/runs/6710410150/job/18236357337 (and you can see that Python 3.8 - 3.10 are all green, only Python 3.11 is affected).


potiuk commented 10 months ago

cc: @jens-scheffler-bosch, @ashb @uranusjr @Taragolis @bolkedebruin -> this one needs someone who knows their way around dill and serialization :). It's Python 3.11 only, and we already had a weird issue where pytest assert rewriting was breaking this test on other Python versions, so we run it as a separate test type with --assert=plain and PYTEST_PLAIN_ASSERTS="true"...

Rather arcane thing, but I believe context serialization is currently broken for dill and Python 3.11.

potiuk commented 10 months ago

Detected it while improving our test suite. I found out that we had not run the PlainAssert tests for a while and, of course, it turned out that Python 3.11 has a problem with them.

Taragolis commented 10 months ago

Assumption 1

Maybe it happens because we use dill==0.3.1.1, which was released on Sep 28, 2019, while Python 3.11 was released on Oct 24, 2022?

root@567a6cdeef08:/opt/airflow# pipdeptree --packages dill -r
Warning!! Cyclic dependencies found:
* sphinxcontrib-applehelp => sphinx => sphinxcontrib-applehelp
* sphinxcontrib-devhelp => sphinx => sphinxcontrib-devhelp
* sphinxcontrib-htmlhelp => sphinx => sphinxcontrib-htmlhelp
* sphinxcontrib-qthelp => sphinx => sphinxcontrib-qthelp
* sphinxcontrib-serializinghtml => sphinx => sphinxcontrib-serializinghtml
* sphinx => sphinxcontrib-applehelp => sphinx
------------------------------------------------------------------------
dill==0.3.1.1
├── apache-airflow==2.8.0.dev0 [requires: dill>=0.2.2]
└── apache-beam==2.51.0 [requires: dill>=0.3.1.1,<0.3.2]
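
A quick way to confirm which versions are actually in play inside a given environment, without pipdeptree, is to ask importlib.metadata. This is only a minimal sketch: the helper names installed_version and describe_environment are made up for illustration, and the package list is an assumption based on the packages discussed in this thread.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def describe_environment(packages=("dill", "pendulum")):
    """Collect the interpreter version and the versions of selected packages."""
    info = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for package in packages:
        info[package] = installed_version(package)
    return info


if __name__ == "__main__":
    for name, version in describe_environment().items():
        print(f"{name}: {version if version is not None else 'not installed'}")
```

Running it both in the Breeze container and inside the virtualenv created by the operator would show whether the two sides see the same dill.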

OK, let's break the dependencies, install dill==0.3.7, and have a look at what happens:

warnings summary:
============================
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> connexion/decorators/validation.py:16 DeprecationWarning('Accessing jsonschema.draft4_format_checker is deprecated and will be removed in a future release. Instead, use the FORMAT_CHECKER attribute on the corresponding Validator.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> connexion/json_schema.py:16 DeprecationWarning('jsonschema.RefResolver is deprecated as of v4.18.0, in favor of the https://github.com/python-jsonschema/referencing library, which provides more compliant referencing behavior as well as more flexible APIs for customization. A future release will remove RefResolver. Please file a feature request (on referencing) if you are missing an API for the kind of customization you need.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> connexion/json_schema.py:17 DeprecationWarning('jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> dill/_dill.py:412 PicklingWarning('Cannot locate reference to <class 'pendulum.duration.Duration'>.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> dill/_dill.py:412 PicklingWarning('Cannot pickle <class 'pendulum.duration.Duration'>: pendulum.duration.Duration has recursive self-references that trigger a RecursionError.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> dill/_dill.py:412 PicklingWarning('Cannot locate reference to <class 'pendulum.datetime.DateTime'>.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> dill/_dill.py:412 PicklingWarning('Cannot pickle <class 'pendulum.datetime.DateTime'>: pendulum.datetime.DateTime has recursive self-references that trigger a RecursionError.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> dill/_dill.py:412 PicklingWarning('Cannot locate reference to <class 'pendulum.time.Time'>.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> dill/_dill.py:412 PicklingWarning('Cannot pickle <class 'pendulum.time.Time'>: pendulum.time.Time has recursive self-references that trigger a RecursionError.')

All Warning errors can be found in the warnings.txt file.
============================================================================== short test summary info ==============================================================================
FAILED tests/operators/test_python.py::TestPythonVirtualenvOperator::test_airflow_context - subprocess.CalledProcessError: Command '['/tmp/venvngz2drh_/bin/python', '/tmp/venvngz2drh_/script.py', '/tmp/venvngz2drh_/script.in', '/tmp/venvngz2drh_/script.out', '/tmp/venvngz2drh_/string_args.txt', '/tmp/venvngz2drh_/termination.log']' returned non-zero exit status 1.
================================================================================= 1 failed in 6.54s =================================================================================
root@6c8587e657c9:/opt/airflow# 

Assumption 2

Hmm, in the new version I can clearly see the warnings about pendulum.

Let's try pendulum 3.0.0b1. To be able to run Airflow with this version I switched to https://github.com/apache/airflow/pull/34744

And with the combination of pendulum==3.0.0b1 and dill==0.3.7 the test passes:

warnings summary:
============================
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> connexion/decorators/validation.py:16 DeprecationWarning('Accessing jsonschema.draft4_format_checker is deprecated and will be removed in a future release. Instead, use the FORMAT_CHECKER attribute on the corresponding Validator.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> connexion/json_schema.py:16 DeprecationWarning('jsonschema.RefResolver is deprecated as of v4.18.0, in favor of the https://github.com/python-jsonschema/referencing library, which provides more compliant referencing behavior as well as more flexible APIs for customization. A future release will remove RefResolver. Please file a feature request (on referencing) if you are missing an API for the kind of customization you need.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> connexion/json_schema.py:17 DeprecationWarning('jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> dill/_dill.py:412 PicklingWarning('Cannot locate reference to <class 'pendulum.duration.Duration'>.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> dill/_dill.py:412 PicklingWarning('Cannot pickle <class 'pendulum.duration.Duration'>: pendulum.duration.Duration has recursive self-references that trigger a RecursionError.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> dill/_dill.py:412 PicklingWarning('Cannot locate reference to <class 'pendulum.datetime.DateTime'>.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> dill/_dill.py:412 PicklingWarning('Cannot pickle <class 'pendulum.datetime.DateTime'>: pendulum.datetime.DateTime has recursive self-references that trigger a RecursionError.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> dill/_dill.py:412 PicklingWarning('Cannot locate reference to <class 'pendulum.time.Time'>.')
tests/operators/test_python.py::TestPythonVirtualenvOperator.test_airflow_context:1004
-> dill/_dill.py:412 PicklingWarning('Cannot pickle <class 'pendulum.time.Time'>: pendulum.time.Time has recursive self-references that trigger a RecursionError.')

All Warning errors can be found in the warnings.txt file.

================================================================================= 1 passed in 6.39s =================================================================================

If either of these packages is reverted to its previous version, the test fails again.

potiuk commented 10 months ago

Yes. It's very likely about the old dill version and pendulum. Unfortunately, Beam pins dill rather tightly:

https://github.com/apache/beam/blob/16fee7505e35e402cc28a55384c81fef64ead254/sdks/python/setup.py#L265

      # Dill doesn't have forwards-compatibility guarantees within minor
      # version. Pickles created with a new version of dill may not unpickle
      # using older version of dill. It is best to use the same version of
      # dill on client and server, therefore list of allowed versions is
      # very narrow. See: https://github.com/uqfoundation/dill/issues/341.
      'dill>=0.3.1.1,<0.3.2',
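
The concern in that Beam comment (the writer and the reader of a pickle should use the same serializer version) can be enforced explicitly by tagging each payload with the version that wrote it. The sketch below is only an illustration of that idea: it uses the standard library pickle as a stand-in for dill so that it runs anywhere, and dump_tagged, load_tagged and SerializerVersionMismatch are hypothetical names, not part of Airflow, Beam or dill.

```python
import pickle


class SerializerVersionMismatch(RuntimeError):
    """Raised when a payload was written by a different serializer version."""


def dump_tagged(obj, serializer_version):
    """Serialize obj together with the version of the serializer that wrote it."""
    envelope = {
        "serializer_version": serializer_version,
        "payload": pickle.dumps(obj),  # stand-in for dill.dumps(obj)
    }
    return pickle.dumps(envelope)


def load_tagged(blob, serializer_version):
    """Deserialize a tagged payload, refusing it if the versions differ."""
    envelope = pickle.loads(blob)
    written_with = envelope["serializer_version"]
    if written_with != serializer_version:
        raise SerializerVersionMismatch(
            f"payload written with {written_with}, reader has {serializer_version}"
        )
    return pickle.loads(envelope["payload"])  # stand-in for dill.loads(...)


if __name__ == "__main__":
    blob = dump_tagged({"ds": "2023-10-31"}, serializer_version="0.3.1.1")
    print(load_tagged(blob, serializer_version="0.3.1.1"))
```

Failing early with a clear version message is easier to diagnose than a TypeError raised deep inside the unpickler, though it does not by itself make mismatched versions compatible.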

And yes, 0.3.7 is the first one that has 3.11 support.

Bad news: 0.3.7 does not solve it either :(

Taragolis commented 10 months ago

Maybe it is a good time to think about the old issue https://github.com/apache/airflow/issues/7870

potiuk commented 10 months ago

Yeah. Why not :)

NotYuki commented 10 months ago

If you don't need the Airflow context, you can try passing system_site_packages=False to the operator as a workaround:

PythonVirtualenvOperator(
  ...
  system_site_packages=False,
  use_dill=True,
  ...
)