Closed ap-- closed 9 months ago
Are there any ways we can help with getting that one in @ap-- ? We are using universal_pathlib in Airflow and this seems to be currently the last blocker for us to add Python 3.12 - unfortunately it's used in "core" feature of Airflow and it will be difficult to exclude just that feature (we are using universal_pathlib for fsspec integration) so we are quite blocked from 3.12 support.
But maybe we can help somehow in getting it out faster?
Thank you so much for offering to help!
I am working on getting out a pre-release with 3.12 support soon. It would be great if you could help testing that release before we roll out a new version.
Absolutely. I am subscribed to the issu here, so just comment it here and I will test it. we have a number of test failing now and Airflow has generally quite comprehensive set of tests, so happy to pass it through our machinery.
Seeing the green PR - I took it for a spin in Airflow and installed it using github URL from this branch and I can confirm all failures in the pathlib related tests are gone!
We still have a few other tests failing (but they are mostly internal/test related but the pathlib ones are working fine https://github.com/apache/airflow/actions/runs/7680234838/job/20932304380?pr=36755
@potiuk thanks for testing this branch. Tbh I am surprised that the tests related to UPath
pass.
I looked at the custom ObjectStoragePath
subclass in airflow (https://github.com/apache/airflow/blob/6f41010269e4a2c052b564f8f0f977abbaf5dfc7/airflow/io/path.py#L71), and the way __new__
is implemented, instantiating ObjectStoragePath
should crash on python 3.12 because the _from_parts
classmethod does not exist anymore.
Is it possible that the test suite is not running on python 3.12 ?
Yeah. It was my mistake, I simply missed it in the report because we have some other issue that triggers "max recursion" error (connected to utcnow() raising a deprecation warning in Python 3.12) so it clouded the output enough for me to miss the pathlib ones. Luckily the very new feature (a week or so) added in GitHub Actions that allows to quickly search even in long output of CI jobs is a life-saver and i started using it :).
Do you have some suggestions on how we can address it @ap-- - since you looked at the code ?
It's been @bolkedebruin and @uranusjr who implemented and reviewed it originally and I would have to look deeper on how it is implemented - but since the heavy changes to Pathlib implementation in Python 3.12 (making it ACTUALLY extendible) probably our approach should also be to have a different approach (also knowing that Python 3.13 is also bringing changes there).
Any words of advise here?
For a bit of context: @bolkedebruin and @uranusjr - I am working on Python 3.12 compatibility in https://github.com/apache/airflow/pull/36755 and after current PR in universal_pathlib got green - brining universal_pathlib (hopefully) to Python 3.12 compatibility, I added installation of universal_pathlib from this git branch and it would be great if we figure out how to convert our IO implementation to also be compatbile.
Currently the tests of ours fail with (and the reason is that _from_parts
is removed from both Pathlib (I think) and UniversalPathlib (I am sure):
https://github.com/apache/airflow/actions/runs/7688925657/job/20950993493?pr=36755#step:6:9018
tests/always/test_example_dags.py:87: AssertionError
----------------------------- Captured stderr call -----------------------------
INFO [airflow.models.dagbag.DagBag] Filling up the DagBag from /opt/airflow/tests/system/providers/common/io/example_file_transfer_local_to_s3.py
ERROR [airflow.models.dagbag.DagBag] Failed to import: /opt/airflow/tests/system/providers/common/io/example_file_transfer_local_to_s3.py
Traceback (most recent call last):
File "/opt/airflow/airflow/models/dagbag.py", line 344, in parse
loader.exec_module(new_module)
File "<frozen importlib._bootstrap_external>", line 994, in exec_module
File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
File "/opt/airflow/tests/system/providers/common/io/example_file_transfer_local_to_s3.py", line 35, in <module>
TEMP_FILE_PATH = ObjectStoragePath("file:///tmp")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/airflow/airflow/io/path.py", line 153, in __new__
return cls._from_parts(args_list, url=parsed_url, conn_id=conn_id, **kwargs) # type: ignore
^^^^^^^^^^^^^^^
AttributeError: type object 'ObjectStoragePath' has no attribute '_from_parts'. Did you mean: '_load_parts'?
------------------------------ Captured log call -------------------------------
INFO airflow.models.dagbag.DagBag:dagbag.py:538 Filling up the DagBag from /opt/airflow/tests/system/providers/common/io/example_file_transfer_local_to_s3.py
ERROR airflow.models.dagbag.DagBag:dagbag.py:348 Failed to import: /opt/airflow/tests/system/providers/common/io/example_file_transfer_local_to_s3.py
Traceback (most recent call last):
File "/opt/airflow/airflow/models/dagbag.py", line 344, in parse
loader.exec_module(new_module)
File "<frozen importlib._bootstrap_external>", line 994, in exec_module
File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
File "/opt/airflow/tests/system/providers/common/io/example_file_transfer_local_to_s3.py", line 35, in <module>
TEMP_FILE_PATH = ObjectStoragePath("file:///tmp")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/airflow/airflow/io/path.py", line 153, in __new__
return cls._from_parts(args_list, url=parsed_url, conn_id=conn_id, **kwargs) # type: ignore
^^^^^^^^^^^^^^^
AttributeError: type object 'ObjectStoragePath' has no attribute '_from_parts'. Did you mean: '_load_parts'?
Any ideas for good approach we can take there?
It's been @bolkedebruin and @uranusjr who implemented and reviewed it originally and I would have to look deeper on how it is implemented - but since the heavy changes to Pathlib implementation in Python 3.12 (making it ACTUALLY extendible) probably our approach should also be to have a different approach (also knowing that Python 3.13 is also bringing changes there).
Any words of advise here?
I surveyed custom user subclasses on GitHub and collected a list of features in #172 that would need to land in UPath to allow simpler customization of UPath's behavior. I also opened a new issue in the airflow repository apache/airflow#37067 to track the progress of what's needed to make this change as smooth as possible.
It would be best if I could have some more insight into the intention behind the custom UPath implementation and see if I would cover all cases with #172.
Just for reference @potiuk : I was aware of the development direction of Pathlib but it was a bit of a moving target at the time. Hence deciding to use the 'current' and wait for something more stable to arrive, assuming that the user facing api of pathlib will mostly stay the same.
For everyone following this PR, a new release will be out once #172 is fixed. Should be within the next days.
Cool
This is going to be a bit difficult. But it will close #137.
some background
We avoided refactoring UPath for 3.11 by vendoring 3.10 pathlib code as much as possible. Now things have changed again (3.12) (for the better!) and will change again (3.13) (to waaay better).
The roadmap is going to be, that ~at some point in time~ Python 3.13's
pathlib.PathBase
is available in an early version as a backport for older pythons in the ~pathlib2
library~pathlib-abc
library. At that point, we can just base our implementation of UPath on the latest version of PathBase, and support older pythons via the backport.Until then I introduce a Python3.12 compatible implementation in upath that will be used when you're on 3.12 and otherwise we fallback to the current implementation.
current state
This s a wip PR for now, and I still need to fix:
touch()
is incorrect)memory://
uri's need revisitingPosixUPath
andWindowsUPath
compatibility testsPosixUPath
subclassing testsLocalPath
on windowstodo