fsspec / universal_pathlib

pathlib api extended to use fsspec backends
MIT License
211 stars 36 forks source link

Support Python 3.12 #152

Closed ap-- closed 5 months ago

ap-- commented 9 months ago

This is going to be a bit difficult. But it will close #137.

some background

We avoided refactoring UPath for 3.11 by vendoring 3.10 pathlib code as much as possible. Now things have changed again (3.12) (for the better!) and will change again (3.13) (to waaay better).

The roadmap is going to be, that ~at some point in time~ Python 3.13's pathlib.PathBase is available in an early version as a backport for older pythons in the ~pathlib2 library~ pathlib-abc library. At that point, we can just base our implementation of UPath on the latest version of PathBase, and support older pythons via the backport.

Until then I introduce a Python3.12 compatible implementation in upath that will be used when you're on 3.12 and otherwise we fallback to the current implementation.

current state

This s a wip PR for now, and I still need to fix:

todo

potiuk commented 5 months ago

Are there any ways we can help with getting that one in @ap-- ? We are using universal_pathlib in Airflow and this seems to be currently the last blocker for us to add Python 3.12 - unfortunately it's used in "core" feature of Airflow and it will be difficult to exclude just that feature (we are using universal_pathlib for fsspec integration) so we are quite blocked from 3.12 support.

But maybe we can help somehow in getting it out faster?

ap-- commented 5 months ago

Thank you so much for offering to help!

I am working on getting out a pre-release with 3.12 support soon. It would be great if you could help testing that release before we roll out a new version.

potiuk commented 5 months ago

Absolutely. I am subscribed to the issu here, so just comment it here and I will test it. we have a number of test failing now and Airflow has generally quite comprehensive set of tests, so happy to pass it through our machinery.

potiuk commented 5 months ago

Seeing the green PR - I took it for a spin in Airflow and installed it using github URL from this branch and I can confirm all failures in the pathlib related tests are gone!

potiuk commented 5 months ago

We still have a few other tests failing (but they are mostly internal/test related but the pathlib ones are working fine https://github.com/apache/airflow/actions/runs/7680234838/job/20932304380?pr=36755

ap-- commented 5 months ago

@potiuk thanks for testing this branch. Tbh I am surprised that the tests related to UPath pass.

I looked at the custom ObjectStoragePath subclass in airflow (https://github.com/apache/airflow/blob/6f41010269e4a2c052b564f8f0f977abbaf5dfc7/airflow/io/path.py#L71), and the way __new__ is implemented, instantiating ObjectStoragePath should crash on python 3.12 because the _from_parts classmethod does not exist anymore.

Is it possible that the test suite is not running on python 3.12 ?

potiuk commented 5 months ago

Yeah. It was my mistake, I simply missed it in the report because we have some other issue that triggers "max recursion" error (connected to utcnow() raising a deprecation warning in Python 3.12) so it clouded the output enough for me to miss the pathlib ones. Luckily the very new feature (a week or so) added in GitHub Actions that allows to quickly search even in long output of CI jobs is a life-saver and i started using it :).

Do you have some suggestions on how we can address it @ap-- - since you looked at the code ?

It's been @bolkedebruin and @uranusjr who implemented and reviewed it originally and I would have to look deeper on how it is implemented - but since the heavy changes to Pathlib implementation in Python 3.12 (making it ACTUALLY extendible) probably our approach should also be to have a different approach (also knowing that Python 3.13 is also bringing changes there).

Any words of advise here?

For a bit of context: @bolkedebruin and @uranusjr - I am working on Python 3.12 compatibility in https://github.com/apache/airflow/pull/36755 and after current PR in universal_pathlib got green - brining universal_pathlib (hopefully) to Python 3.12 compatibility, I added installation of universal_pathlib from this git branch and it would be great if we figure out how to convert our IO implementation to also be compatbile.

Currently the tests of ours fail with (and the reason is that _from_parts is removed from both Pathlib (I think) and UniversalPathlib (I am sure):

https://github.com/apache/airflow/actions/runs/7688925657/job/20950993493?pr=36755#step:6:9018

 tests/always/test_example_dags.py:87: AssertionError
  ----------------------------- Captured stderr call -----------------------------
  INFO  [airflow.models.dagbag.DagBag] Filling up the DagBag from /opt/airflow/tests/system/providers/common/io/example_file_transfer_local_to_s3.py
  ERROR [airflow.models.dagbag.DagBag] Failed to import: /opt/airflow/tests/system/providers/common/io/example_file_transfer_local_to_s3.py
  Traceback (most recent call last):
    File "/opt/airflow/airflow/models/dagbag.py", line 344, in parse
      loader.exec_module(new_module)
    File "<frozen importlib._bootstrap_external>", line 994, in exec_module
    File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
    File "/opt/airflow/tests/system/providers/common/io/example_file_transfer_local_to_s3.py", line 35, in <module>
      TEMP_FILE_PATH = ObjectStoragePath("file:///tmp")
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/opt/airflow/airflow/io/path.py", line 153, in __new__
      return cls._from_parts(args_list, url=parsed_url, conn_id=conn_id, **kwargs)  # type: ignore
             ^^^^^^^^^^^^^^^
  AttributeError: type object 'ObjectStoragePath' has no attribute '_from_parts'. Did you mean: '_load_parts'?
  ------------------------------ Captured log call -------------------------------
  INFO     airflow.models.dagbag.DagBag:dagbag.py:538 Filling up the DagBag from /opt/airflow/tests/system/providers/common/io/example_file_transfer_local_to_s3.py
  ERROR    airflow.models.dagbag.DagBag:dagbag.py:348 Failed to import: /opt/airflow/tests/system/providers/common/io/example_file_transfer_local_to_s3.py
  Traceback (most recent call last):
    File "/opt/airflow/airflow/models/dagbag.py", line 344, in parse
      loader.exec_module(new_module)
    File "<frozen importlib._bootstrap_external>", line 994, in exec_module
    File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
    File "/opt/airflow/tests/system/providers/common/io/example_file_transfer_local_to_s3.py", line 35, in <module>
      TEMP_FILE_PATH = ObjectStoragePath("file:///tmp")
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/opt/airflow/airflow/io/path.py", line 153, in __new__
      return cls._from_parts(args_list, url=parsed_url, conn_id=conn_id, **kwargs)  # type: ignore
             ^^^^^^^^^^^^^^^
  AttributeError: type object 'ObjectStoragePath' has no attribute '_from_parts'. Did you mean: '_load_parts'?

Any ideas for good approach we can take there?

ap-- commented 5 months ago

It's been @bolkedebruin and @uranusjr who implemented and reviewed it originally and I would have to look deeper on how it is implemented - but since the heavy changes to Pathlib implementation in Python 3.12 (making it ACTUALLY extendible) probably our approach should also be to have a different approach (also knowing that Python 3.13 is also bringing changes there).

Any words of advise here?

I surveyed custom user subclasses on GitHub and collected a list of features in #172 that would need to land in UPath to allow simpler customization of UPath's behavior. I also opened a new issue in the airflow repository apache/airflow#37067 to track the progress of what's needed to make this change as smooth as possible.

It would be best if I could have some more insight into the intention behind the custom UPath implementation and see if I would cover all cases with #172.

bolkedebruin commented 5 months ago

Just for reference @potiuk : I was aware of the development direction of Pathlib but it was a bit of a moving target at the time. Hence deciding to use the 'current' and wait for something more stable to arrive, assuming that the user facing api of pathlib will mostly stay the same.

ap-- commented 5 months ago

For everyone following this PR, a new release will be out once #172 is fixed. Should be within the next days.

potiuk commented 5 months ago

Cool