fsspec / universal_pathlib

pathlib api extended to use fsspec backends
MIT License
210 stars 36 forks source link

Regression: `UPath.__new__` handles os.PathLike objects incorrectly #195

Closed ap-- closed 4 months ago

ap-- commented 4 months ago

current implementation calls str on an os.Pathlike instance, potentially receiving the wrong urlpath...

class A:
    def __fspath__(self):
        return "s3://bucket/abc"

    def __str__(self):
        return "A('s3://bucket/abc')"
ap-- commented 4 months ago

Note: behavior is correct for PurePath subclasses.

bolkedebruin commented 4 months ago

I'm not sure if it is related. In Airflow using ObjectStoragePath straight out with 0.2.0 will do this:

>>> from airflow.io.path import ObjectStoragePath
/Users/bolke/dev/airflow/airflow/io/path.py:47 DeprecationWarning: All _FSSpecAccessor subclasses have been deprecated.  Please follow the universal_pathlib==0.2.0 migration guide at https://github.com/fsspec/universal_pathlib for more information.
/Users/bolke/dev/airflow/airflow/io/path.py:72 DeprecationWarning: Detected a customized `__new__` method in subclass 'ObjectStoragePath'. Protocol dispatch will be disabled for this subclass. Please follow the universal_pathlib==0.2.0 migration guide at https://github.com/fsspec/universal_pathlib for more information.
/Users/bolke/dev/airflow/airflow/io/path.py:72 DeprecationWarning: Detected a customized `__init__` method or `_fs` attribute in the provided `_FSSpecAccessor` subclass of 'ObjectStoragePath'. It is recommended to instead override the `UPath._fs_factory` classmethod to customize filesystem instantiation. Please follow the universal_pathlib==0.2.0 migration guide at https://github.com/fsspec/universal_pathlib for more information.
>>> p = ObjectStoragePath("file:///tmp/bla")
/Users/bolke/dev/airflow/airflow/io/path.py:154 DeprecationWarning: UPath._from_parts is deprecated and should not be used. Please follow the universal_pathlib==0.2.0 migration guide at https://github.com/fsspec/universal_pathlib for more information.
>>> str(p)
'file://tmp/bla'  # should be file:///tmp/bla - triple slash
ap-- commented 4 months ago

I'm not sure if it is related. In Airflow using ObjectStoragePath straight out with 0.2.0 will do this:

>>> from airflow.io.path import ObjectStoragePath
/Users/bolke/dev/airflow/airflow/io/path.py:47 DeprecationWarning: All _FSSpecAccessor subclasses have been deprecated.  Please follow the universal_pathlib==0.2.0 migration guide at https://github.com/fsspec/universal_pathlib for more information.
/Users/bolke/dev/airflow/airflow/io/path.py:72 DeprecationWarning: Detected a customized `__new__` method in subclass 'ObjectStoragePath'. Protocol dispatch will be disabled for this subclass. Please follow the universal_pathlib==0.2.0 migration guide at https://github.com/fsspec/universal_pathlib for more information.
/Users/bolke/dev/airflow/airflow/io/path.py:72 DeprecationWarning: Detected a customized `__init__` method or `_fs` attribute in the provided `_FSSpecAccessor` subclass of 'ObjectStoragePath'. It is recommended to instead override the `UPath._fs_factory` classmethod to customize filesystem instantiation. Please follow the universal_pathlib==0.2.0 migration guide at https://github.com/fsspec/universal_pathlib for more information.
>>> p = ObjectStoragePath("file:///tmp/bla")
/Users/bolke/dev/airflow/airflow/io/path.py:154 DeprecationWarning: UPath._from_parts is deprecated and should not be used. Please follow the universal_pathlib==0.2.0 migration guide at https://github.com/fsspec/universal_pathlib for more information.
>>> str(p)
'file://tmp/bla'  # should be file:///tmp/bla - triple slash

This is unrelated. The implementation of ObjectStoragePath on current airflow main is too tightly bound to internals of universal_pathlib==0.1.4

When I refactored the UPath implementation for python 3.12 support I was not able to provide 100% backwards-compatibility. I managed to support a large subset of custom UPath subclasses that I found on github, and even a good chunk of the ObjectStoragePath functionality, but not all of it.

So for universal_pathlib>=0.2.0 support, ObjectStoragePath needs to be refactored. Cross-referencing: apache/airflow#37524