fsspec / universal_pathlib

pathlib api extended to use fsspec backends
MIT License
251 stars 44 forks source link

GitHubPath not working after 0.1.0 #154

Closed juftin closed 9 months ago

juftin commented 1 year ago

I've got a couple projects, browsr and textual-universal-directorytree, that leverage universal-pathlib for a TUI file browser. I started working on updating dependencies yesterday and noticed an incompatibility with the new version of universal-pathlib: my custom implementation of UPath for fsspec's GithubFileSystem stopped working after the release of 0.1.0. In particular, the method is_dir() is what broke on instances of the GitHubPath:

The following code, I'm calling it example.py, will correctly know that upath is a directory prior to 0.1.0 but will get this wrong after:

import upath
from upath import UPath, registry
from upath.core import _FSSpecAccessor

class GitHubPath(UPath):
    """
    GitHubPath supporting the fsspec.GitHubFileSystem
    """

    _default_accessor = _FSSpecAccessor

    @property
    def path(self) -> str:
        """
        Paths get their leading slash stripped
        """
        return super().path.strip("/")

    @property
    def name(self) -> str:
        """
        Override the name for top level repo
        """
        if self.path == "":
            org = self._accessor._fs.org
            repo = self._accessor._fs.repo
            sha = self._accessor._fs.storage_options["sha"]
            github_name = f"{org}:{repo}@{sha}"
            return github_name
        else:
            return super().name

if upath.__version__.startswith("0.1"):
    registry.register_implementation("github", GitHubPath)
else:
    registry._registry.known_implementations[
        "github"
    ] = "example.GitHubPath"

if __name__ == "__main__":
    print(upath.__version__)

    github = upath.UPath("github://fsspec:universal_pathlib@main")
    contents = {item.name: item for item in github.iterdir()}
    source_code = contents["upath"]

    print(source_code)
    print(source_code.is_dir())

0.0.23 Output:

0.0.23
github://fsspec:universal_pathlib@main/upath
True

0.1.3 Output:

0.1.3
github://fsspec:universal_pathlib@main/upath
False

I know that GitHub isn't a supported upath implementation yet so I understand if you're unable to support this issue, I'd also like to get it added if that's something you're interested in. I'm happy to share code or submit a PR where I can - right now I'm just stuck on trying to figure out exactly what changed.

juftin commented 1 year ago

Looking at this closer I think I've got the issue figured out, there were leading slashes being passed down to the GithubFilesystem that it didn't like. The following code looks to have resolved the issue without the need to customize the GitHubPath further:

import upath
from upath import UPath, registry
from upath.core import _FSSpecAccessor

class _GitHubAccessor(_FSSpecAccessor):
    """
    FSSpec Accessor for GitHub
    """

    def _format_path(self, path: UPath) -> str:
        """
        Remove the leading slash from the path
        """
        return path._path.strip("/")

class GitHubPath(UPath):
    """
    GitHubPath supporting the fsspec.GitHubFileSystem
    """

    _default_accessor = _GitHubAccessor

With that being said, are you interested in a GitHub implementation for upath and are there any issues you see with the example in this comment? I'm happy to contribute if so. Thanks again for the fantastic library

ap-- commented 1 year ago

Hi @juftin

Thanks for reporting the issue! We're keeping up with the changes in pathlib, and will probably reach a stable interface with python 3.13 that can then be back-ported. In the meantime we're trying the best to not break external UPath subclasses, but can't fully guarantee.

That being said, the best way to ensure your custom implementation will keep on working is to add it to universal_pathlib 😃 So yes, it would be great if you could make a PR.

I wrote some instructions here on how to add new implementations: https://github.com/fsspec/universal_pathlib/issues/117#issuecomment-1641532209

In addition to your suggested changes above, you would need to add a test that derives from the BaseTests class and does the setup. I would probably try to use the universal_pathlib repo for the tests.

Let me know if you need any further advice. If you get stuck, feel free to open a draft PR for discussion.

Cheers, Andreas

ap-- commented 9 months ago

Closed via 8314a65746c4d62c9263d7d98007a8b54f971dc5