fsspec / filesystem_spec

A specification that python filesystems should adhere to.
BSD 3-Clause "New" or "Revised" License
990 stars 349 forks source link

github file 404 when folder has escaped symbols #1669

Open DaveBoy opened 1 week ago

DaveBoy commented 1 week ago

A 404 error may occur when a github folder has escaped symbols, because the actual raw address is not the folder name, but partially escaped. Example below: url: https ://github.com/msojocs/fiddler-everywhere-enhance/raw/master/server/file/api.getfiddler.com/c/NUNHMjIyNkpNMDg4ZjMzMjZlLTk0OWQtNDgwNi1hMTc2LTFlZGY1MWJiOWRhOA%253D%253D/v/4.6.1/p/win/latest.yml

it's folder path: server/file/api.getfiddler.com/c/NUNHMjIyNkpNMDg4ZjMzMjZlLTk0OWQtNDgwNi1hMTc2LTFlZGY1MWJiOWRhOA%3D%3D/v/4.6.1/p/win/latest.yml

in folderpath: %3D%3D in url: %253D%253D

martindurant commented 1 week ago

@DaveBoy , could you show how you are trying to use your URL?

DaveBoy commented 1 week ago

@DaveBoy , could you show how you are trying to use your URL?

I use it like this, it can download other files from this repo, but it can't download files under special folders

user = "msojocs"
project = "fiddler-everywhere-enhance"
username = "DaveBoy"
token = "my token"
down_path = "server/file/api.getfiddler.com/c"
save_path = r"D:\temp\fiddler-everywhere-enhance"
branch = "master"
down_type = "tree"
destination = Path(save_path)
destination.mkdir(exist_ok=True, parents=True)
if os.path.exists(save_path + r"\exception.txt"):
    os.remove(save_path + r"\exception.txt")
try:
    fs = fsspec.filesystem("github", org=user, repo=project, username=username,
                           sha=branch if len(branch) != 0 else None,
                           token=token)
    if down_type == "blob":
        fs.get(down_path, (destination / down_path).as_posix())
    else:
        fs.get(down_path, destination.as_posix(), recursive=True)
except Exception as e:
    traceback.print_exc()
    info = traceback.format_exc()
    with open(save_path + r"\exception.txt", 'w') as f:
        f.write(info)
martindurant commented 1 week ago

Ah, OK. Probably in fsspec.implementations.github , the url passed to requests in _open, ls and cat should be url-encoded . Would you like to try this?

DaveBoy commented 1 week ago

Ah, OK. Probably in fsspec.implementations.github , the url passed to requests in _open, ls and cat should be url-encoded . Would you like to try this?

tks,i will test this plan tomorrom

ps:i found this api can get the right url,But I'm not sure it's a better plan than this.

DaveBoy commented 1 week ago

Ah, OK. Probably in fsspec.implementations.github , the url passed to requests in _open, ls and cat should be url-encoded . Would you like to try this?

tks very much.I tried this solution and successfully solved my problem. My modifications are as follows,Hope to help other friends who encounter this problem (testing get and cat is normal, I am not sure whether ls needs to be modified):

Related file test address You can use the files under this address 图片

martindurant commented 1 week ago

Can you please put your changes into a PR?

DaveBoy commented 1 week ago

Can you please put your changes into a PR?

I'm very happy to do this, I'll sort out the relevant content and submit it to pr tomorrow

DaveBoy commented 1 week ago

Ah, OK. Probably in fsspec.implementations.github , the url passed to requests in _open, ls and cat should be url-encoded . Would you like to try this?

get is ok, but cat is not usable via "fs.cat(r"file/test_file/test1.txt")", so I can't test it. It raises "ValueError: too many values ​​to unpack" at "for u, sh in paths". Am I using it wrong?

DaveBoy commented 1 week ago
 def cat(self, path, recursive=False, on_error="raise", **kwargs):
        paths = self.expand_path(path, recursive=recursive)
        urls = [
            self.rurl.format(org=self.org, repo=self.repo, path=u, sha=self.root)
            for u, sh in paths
        ]

@martindurant Here "ValueError: too many values ​​to unpack" is raised at "for u, sh in path". expand_path always returns a list, is there any example of its use?

martindurant commented 1 week ago

for u, sh in paths seems simply wrong, it should be for u in paths as you suspected. expand_path returns "list of all matching paths".

DaveBoy commented 1 week ago

for u, sh in paths seems simply wrong, it should be for u in paths as you suspected. expand_path returns "list of all matching paths".

@martindurant

for u, sh in paths

it's the cat method in /implementations/github.py

Thanks for the reply, I will test the quote method after the cat method fix and submit it after the test passes