PyFilesystem / pyfilesystem2

Python's Filesystem abstraction layer
https://www.pyfilesystem.org
MIT License

Can open_fs work with http protocol? #505

Open · mezhaka opened this issue 2 years ago

mezhaka commented 2 years ago

Is there such a thing as fs.open_fs("https://something.com/whatever")? Having been a PyFilesystem user for some time (I use it extensively for "gs://", "tar://", and "temp://"), I assumed it would work out of the box for "http://", and now I've discovered it does not.

My use case is my own open_file implementation that is essentially:

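import os
import typing
from contextlib import contextmanager
from typing import Optional

from fs import open_fs


@contextmanager
def open_file(
    url: str,
    mode: str = "r",
    create: bool = False,
    buffering: int = -1,
    encoding: Optional[str] = None,
    errors: Optional[str] = None,
    newline: str = "",
    **options,
) -> typing.Iterator[typing.IO]:
    # Any mode that can modify the file ("w", "a", "x", "+") needs a writeable FS.
    writeable = any(flag in mode for flag in "wax+")
    # Split e.g. "gs://bucket/dir/file.txt" into the FS URL and the file name.
    dir_url, file_name = os.path.split(url)
    with open_fs(dir_url, writeable, create) as fs_:
        with fs_.open(file_name, mode, buffering, encoding, errors, newline, **options) as file_:
            yield file_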

With an https URL, this now fails with fs.opener.errors.UnsupportedProtocol: protocol 'https' is not supported.

I guess there's a good reason PyFilesystem doesn't do this, but I thought I'd check with you here first.

P.S. I suppose there's no way to list what is under a given HTTP path.

dargueta commented 2 years ago

> P. S. I suppose there's no way to list things which are under some http path.

No, and that's probably why this isn't implemented in PyFilesystem: most operations, such as listing and stat-ing, would be unsupported.

You may find smart-open helpful.
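smart_open can open http(s) URLs for reading. For the read-only case that open_file needs, a minimal sketch using only the standard library is also possible. The open_url helper below is hypothetical: it sends http(s) URLs to urllib.request.urlopen and, as an assumption, hands every other scheme to PyFilesystem's open_fs the same way the original open_file does. Writing, listing, and stat-ing over HTTP are deliberately rejected rather than emulated.

```python
import io
import os
import typing
import urllib.parse
import urllib.request
from contextlib import contextmanager


@contextmanager
def open_url(
    url: str,
    mode: str = "r",
    encoding: typing.Optional[str] = None,
    timeout: float = 30.0,
) -> typing.Iterator[typing.IO]:
    """Hypothetical helper: read-only http(s) via urllib, other schemes via PyFilesystem."""
    writeable = any(flag in mode for flag in "wax+")
    scheme = urllib.parse.urlsplit(url).scheme.lower()
    if scheme in ("http", "https"):
        # Plain HTTP gives no portable way to write, list, or stat, so only reads.
        if writeable:
            raise ValueError(f"mode {mode!r} is not supported for {scheme} URLs")
        with urllib.request.urlopen(url, timeout=timeout) as response:
            if "b" in mode:
                yield response
            else:
                # Decode with the explicit encoding, else the server's charset, else UTF-8.
                charset = encoding or response.headers.get_content_charset() or "utf-8"
                with io.TextIOWrapper(response, encoding=charset) as text:
                    yield text
    else:
        # Imported lazily so the HTTP branch works without PyFilesystem installed.
        from fs import open_fs

        dir_url, file_name = os.path.split(url)
        with open_fs(dir_url, writeable) as fs_:
            with fs_.open(file_name, mode, encoding=encoding) as file_:
                yield file_
```

Routing by scheme up front, instead of catching UnsupportedProtocol, keeps the HTTP path independent of PyFilesystem's opener registry.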

althonos commented 2 years ago

Indeed, HTTP doesn't describe a file system the way FTP does, so you can't simply add an HTTPFS that explores every link as a file; that's not how it works. What might work, however, is a dedicated class that handles particular file-listing formats, such as the directory listings nginx or Apache can be configured to serve for static content.
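The listing half of such a class could look roughly like the sketch below. The parse_index function is hypothetical and uses only html.parser: it extracts the direct children of a directory from an nginx- or Apache-style autoindex page. It assumes entries are plain relative <a href> links and that a trailing slash marks a directory, which matches the default autoindex pages but is not something HTTP guarantees. Sizes and dates in the listing are ignored.

```python
import typing
import urllib.parse
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: typing.List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def parse_index(base_url: str, html: str) -> typing.List[typing.Tuple[str, bool]]:
    """Return (name, is_dir) pairs for entries directly under base_url."""
    if not base_url.endswith("/"):
        base_url += "/"
    collector = _LinkCollector()
    collector.feed(html)
    entries: typing.List[typing.Tuple[str, bool]] = []
    seen = set()
    for href in collector.hrefs:
        url = urllib.parse.urljoin(base_url, href)
        parts = urllib.parse.urlsplit(url)
        # Skip sort links (Apache's "?C=N;O=D"), anchors, and anything outside base_url
        # such as "../" or links to other hosts.
        if parts.query or parts.fragment or not url.startswith(base_url):
            continue
        rest = url[len(base_url):]
        name = rest.rstrip("/")
        # Skip the directory itself and anything nested deeper than one level.
        if not name or "/" in name:
            continue
        name = urllib.parse.unquote(name)
        if name not in seen:
            seen.add(name)
            entries.append((name, rest.endswith("/")))
    return entries
```

A full HTTPFS would still have to fetch each page and map these entries onto PyFilesystem's own directory-info objects; the parsing step is the only server-specific piece.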

lurch commented 2 years ago

> The thing that may work, however, would be to have a dedicated class that can handle particular file listing formats, like nginx or Apache let you configure to serve static content.

Huh, I never even realised that PyFilesystem didn't support that yet! :rofl: Adding a pyfs interface on top of e.g. http://downloads.raspberrypi.org/ would be a fun project, which I unfortunately don't have time for myself at the moment. (It might even be possible to reuse some of the FTP-listing parsing code?)