PyFilesystem / pyfilesystem2

Python's Filesystem abstraction layer
https://www.pyfilesystem.org
MIT License
1.99k stars 177 forks source link

Drop-in compatibility of OSFS with Python's native path handling #512

Open garcia opened 2 years ago

garcia commented 2 years ago

Here's a heavily-simplified example of the API I want to offer my users:

def opendir(path: str, fs: Optional[fs.base.FS] = None):
    fs = fs or fs.osfs.OSFS()
    for subpath in fs.listdir(path):
        pass

The idea here is that users of my API could pass a native OS path with no fs parameter to inherit Python's standard path functionality, or they could specify e.g. fs=ZipFS("foo.zip") and a path within the ZIP for more advanced use cases. Currently, this API exists as a stable release without any fs abstraction, so it's important to preserve existing path semantics to avoid a major version bump.

Unfortunately, this hypothetical API doesn't work for a few reasons. Most notably, OSFS requires a root_path parameter, which acts as both a sandbox and a dividing line between system paths and PyFilesystem paths (as documented here). If I default to fs.os.OSFS("/"), the API will work for absolute paths, but not for relative paths. If I instead default to fs.os.OSFS("."), I get the opposite affordances, except relative paths outside the current working directory still won't work due to the sandbox. This friction is compounded on Windows due to drive letters and the backslash separator.

The obvious workaround here is to pass path into the OSFS constructor and set the path within that filesystem to /:

def opendir(path: str, fs: Optional[fs.base.FS] = None):
    if fs is None:
        fs = fs.osfs.OSFS(path)
        path = "/"
    for subpath in fs.listdir(path):
        pass

...except now if my user explicitly passes an OSFS, the path's semantics will be different:

opendir(r"C:\Users\micro\Documents\example") # ok
opendir(r"Users\micro\Documents\example", fs=fs.osfs.OSFS("C:\")) # throws fs.errors.InvalidCharsInPath

This also complicates chaining within my API (which is obviously more complex than this one trivial function) and will probably lead to confusion when the paths that come back from my API have different separators than the ones that were passed in.


I understand there were two mutually incompatible options for this library: make paths consistent with the end user's expectations, or make paths consistent with each other across platforms for ease of programming. I can empathize with the decision to go with the latter option, but this seems like a lot of friction just to expand an existing, os-based API to support non-OS filesystems.

Can I convince OSFS to always work in system paths? Maybe as a subclass, or just a separate FS implementation? My ideal OSFS API would take no root_path and simply pass system paths down to Python's os layer, but it's hard to tell how much work that would be, or whether it's even possible or advisable (one hitch I've noted is that fs.getsyspath doesn't appear to have an inverse function).

(For the sake of ending on a positive note: I really appreciate the thorough and well-organized documentation, which made understanding the root cause of this friction much easier!)

garcia commented 2 years ago

Went ahead and started on a proof-of-concept NativeOSFS implementation for my use case: (gist). Most of the code was just Frankensteined over from OSFS with path conversion stripped out.

I've only spent like 20 minutes on this and didn't write any tests, but my library's unit tests are back to passing with this class as the default FS, so that's cool. If there's interest in a PR for this, I'd be happy to spend some more time fleshing out write functionality & tests, but I also acknowledge that this sort of goes against pyfilesystem2's design philosophy, so, your call :)

lurch commented 2 years ago

just to expand an existing, os-based API to support non-OS filesystems.

That's simply not how PyFilesystem is designed to work - you can't mix os paths and functions with fs paths and functions; they're fundamentally two different beasts. E.g. following on from your opendir(r"C:\Users\micro\Documents\example") # ok example, what should

mem_fs = MemoryFS()
opendir(r"C:\Users\micro\Documents\example", fs=mem_fs)

be expected to do? And as you mention: "will probably lead to confusion when the paths that come back from my API have different separators than the ones that were passed in" ! I think you either need to stick to using only os paths and functions, or fully-embrace fs paths and functions - and yes this would mean breaking backwards-compatibility. All IMHO, of course :wink:

This request feels kinda like a duplicate of #238 ?

garcia commented 2 years ago

I did see #238 while researching this issue. Both this issue and #238 concern compatibility with built-in Python modules, but different modules, with different underlying motivations.

E.g. following on from your opendir(r"C:\Users\micro\Documents\example") # ok example, what should

mem_fs = MemoryFS()
opendir(r"C:\Users\micro\Documents\example", fs=mem_fs)

be expected to do?

I wouldn't expect this to work. You would still have to use PyFilesystem paths for nonnative filesystems, and operations like listdir on such paths would also return PyFilesystem paths, preserving user expectations.

I think you either need to stick to using only os paths and functions, or fully-embrace fs paths and functions - and yes this would mean breaking backwards-compatibility. All IMHO, of course 😉

The reality of API design is a long story of balancing backwards compatibility with simplicity. I admire the aesthetic of a nice & clean API with simple invariants, but fully embracing PyFS requires me to reconcile with the practicality of Windows users pasting paths into their console to explore my library's functionality. So, I'll probably just continue to develop my NativeOSFS hack as a 3rd party implementation for now.

Appreciate the feedback! Feel free to close if there's nothing actionable here.

lurch commented 2 years ago

Perhaps a "cleaner" approach would be to always use pyfilesystem methods and paths throughout your library (and thus avoid your NativeOSFS hack), and simply provide a wrapper which converts things to-and-from a "system path" for users who want to do command-line experimentation (using native paths)? :shrug:

garcia commented 2 years ago

converts things to-and-from a "system path"

This is why I pointed out that getsyspath doesn't appear to have an inverse. I would be interested in a solution that simply translates system paths to PyFS paths for OSFS, but I couldn't find a way to do that reliably, and I suspect drive letters would make it especially thorny.

In any case, my library isn't doing anything particularly fancy at the filesystem level - it just needs to find some files in a nested directory structure and return their paths / read their contents. Most of my users will likely stick to the OS filesystem, but I don't want this logic to be coupled to a particular FS for those with more advanced use cases. So, either the simple use cases or the advanced use cases will need to bear the friction of Windows paths being weird. I would rather shift that burden to the advanced use cases, since those users are more likely to be familiar with this kind of impedance mismatch.

lurch commented 2 years ago

I would be interested in a solution that simply translates system paths to PyFS paths for OSFS, but I couldn't find a way to do that reliably, and I suspect drive letters would make it especially thorny.

I suspect that the only way to do that reliably would be to only translate system-paths that are in a sub-directory of the OSFS's root directory, and throw an Exception (or create another OSFS?) otherwise? :shrug: I.e. if you want to support multiple drive letters you'd probably need a dict of OSFSes?