PyFilesystem / pyfilesystem2

Python's Filesystem abstraction layer
https://www.pyfilesystem.org
MIT License

Support for a FileSystem using SQL DB as a backend #498

Open AbdealiLoKo opened 2 years ago

AbdealiLoKo commented 2 years ago

I've been using pyfilesystem2 a lot in my API projects, and it has been great. It handles so many useful cases.

One of the things I have been thinking about is whether it is possible to support an RDBMS as a backend. This may not be ideal, but it has some benefits.

Requirement:

In some of my projects, I need a reliable storage mechanism that is available across multiple machines (as I have multiple API servers to enable high-availability and fault-tolerant setups).

Current solutions

Right now I have to provision NFS, S3, SSHFS, or a similar kind of shared storage for these deployments.

Proposed solution

In the majority of API servers, we already have an RDBMS where we store information. For small files, keeping them in the database is a pretty decent approach for quick access and changes. Ref: https://www.microsoft.com/en-us/research/publication/to-blob-or-not-to-blob-large-object-storage-in-a-database-or-a-filesystem/

This could use SQLAlchemy as a dependency, so that the differences between specific databases are abstracted away.
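To make the idea concrete, here is a minimal sketch of the storage layer only. It is an illustration under stated assumptions, not a PyFilesystem backend: it uses the standard-library `sqlite3` module instead of SQLAlchemy so that it runs without extra dependencies, and the class name `SQLBlobStore`, the table name `files`, and the method names are all hypothetical.

```python
import sqlite3


class SQLBlobStore:
    """Hypothetical helper: stores each file as one row of (path, data)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS files ("
            "path TEXT PRIMARY KEY, data BLOB NOT NULL)"
        )

    def writebytes(self, path: str, data: bytes) -> None:
        # Insert or overwrite the row inside a single transaction.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO files (path, data) VALUES (?, ?)",
                (path, data),
            )

    def readbytes(self, path: str) -> bytes:
        row = self.conn.execute(
            "SELECT data FROM files WHERE path = ?", (path,)
        ).fetchone()
        if row is None:
            raise FileNotFoundError(path)
        return bytes(row[0])

    def exists(self, path: str) -> bool:
        row = self.conn.execute(
            "SELECT 1 FROM files WHERE path = ?", (path,)
        ).fetchone()
        return row is not None


# Usage with an in-memory database (assumption: a real setup would
# point at the shared database the API servers already use).
store = SQLBlobStore(sqlite3.connect(":memory:"))
store.writebytes("/reports/summary.txt", b"hello")
print(store.readbytes("/reports/summary.txt"))
```

Because every write is a single-row transaction, all API servers pointing at the same database would see a consistent view of each file, which is the property the requirement above asks for.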

Note: This could also be a third-party filesystem that I create myself when I get some free time. But I thought I'd post it here in case there is interest from other people involved in this project, or in case others already have workarounds available.

lurch commented 2 years ago

Interesting idea... but as pyfilesystem provides a "full" filesystem API, my gut feeling is that this kind of approach would have a lot of corner-cases, and it'll be a lot of work to sort out all these corner cases. (Fortunately, pyfilesystem also includes a reasonably comprehensive test-suite). So good luck if you do decide to tackle this :smiley:

willmcgugan commented 2 years ago

I think the idea has been floated before. Not sure if any work has been done on that.

It's certainly doable. But @lurch is correct, you can probably get something up and running surprisingly quickly, but it will take longer to sort out the details and make it behave identically to other filesystems.
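As one illustration of the kind of detail that could come up, consider directories. The sketch below assumes a hypothetical flat table `files(path, data)` with no separate directory records, and a hypothetical helper `listdir` that derives directory contents from path prefixes. It uses the standard-library `sqlite3` module and is not taken from any existing PyFilesystem backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, data BLOB NOT NULL)")
conn.executemany(
    "INSERT INTO files (path, data) VALUES (?, ?)",
    [("/a/x.txt", b"1"), ("/a/b/y.txt", b"2"), ("/z.txt", b"3")],
)


def listdir(conn, directory):
    """List the immediate children of `directory`, derived from path prefixes."""
    prefix = directory.rstrip("/") + "/"
    # substr() is used instead of LIKE so that '%' and '_' in paths
    # need no escaping.
    rows = conn.execute(
        "SELECT path FROM files WHERE substr(path, 1, ?) = ?",
        (len(prefix), prefix),
    )
    # Keep only the first path component after the prefix.
    return sorted({row[0][len(prefix):].split("/", 1)[0] for row in rows})


print(listdir(conn, "/"))       # ['a', 'z.txt']
print(listdir(conn, "/a"))      # ['b', 'x.txt']
print(listdir(conn, "/empty"))  # []
```

With this layout an empty directory cannot exist, and listing a directory that was never created returns an empty list instead of raising an error. A backend that has to behave like other filesystems would need explicit directory rows and the matching checks, which is where much of the extra work would likely go.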

dargueta commented 2 years ago

You might be interested in dCache (but beware the AGPL license).

damc commented 2 years ago

I second this idea. Something like that would be useful to me as well.

"It's certainly doable. But @lurch is correct, you can probably get something up and running surprisingly quickly, but it will take longer to sort out the details and make it behave identically to other filesystems."

Most people probably don't need the details (personally, I don't; for me, just reading and writing files would be enough). I don't know exactly how PyFilesystem works, so I don't know whether it's possible to support this without the details.