We did, at one point, try to enable concurrent access. Unfortunately, the conclusion was that the required synchronization would be too expensive and add too much overhead to enable concurrent write access. Are you interested in concurrent writes, or just concurrent reads?
@ppolewicz was the main driver for this effort. Perhaps he can elaborate further?
It is fairly trivial to get B2Fuse to discard its cache on demand; you could look into clearing the CachedBucket object (https://github.com/sondree/b2_fuse/blob/master/cached_bucket.py). This would essentially discard the current cached B2 listings and force B2Fuse to check in with B2 upon the next request. This discards the listing of known files, but not actually open files. Open files are handled separately by the B2FileDisk in concert with the main B2Fuse class.
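As a very rough sketch of the idea (the attribute and constructor names here are hypothetical; the real ones are in cached_bucket.py and the main B2Fuse class):

```python
# Sketch with hypothetical names - the idea is simply to swap in a fresh
# CachedBucket so the next listing request checks in with B2 again
# instead of reusing stale results.
from cached_bucket import CachedBucket

def discard_listing_cache(b2fuse):
    b2fuse.bucket_api = CachedBucket(b2fuse.api, b2fuse.bucket_id)
```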
Please be aware of the project B2CLI (https://github.com/Backblaze/B2_Command_Line_Tool), on which this project is based. It is used as an API to interact with B2 and does the heavy lifting on retries, multi-file uploads, etc.
As for a "replicated geo-caching filesystem", this was actually what I was trying to do when I started this project. However, as the project progressed, I found that there were a number of challenges that had to be tackled to allow for multi user access. As such, this project is now more or less devoted to being a "simple" FUSE driver/frontend for a bucket.
I would love to be able to use B2 in the way you describe, but the design of such a system has to be considered carefully if you wish to have decent performance. Perhaps such a system could be implemented on top of B2Fuse again? Pull requests and contributions are very welcome :)
@IamDH4 how specifically would you take care of notifying clients about changes done by other clients?
I gave this some thought a while ago and one thing is clear: B2 was not designed for this. While I managed to prove it is possible to achieve NFS-like "strong" consistency, that kind of filesystem would be very slow if it used the tools B2 currently provides. It's just not worth it.
Instead, you can just remount b2fuse after writing / before reading, right?
Well, my current deployment thoughts are to use it in conjunction with a clustered/load-balanced web-based UI for users. I can hook in every time a file is uploaded or changed. I can then push that change into a message queue, or simply check a modification date in the database prior to accessing the content on the other nodes, and clear the cache for a single file as needed.
A front end like Nextcloud, for example, which I currently use with a B2 fork of s3ql. However, because of the way it works, it's rather difficult to achieve a multi-client setup, and it does not make files accessible in the traditional manner over B2.
A write cache and a way to clear a single file's cache without restarting the filesystem would make this a better choice for such a deployment. We wouldn't want to restart the filesystem in the middle of a request, nor have to purge the cache of everything. Clearing the cache for a single file after it has changed is a much better solution, and a lot less expensive.
If you are interested in supporting all of this at the filesystem level, so users can have multiple B2Fuse instances running, then I think what you need is a server component that can help sync metadata across nodes - possibly some sort of websocket server, so you can push changes down to each client.
Since I would see benefit from such an implementation, I would be willing to sponsor/host such a backend for users who do not want to, or do not need to, deploy their own.
On every file write we can send the required information over the websocket interface, which will relay it to the other hosts currently authenticated to the same bucket.
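For illustration, a minimal relay along those lines could be built with the third-party websockets package; the message format and the per-bucket grouping here are just assumptions, not a real protocol:

```python
# Sketch of a relay using the `websockets` package (pip install websockets;
# the one-argument handler needs websockets >= 10.1). The message format
# and per-bucket grouping are illustrative assumptions.
import asyncio
import json

import websockets

clients_by_bucket = {}  # bucket_id -> set of connected sockets

async def handler(websocket):
    # The first message tells us which bucket this client is watching.
    hello = json.loads(await websocket.recv())
    peers = clients_by_bucket.setdefault(hello["bucket_id"], set())
    peers.add(websocket)
    try:
        async for raw in websocket:
            # Relay every change notification to the other hosts
            # authenticated to the same bucket.
            for peer in set(peers):
                if peer is not websocket:
                    await peer.send(raw)
    finally:
        peers.discard(websocket)

async def main():
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever

asyncio.run(main())
```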
@IamDH4 server A deletes the file, sends the information to the service. Meanwhile server B wants to read the file, but due to network latency, it didn't get that information yet. Server B reads the file from the stale cache.
Using this design you cannot make a highly consistent filesystem - not even an NFS-like consistent one. You could go for eventual consistency (if that is useful to you), but even then, the design and implementation of such a system is rather hard. Can't you just use two separate buckets, with server A having the bucket of server B mounted with the cache disabled, and vice versa, so each server uses the cache only for the data that it wrote?
Alternatively, for your design, you could host a memcache server which keeps an entry for every cached file. When you want to invalidate an entry, delete the key from memcache and other clients will instantly find out that their cache is invalid. The difference is that with memcache you can process on the order of 60,000 requests per second, while using B2 persistent storage for locks gets you one operation every few seconds.
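A rough sketch of that approach with pymemcache (the host and key scheme are made up):

```python
# Sketch using pymemcache (pip install pymemcache). One entry per cached
# file: deleting the key tells every other client its local copy is stale.
from pymemcache.client.base import Client

mc = Client(("memcache.example.com", 11211))  # hypothetical host

def mark_cached(file_name):
    mc.set("b2fuse/" + file_name, b"1")

def is_cache_valid(file_name):
    # A missing key means some client invalidated the file.
    return mc.get("b2fuse/" + file_name) is not None

def invalidate(file_name):
    mc.delete("b2fuse/" + file_name)
```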
What I'm trying to say is that doing this correctly is a major effort, one that you probably cannot tackle by yourself, and for your particular use case it is not needed - there are simple solutions to this that you can use. And as for the service that you offered to host, people would have to:

- trust you with the (encrypted) metadata of their buckets,
- depend on a deployment running on your servers,
- hope it keeps up with the amount of metadata going through it.

So while I admire the courage, I suggest using a simple design instead. If you'd like to contribute to the community and get your code used by a lot of people, then I suggest you look into the "help wanted" tickets in B2 CLI - there are quite a few issues there worth looking into.
I hope that helps, at least somewhat :)
You have completely missed the point.
I was never aiming for "instant" consistency, nor would many applications of this nature require it. Anyone who has ever deployed geo-replicated data services, like we have, understands this very well. You are never going to get that level of accuracy in such deployments, and you don't really need it. Even your average home user (probably the only kind of user who would use the hosted service) wouldn't need it. Just look at the latency that a lot of other sync solutions provide.
And for those that do want lower latency, there's no reason why they couldn't simply deploy the notification service on their own hardware side by side, as it would just be part of the file system.
Separate buckets obviously would not work for anything I am talking about here.
I think you are also confused about my offer:
It is in no way related to our core business. The offer is to facilitate a hosted version of the metadata sync service. It would be integrated directly into the B2Fuse file system.
What trust? Security of encrypted metadata packets? If someone wants to roll their own, let them. I'm not sure why it would be a big deal, it would be a core part of the tool.
There is no deployment "on my server". Yes, there would be a cluster of servers handling metadata requests if you wish to offer a hosted version of the service. But again, I fail to see your point here.
Metadata is not exactly a huge amount of data. Unless someone is abusing the service, I don't foresee much of a problem. I can understand your concerns; perhaps there may be a cutoff somewhere eventually.
Your scenario doesn't exactly stand up either.
Scenario 1: Server A sends a message to the service, "I am deleting file xxx". All servers clear the cache and deny access to the resource. Server A deletes the resource.
Scenario 2: Server A sends a message to the service, "I am updating file xxx". All servers clear the cache and start serving the resource from upstream until the new file is detected. Server A finishes uploading the new file version. All servers will then cache said file when it is accessed for the first time.
We could go over scenarios all day; maybe someone wants a file to sync down as soon as it becomes available rather than at access time - that is also trivial to implement. But the point is, such an implementation doesn't have to be difficult, and it would make sense for many people.
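For illustration, the announcement on the sending side could be as simple as this (the field names and the send_to_service() helper are made up):

```python
# Hypothetical message format; send_to_service() stands in for whatever
# transport actually delivers the notification (e.g. the websocket link).
import json
import time

def announce_change(action, file_name, send_to_service):
    send_to_service(json.dumps({
        "action": action,          # "deleting" or "updating"
        "file": file_name,
        "timestamp": time.time(),
    }))

# Scenario 1: announce_change("deleting", name, send) before the delete.
# Scenario 2: announce_change("updating", name, send) before the upload.
```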
You are obviously not interested in working on a solution to the multi-node problem because you think it is too difficult, and that is fine. If it's not something the project is interested in, we will do the R&D internally. I just thought I'd reach out to the developers of the project and see if there was interest.
Actually, I am most interested in any contribution or step (however small) towards using B2 with B2Fuse as a distributed file share.
From your discussion it seems what you need is a way for B2Fuse to gracefully discard the write cache of a file. This does not, however, necessarily let B2Fuse show a new or changed file instantly, as there is also a cache for the directory listing. The directory listing is per bucket, not per file, so in order for B2Fuse to "see" the new file the entire cache would have to be purged.
So in short you need:
def purgeFileCache(filename)
Where the expected behavior is that, when called, the next request will return the upstream version of said file?
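In other words, something like this (purely illustrative, assuming the cache is a simple filename-to-data mapping):

```python
# Purely illustrative - the real cache lives in B2FileDisk / B2Fuse.
class FileCache(object):
    def __init__(self, fetch_from_b2):
        self._fetch = fetch_from_b2  # callable returning a file's bytes from B2
        self._files = {}             # filename -> cached bytes

    def read(self, filename):
        if filename not in self._files:
            self._files[filename] = self._fetch(filename)
        return self._files[filename]

    def purge_file_cache(self, filename):
        """Drop the cached copy; the next read() fetches from B2 again."""
        self._files.pop(filename, None)
```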
@IamDH4 you need to understand that I am not a b2_fuse developer. I'm just a guest here, this is @sondree's project.
In the scenario that you mentioned, server A sends a message to the service saying that it is going to delete or update a file, then it faces a catastrophic issue of some sort (like a power loss, or an HDD failure) and thus forgets what happened. In the current B2 design, no server can ever assume that this file will not be uploaded at some point in the future, so the file can never be cached again. Server A can also experience severe CPU steal or process deprioritization, which will keep the socket open on the kernel level but fail to execute any Python code for an undetermined amount of time. Eventually it will get itself unstuck and finish the upload, potentially hiding a newer version of the file which was uploaded in the meantime. These issues are quite rare, but they do happen, and a reliable filesystem needs to be aware of them or it will fail to uphold the consistency rules it advertises.
There are probably more scenarios that I can point out (if you are interested) where this design will return data inconsistently. This can lead to data loss if the application doesn't expect such behavior (and I don't know how to make it aware of it, other than implementing a consistent filesystem with a journal or something).
This project is hard enough for me as it is. I hope to work on something like this one day, and eventual consistency is an option if you can actually deliver it reliably. But your proposed design will not achieve eventual consistency reliably. I'm pointing this out for your convenience - you can implement it nevertheless, and maybe you will never hit this issue; or even if you do, the corrupted backup will get overwritten the next day and nobody will notice...
"then it faces a catastrophic issue of some sort"
Which also means it loses its connection to the service... The service then notifies all the other servers to roll back their changes. Seriously, it's not a big deal... There is always a solution.
How do you figure eventual consistency will not happen? Because of a couple of comments on GitHub, and no actual design or code?
@sondree So it is currently not possible to clear the directory listing cache and the cache for a single file? Clearing the entire cache on every write operation is obviously not ideal. I haven't had a chance to look at that part of the code yet.
Cache for a single file (as in the actual content of a file), yes. Directory listing for a specific file, no.
B2Fuse requests a list of all the files in a B2 bucket at once; this is what is cached as part of the directory listing. It includes metadata about each file, such as filename, filesize, etc. On the other hand, I am not sure you need to delete this cache, as it contains no file data, only metadata.
Well, I think we probably want to update the metadata too. I understand that's the way Backblaze provides the metadata. But we don't need to clear all cached data, right? Just the directory listing cache for all files.
I would think we could just update the directory listing cache and remove the cache for the specified file, leaving all the other files' cached data intact.
I see two cases that need to be handled:
1. File is updated remotely. B2Fuse is notified the file is stale and purges the cache for that file. In order to serve the new file, either the entire directory listing has to be updated again, or an entry has to be repopulated for the updated file.
2. File is deleted remotely. B2Fuse is notified the file is deleted and purges the cache for that file. There will be no new file to serve from B2. Done.
Unless you have tens of thousands of files, refreshing the bucket file list isn't actually that expensive. Special-casing the update of a single file may be possible, but would have to be evaluated against the added code complexity.
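As a sketch, a notified client could handle both cases with the same two hypothetical primitives:

```python
# Hypothetical primitives: purge_file_cache() drops one file's cached
# content, refresh_listing() re-fetches the bucket file list from B2.
def on_remote_change(event, purge_file_cache, refresh_listing):
    # Both cases start the same way: drop the stale local copy.
    purge_file_cache(event["file"])
    # A full listing refresh makes the new version visible (update) or
    # removes the entry (delete); special-casing a single file would
    # avoid the refresh at the cost of extra code.
    refresh_listing()
```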
You should have a look at https://github.com/sondree/b2_fuse/blob/master/cached_bucket.py
This is the cache in question. As you can see, only one call is actually cached: def ls(...). Anything else goes straight through to B2. B2Fuse will also drop the cache on any updates to files (def upload_bytes(...)) or deletions (def delete_file_version(...)).
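For reference, the pattern just described, plus the per-file purge discussed above, would look roughly like this (a sketch, not the actual code; the entry attributes are assumed):

```python
# Sketch of the pattern: only ls() is cached, writes and deletes drop the
# cache, and invalidate_file() removes a single entry from the cached
# listing instead of throwing the whole listing away.
class CachedBucket(object):
    def __init__(self, bucket):
        self._bucket = bucket    # underlying B2 bucket
        self._ls_cache = None    # memoized result of ls()

    def ls(self):
        if self._ls_cache is None:
            self._ls_cache = list(self._bucket.ls())
        return self._ls_cache

    def upload_bytes(self, data, file_name):
        self._ls_cache = None    # drop cache on any update
        return self._bucket.upload_bytes(data, file_name)

    def delete_file_version(self, file_id, file_name):
        self._ls_cache = None    # drop cache on any deletion
        return self._bucket.delete_file_version(file_id, file_name)

    def invalidate_file(self, file_name):
        # Remove one file's entry, leaving the rest of the listing intact.
        if self._ls_cache is not None:
            self._ls_cache = [entry for entry in self._ls_cache
                              if entry.file_name != file_name]
```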
I explored the possibility of implementing a (strongly, eventually or NFS-like) consistent multi-client file storage using B2 as a backend. I spent a few weeks on this and discussed the internals of the B2 API with a B2 server developer, well beyond anything that is documented, also considering requesting changes to the API so that we could actually make better guarantees.
I also have over 3 years of commercial experience building backends for distributed storage appliances... So I think I may have some insight on the subject.
It is already clear that the discussed design revolves around a notification system / message queue / websocket, and that it won't work. I suggest that you consider the design carefully before starting the implementation. The most troubling issue is an upload which can finish unexpectedly at any time on a client affected by CPU starvation.
This seems to work very well, but I would like to deploy this in a way where two clients are accessing files on the same bucket. If client A changes or deletes a file, client B should know about the change and refresh its cache.
I can handle notifying each client connected to the bucket. The problem is: what would be the best way to force refreshing the cache of a particular file? Deleting a file from the --temp_folder causes the file system to break, and I have to restart it. No errors are thrown when this happens; I imagine when it sees the cache is missing, it raises an error and doesn't successfully recover.

It would be nice if we had a way to enable write cache support like s3ql does. A write would be considered successful when it finishes writing to the cache, and the file system would finish syncing the data in the background. It may also be useful to be able to set up some sort of callback for after the cached file has finished its upload to the bucket. With the way Backblaze handles versioning, conflicts shouldn't be an issue if multiple clients happen to write to a given file at the same time.
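A minimal sketch of what that write cache could look like (all names hypothetical): a write returns as soon as the data is queued locally, a background worker uploads it, and a callback fires when the upload completes.

```python
# Minimal write-back cache sketch. A write is "successful" once the data
# is queued; a daemon thread uploads it in the background and fires an
# optional callback when the upload is done.
import queue
import threading

class WriteBackCache(object):
    def __init__(self, upload_fn):
        self._upload_fn = upload_fn  # e.g. bucket.upload_bytes
        self._queue = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def write(self, file_name, data, on_uploaded=None):
        # Returns immediately; the upload happens in the background.
        self._queue.put((file_name, data, on_uploaded))

    def _drain(self):
        while True:
            file_name, data, on_uploaded = self._queue.get()
            self._upload_fn(data, file_name)
            if on_uploaded is not None:
                on_uploaded(file_name)  # e.g. notify other clients
            self._queue.task_done()
```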
I am considering extending the file system to have some sort of API to access functions like clearing the cache for a file. It could make for a great framework to build a replicated geo-caching filesystem with low-cost archive storage on the backend. I'm not sure if an API would be beyond the scope of this project or should be a fork?