Doloops / mcachefs

mcachefs : Simple filesystem-based file cache based on fuse

Remote changes not synced #9

Open isarandi opened 5 years ago

isarandi commented 5 years ago

If a file changes on the remote end (e.g. over sshfs), the change is not visible through the mcachefs mount, which unfortunately makes the program unusable for cases where the remote end may change.

hradec commented 5 years ago

The author stated this in another message here, actually:

mcachefs assumes the backend is not modified externally. The only modifications on it would come from mcachefs itself when one commits the journal.

That assumption is the reason mcachefs can be soooo much faster than other caching filesystems out there: once the backend is cached there's no need to consult it again, so there are no slowdowns afterwards... ever!

Of course, the downside is that updates on the backend will never show up in mcachefs!

To solve that, I'm planning to write an extra thread (or a separate daemon) to continually check and sync already-cached data with the backend in the background, completely separate from the mcachefs main loop.

That way, mcachefs will receive updates from the backend without its speed and responsiveness being affected.

This approach will introduce a small delay between the time an update happens in the backend and the time it is reflected in mcachefs, but at the moment I believe it's the best approach to keep mcachefs in sync with a changing backend without compromising its responsiveness!

Having a separate sync thread/daemon also opens up nice possibilities to make this synchronization even better. For example, we could have a daemon running on the backend server which monitors the filesystem for changes (using the kernel's inotify, for example) and pushes a journal to the thread/daemon running on the mcachefs machine. That way, the sync thread/daemon doesn't need to "check" what has changed; it would just need to commit the journal changes to folders/files that are already cached by mcachefs (and transfer files if needed as well).
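As a rough sketch of that push model, the backend side could use `inotifywait` (from the inotify-tools package) to report each change and append a one-line journal entry for a receiver on the mcachefs machine to replay. None of this exists in mcachefs today; all paths and the journal format below are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of the backend-side change journal.
# record_change appends one "EVENT path" line per filesystem change;
# a peer daemon on the mcachefs machine could replay these against
# the cache instead of re-scanning the whole backend.

record_change() {
    # $1 = event name, $2 = affected path, $3 = journal file
    printf '%s %s\n' "$1" "$2" >> "$3"
}

# Watch loop (requires inotify-tools; commented out because it runs forever):
# inotifywait -m -r --format '%e %w%f' \
#     -e modify -e create -e delete -e move /srv/backend |
# while read -r event path; do
#     record_change "$event" "$path" /var/tmp/mcachefs-changes.log
# done
```

The receiving side would then only have to decide, per journal line, whether the path is cached and needs refreshing.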

vnicolici commented 5 years ago

What would happen if I try to use multiple instances of mcachefs on multiple servers, sharing the same backend? If I modify a file through one of the servers, I assume the change will reach the backend.

But what will happen next, if the other servers already have the previous version of the file cached? Will those caches be updated automatically when the other servers try to read from the file?

hradec commented 5 years ago

Nope... mcachefs assumes that all the changes in the backend come from mcachefs itself.

If another mcachefs instance on another machine makes changes in the backend, mcachefs will not see them, since mcachefs doesn't refresh the metadata.

One way to force it to refresh at least the directory information (to see new and deleted files/folders) is to send a flush_metadata action to .mcachefs/action, like so:

echo flush_metadata > /cachedfolder/.mcachefs/action
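
As a stopgap while the sync daemon doesn't exist, that one-shot action can be wrapped in a tiny polling script. The mount path and the 60-second interval below are just example values, not mcachefs defaults.

```shell
#!/bin/sh
# Workaround sketch built on the existing flush_metadata action:
# periodically ask a running mcachefs mount to re-read directory
# metadata, so new and deleted files on the backend show up.

refresh_metadata() {
    # Write the action keyword into the mount's control file.
    echo flush_metadata > "$1/.mcachefs/action"
}

# Example loop (commented out because it runs forever):
# while true; do refresh_metadata /cachedfolder; sleep 60; done
```

Note this only refreshes directory listings; it does not re-fetch the contents of files that were already cached.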

But if you have already-cached files that have been modified in the backend, I don't think it will re-download them... not sure...

uudruid74 commented 4 years ago

To solve that, I'm planning to write an extra thread (or a separate daemon) to continually check and sync already-cached data with the backend in the background, completely separate from the mcachefs main loop.

Is this a work in progress? I'm currently using a really odd stack with google-drive-ocaml-fuse mounting my Google Drive when my wifi connects to the internet, and I update the journal and flush metadata on mcachefs at that point. I also have another layer that overlays mcachefs's copy of Google Drive over my home directory. Writes to the overlay are written to the cache (except Downloads, which writes locally, but there is still an overlay to merge my Google Drive).

Once mcachefs starts syncing, I'll see how to get overlayfs-fuse to maybe do the same. Right now, Google Drive changes outside of my laptop's control basically aren't seen at all, but I just set this all up tonight and it's quite nicely extended my home directory into the cloud while keeping recently used files local. Also, I'm not sure when the deletion policy (cache clearing) will be done.

hradec commented 4 years ago

Kinda... Since I don't have much time to work on this, it's been stale for a while. In our use case for mcachefs, it's actually beneficial that changes on the remote filesystem are not reflected in real time, since we use it to process data in the cloud. In our case, mcachefs acts as a snapshot of the data, so the processing can happen without being affected by changes in the backend, which is what we want!

But there are cases where we have to stop ongoing processing and just delete all the cloud instances and start over, when we need to get up-to-date data from the backend, which is kind of a hassle.

But since there's maybe a 1-in-10 chance of that happening, it's acceptable to just restart everything. So the priority of working on the extra sync thread idea is very low right now! Sorry...
