btrfs / btrfs-todo

An issues only repo to organize our TODO items

File History Time Machine Functionality #57

Open glowingwire opened 4 months ago

glowingwire commented 4 months ago

My end use is: right-click a file in Dolphin, click Versions, see a list of saved versions.

People I have talked to describe this as a major, involved feature to implement.

I imagine it as an optional feature for a subvolume, probably used on /home by default.

I imagine the system would keep all the old parts on every file write and create a new description in a secondary table (or in the main table, along with a mask that hides the old versions from normal viewing of the files). So there would be many inodes, some of them hidden, and another table would list those inodes and associate them with the current file. It would be cool if fragments of files were compared to see whether they need to be rewritten, so only the diff is written; this may be existing behavior, but it would mean saving repeatedly won't use a lot of disk and won't wear out solid-state drives.

Then there would be some new interface that would allow user interfaces to see the older versions and pull up a file. I imagine all the metadata would be available to the program.

This is an important feature for mainstream use of Linux.

kakra commented 4 months ago

If you create scheduled snapshots of said subvolume, you already have that functionality. But Dolphin lacks an interface to easily browse those snapshots based on the folder or file you selected. I think SuSE already implements this with snapper and a distribution-specific Dolphin plugin, which seems not to be available outside of SuSE.

Snapper can also create diffs of files and folders to see what changed, and revert changes.
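For reference, a typical snapper session looks roughly like this (the config name, paths, and snapshot numbers are placeholders):

```sh
# One-time setup: manage /home with a snapper config named "home".
snapper -c home create-config /home

# List snapshots, then diff and revert a single file between two of them.
snapper -c home list
snapper -c home diff 41..42 /home/user/Documents/report.odt
snapper -c home undochange 41..42 /home/user/Documents/report.odt
```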

That said, since btrfs already implements the base functions, your request should go to KDE. It's not btrfs' job to implement a Dolphin plugin for browsing snapshots.

glowingwire commented 4 months ago

I was not expecting to create a snapshot of the whole subvolume; I was hoping to deal with it per file.

So I would have to create a subvolume for every user's Documents and Desktop, and save a snapshot every time a user changed a file.

I am playing with openSUSE Tumbleweed, but I haven't even begun to figure out how to recover a file.

I intend this to be a feature that is on by default for a few of each user's folders and does not require an administrator to be involved.

This does not seem like a simple thing. What happens when a user changes permissions on a file: do all of the older versions get the new permissions too? (I would make them owner read-only, with no access for group and others.)

And what should happen if the user moves the folder or changes its owner? (I would keep the older versions under the owner they had at the time they were created.)

And if someone drags a file to the trash, there are more details to figure out.

glowingwire commented 4 months ago

I also expect a version to be saved every time the file is written.

Zygo commented 4 months ago

It's relatively straightforward to set up a script that waits for file modifications with inotify, triggered by CLOSE_WRITE, MOVED_TO, and DELETE events, and commits the files to git. Then you can browse file versions with any git repo browser or plugin. This scales well for individual project folders, and it stores modifications with delta compression, which can be significantly more efficient than snapshots or reflinks, especially for file formats where the entire file is rewritten every time. It doesn't store file permission modifications, but you can get extensions for git that handle those. I have had one of these running on almost every project I've worked on over the last two decades or so, even before I started using btrfs. A script like this seems to check all of the boxes plus a few more: separate permissions, automatic updates, full file revision history, no admin required, tools to generate diffs and various reports, and delta compression.
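A minimal sketch of the idea (not my actual script; the paths, exclude pattern, and commit messages are illustrative):

```sh
#!/bin/sh
# Watch a project tree and commit every completed file update to git.
WATCH_DIR="$HOME/Documents/project"

cd "$WATCH_DIR" || exit 1
[ -d .git ] || git init -q

# CLOSE_WRITE fires when a writer closes the file, MOVED_TO catches
# atomic rename-into-place saves, DELETE records removals.
inotifywait -m -r -q \
    -e close_write -e moved_to -e delete \
    --exclude '/\.git/' --format '%w%f' . |
while read -r path; do
    git add -A
    # Commit only if something actually changed.
    git diff --cached --quiet || git commit -q -m "auto: $path"
done
```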

This can be extended a little with some help from btrfs: instead of committing directly from the working directory, cp --reflink the files from the user's working tree to the git tree and commit them there. That gives atomicity for file updates. For really big projects, or projects involving multiple database files that all have to be modified together, the script could snapshot the entire project (or even the entire /home), commit the modified files, and delete the snapshot; however, this can be very heavy for small projects. There's a size below which the cp --reflink is better and a size above which btrfs sub snap is better.
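The reflink variant, sketched (hypothetical paths; the commented-out line shows the heavier whole-subvolume alternative):

```sh
# Reflink-copy the updated file into a separate git tree and commit there,
# so the commit sees a stable copy even if the application keeps writing.
WORK="$HOME/Documents/project"
REPO="$HOME/.filehistory/project"

cp --reflink=always "$WORK/report.odt" "$REPO/report.odt"
git -C "$REPO" add report.odt
git -C "$REPO" commit -q -m "auto: report.odt"

# Above some project size, snapshotting the whole subvolume wins instead:
#   btrfs subvolume snapshot -r "$WORK" /snapshots/project.tmp
```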

The problem with doing this at the filesystem level is that it's difficult for a filesystem to tell when an update begins and ends, especially for random-access file formats. A user's file revision history might contain 20 partially updated files for every complete file written to the filesystem, simply because the application writes the file in 20 segments. With the inotifywait approach, the triggering event is CLOSE_WRITE which indicates that the file has been closed, and in many cases can be considered complete.

Another problem is that some applications update their files constantly as users work with them. I've found with my git inotify script that I have to provide guard times--intervals where the script waits for the file to stop being modified before saving a copy of the file. As a user, it's simply not productive to have every version of every file--5 or 10 versions per minute are plenty for editing at human speeds. If your use case is tracking a program as it makes modifications to a file one at a time, it might be better to make a fuse filesystem to do that logging explicitly.
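A guard time can be bolted onto the loop above; a sketch (the 30-second window is arbitrary):

```sh
# Debounce: inotifywait exits with status 2 on timeout, so this inner loop
# spins while events keep arriving and falls through once the file has
# been quiet for GUARD seconds. Only then is the version committed.
GUARD=30
debounce_and_commit() {
    path=$1
    while inotifywait -q -t "$GUARD" -e modify -e close_write "$path" >/dev/null 2>&1; do
        : # modified again inside the guard window; keep waiting
    done
    git add -- "$path"
    git diff --cached --quiet || git commit -q -m "auto: $path"
}
```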

With current btrfs, deduplication, reflinks, and snapshots all force data to be flushed out of the page cache to disk, so they are quite heavy writing workloads. btrfs isn't particularly efficient at storing large numbers of data file updates compared to DVCS tools like git which are specialized for this use case. Changing that while maintaining btrfs compatibility would be a major undertaking--it would likely be easier to start over with a new filesystem.

glowingwire commented 4 months ago

Amazing information, thank you! I am attempting to take it all in. I don't want the user to have to think about this feature; I want them to be relieved when they can go back and recover something they accidentally damaged.

Academically, I'd like to be able to handle files that are being edited all the time, but I assumed programs did this in their temp directories, which I would not back up this way; I expect programs not to clobber files until they are told it is OK to do so. Perhaps limiting my imagined feature to the Documents and Desktop folders takes care of this, because the /home/user/ folder can have all kinds of other uses.

And yet there is nothing stopping a person from having a program append to logs in /home/user/Documents/tmp, write over parts of the middle of a file, or do other things like that. I have an expectation that when I open a document, I can get it back to the way it was, right up until I hit save, just by closing the program I am using. I believe this could go beyond an expectation and become a requirement that programs behave this way.

With all these silly trackpads with tap-to-click enabled, I have accidentally moved blocks of text and clobbered work so badly that undo didn't help. I could just close the program, or do Save As with a different filename and reopen the original.

I may also be tainted by a misunderstanding of blob storage. I have heard that perkeep can deduplicate files that share common information and differ only in a small area. I think it does this by indexing every block by its hash and allowing more than one entry to point to the same block.
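As I understand it, the trick is something like this toy sketch (definitely not perkeep's real format; the 64 KiB block size is arbitrary):

```sh
#!/bin/sh
# Store a file as content-addressed blocks: each block is saved once under
# its SHA-256 hash, so file versions that share blocks share storage.
# Usage: ./store.sh <file>
STORE="$HOME/.blobstore"
mkdir -p "$STORE/blocks"
tmp=$(mktemp -d)

split -b 65536 -a 6 -d "$1" "$tmp/blk."   # fixed-size 64 KiB blocks

for blk in "$tmp"/blk.*; do
    hash=$(sha256sum "$blk" | cut -d' ' -f1)
    [ -e "$STORE/blocks/$hash" ] || cp "$blk" "$STORE/blocks/$hash"
    printf '%s\n' "$hash"
done > "$STORE/$(basename "$1").recipe"   # ordered block list reconstructs the file

rm -rf "$tmp"
```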

I was expecting a file to rewrite only a few blocks in the middle if it was only changing part of the file and keeping most of it the same, but I could see how a program could simply just re-output everything every time.

What I am expecting this feature to provide is maybe 5 revisions, going back maybe a week.

I was not considering it to be a primary backup.

I was thinking that btrfs basically writes a diff to a new block and keeps the old block, but I think it is more like this: the replacement block gets written, the old block gets marked as free, and the table then references the new block. The file might be fragmented a bit, but that's OK; we're running on SSDs.

I wonder how Apple and Microsoft implement this feature.

Perhaps it makes sense to take a snapshot of the Documents folder every time a program finishes writing a file.
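Something like this, assuming ~/Documents is itself a btrfs subvolume (hypothetical paths, and it would need the guard times @Zygo describes, or busy applications would cause a snapshot storm):

```sh
# Take a read-only snapshot of ~/Documents whenever a file is closed
# after writing. Old snapshots would still need periodic pruning.
DOCS="$HOME/Documents"
SNAPS="$HOME/.snapshots"
mkdir -p "$SNAPS"

inotifywait -m -r -q -e close_write --format '%w%f' "$DOCS" |
while read -r path; do
    btrfs subvolume snapshot -r "$DOCS" "$SNAPS/docs-$(date +%Y%m%d-%H%M%S)"
done
```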

kakra commented 4 months ago

> I wonder how Apple and Microsoft implement this feature.

At least Windows does scheduled filesystem snapshots, if you are referring to the "Previous Versions" feature. It may be possible to filter this by subdirectory tree (so it won't keep copies of unrelated files). It's not per file, although the GUI presentation of the feature may pretend that it is. It uses view filtering to only list snapshots in which the file actually changed. The snapshots are stored in "System Volume Information" and, I think, you can even mount them as a virtual drive or export them to a VM image; at least for full VSS snapshots this is true. The shadow copy feature probably works a little more granularly but uses mostly the same infrastructure, and it does auto-cleanup when free space becomes low. But it doesn't do that on each change to a file.

Apple's Time Machine works in a similar way, but it writes backups of changes to a repository, so it probably works more like @Zygo's git idea. I don't know if the classic macOS file systems support snapshots; the later ones should, because they have copy-on-write features.

You could probably get something similar if you use or develop a FUSE daemon that transparently mounts over the interesting folders. Maybe these could be starting points: