borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

BTRFS subvolumes treated as external filesystems #4009

Open aardbol opened 6 years ago

aardbol commented 6 years ago

Have you checked borgbackup docs, FAQ, and open Github issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

BUG

System information. For client/server mode post info for both machines.

Desktop computer that backs up to a slave HDD

Your borg version (borg -V).

1.1.6

Operating system (distribution) and version.

Antergos up-to-date, rolling release distro

Hardware / network configuration, and filesystems used.

BTRFS

How much data is handled by borg?

200GB

Full borg commandline that led to the problem (leave away excludes and passwords)

borg create --one-file-system ... I can't remember the exact full command because it was in a script I wrote, which I can't recover due to data loss.

I remember that I used the above in combination with excludes for /tmp and /var/cache/pacman/pkg. I also used encryption and zstd compression, level 6.

Describe the problem you're observing.

Borg skipped all my btrfs subvolumes during backups due to the --one-file-system argument. I thought it was skipping external filesystems only, as described in the documentation. It now seems that it skipped even btrfs subvolumes like my /home folder in all my backups.

I was surprised to find this out only now, because the total backup size, not taking compression and deduplication into account, was around the size of my partition, so I thought it was a complete backup. Unfortunately, I have now found out that the /home folder was missing from all those backups...

Note that the btrfs subvolumes are also handled by fstab, so they are mounted at boot time.

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

Yes, use --one-file-system

mlbarrow commented 6 years ago

First, I am sorry to hear that you lost data. I have been where you are and it is quite an awful experience.

That being said, understand that btrfs subvolumes are, in fact, separate filesystems. For example, you'll notice that you can't remove a subvolume using rm. Is there a reason why you were using the --one-file-system argument for this particular backup job?

There was discussion on #2141 some time back to get the documentation to its current state. Do you have a recommendation for making it clearer?

Also, you should regularly perform test restores to make sure your backups are working properly. At a minimum, do a borg list repo::archive to see if it contains what you think it should. There could be other issues going on (hardware issues, filesystem corruption, something changing files during the backup process) that could make it difficult or impossible to recover, above and beyond neglecting to back up files in the first place.

ThomasWaldmann commented 6 years ago

This is the line in the code deciding about the --one-file-system property:

https://github.com/borgbackup/borg/blob/1.1.6/src/borg/archiver.py#L588

This is the doc string:

https://github.com/borgbackup/borg/blob/1.1.6/src/borg/archiver.py#L3199
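
In essence, the decision boils down to the following (a simplified paraphrase in Python, not the exact borg source):

import os

def should_recurse(path, restrict_dev):
    # restrict_dev is None when --one-file-system is not given; otherwise
    # it holds the st_dev of the backup root given on the command line.
    st = os.lstat(path)
    # btrfs subvolumes report their own st_dev, so this check stops at a
    # subvolume boundary exactly as it stops at a conventional mount point.
    return restrict_dev is None or st.st_dev == restrict_dev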

I don't think this is a borg bug; rather, it seems that btrfs subvolumes being different devices/filesystems was an unexpected property of them for you, and you accidentally excluded them.

Feel free to do a PR if you have an idea about how to improve the docs or the code.

aardbol commented 6 years ago

You are right, I should've done backups properly, which means I also should have done restore tests from time to time. Unfortunately, I just based the reliability of the backup on its total size, on manual checks in the mounts, and on the reliability of the drive the backups were on. So now I'm learning the hard way that I should've done exactly this check. I still can't understand why /home was skipped, though, because most of the files were in that folder, and I noticed Borg also parsing that folder, judging by the difference in the number of files and the total size between the previous backup and the current one at the time.

The reason I used --one-file-system was to skip external mounts from being backed up in the same job, mounts that are in /mnt and /run/media. That was more efficient than manually excluding all those folders.

I think BTRFS subvolumes should be included with --one-file-system to avoid mistakes like this for other people, because as far as I can tell, in practice most of us would consider them part of the main filesystem, just like /home could reside on a second drive instead of having all of / on one drive. Subvolumes are only separated from the root filesystem (and then mounted to it at boot time) to be able to exclude them from snapshots, to prevent data loss during rollbacks, or to give them different mount options for performance, stability, or reliability reasons.

I'm going to look into the code which you provided links for and try to PR a change myself, hoping that it would be supported.

level323 commented 6 years ago

I'm going to look into the code which you provided links for and try to PR a change myself, hoping that it would be supported.

I would oppose such a change. To my mind (and workflow) the way borg currently handles --one-file-system is the best and most consistent. A change that treated btrfs subvolumes as a "special case" would require writing btrfs-filesystem-specific code, which itself is a hint that this is probably not the right direction to go.

Further, if you're going to propose this change for btrfs, what about other snapshotting filesystems like ZFS?

Also, BTRFS subvolumes are more flexible than you indicate. For example, subvolumes can be nested, at arbitrary positions in the filesystem tree. And arguably providing a further clue that the current approach taken by the borg code is correct: creating a snapshot of a subvol that itself contains other subvols will not snapshot those child subvols.

Numerous machines I administer run on (and boot from) a single btrfs partition with various subvols (e.g. os1, os2, os1-home, os2-home, user-documents, etc.). Backing up multiple subvols in a single borg command simply requires listing the path of each subvol in borg create. One can use the --one-file-system argument in this case, and borg will correctly traverse within the source subvol(s) specified on the command line but not into any filesystems mounted within them, which is what I believe you want. So all you need to do to achieve the effect you originally intended is to change the source paths you supply to borg.

Alternatively, for maximum data consistency when backing up a running system, one can snapshot the subvols to be backed up, have borg back up those snapshots, then delete the snapshots. The snapshots will not contain any mounted filesystems within them, so --one-file-system is not even needed in this case.
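
A minimal sketch of that snapshot-then-backup approach; the subvolume list, snapshot directory and repository path are hypothetical and must be adapted to the real layout:

import subprocess

SUBVOLS = ["/", "/home"]       # subvolumes to protect (assumed layout)
SNAPDIR = "/.borg-snapshots"   # pre-existing scratch directory (assumed)

snapshots = []
for i, subvol in enumerate(SUBVOLS):
    snap = f"{SNAPDIR}/snap{i}"
    # -r makes the snapshot read-only: a frozen, consistent view.
    subprocess.run(["btrfs", "subvolume", "snapshot", "-r", subvol, snap],
                   check=True)
    snapshots.append(snap)
try:
    # The snapshots contain no mounted filesystems, so --one-file-system
    # is not needed here.
    subprocess.run(["borg", "create", "/backup/repo::{now}"] + snapshots,
                   check=True)
finally:
    for snap in snapshots:
        subprocess.run(["btrfs", "subvolume", "delete", snap], check=True)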

All the best

aardbol commented 6 years ago

level323

I don't think you should consider this a special case requiring specific code that handles it differently from how it's currently done. I think one should look at the whole BTRFS partition as being one filesystem, as the subvolumes (hence "sub"volume) are also part of that same filesystem (and partition). They are merely connected to it with different features, for reasons already mentioned, or separated from it (behind the scenes) on purpose to create scopes for snapshots, so that rollbacks don't cause loss of data that shouldn't be part of that rollback, like logs or data in /home when you just want to roll back a system upgrade that caused corruption.

This latter use of snapshots is one of the most common. SUSE (the BTRFS experts) does it and documents very well why specific subvolumes are created; Ubuntu does it with the / and /home subvolumes (they don't even set a single mount option in their setup); and Synology does it for the same reason (while supporting only very limited mount options).

Also, BTRFS subvolumes are more flexible than you indicate. For example, subvolumes can be nested. They can be nested at arbitrary positions in the filesystem tree. However (and arguably providing further clues that the current approach taken by the borg code is correct) is the fact that creating a snapshot of a subvol that itself contains other subvols will not snapshot those child subvols.

It should have been clear that I meant this by saying /home is a subvolume in my / path. But comparing the behaviour of BTRFS snapshots to backups is where you are incorrectly mixing two different things together. It's like saying RAID is also a backup. It isn't; it protects you against hardware failure of a limited number of drives. Snapshots protect you from unwanted changes to that same filesystem. Backups, if done correctly, protect you from both of these and more.

If you look at the os.path.ismount(path) function in Python, it should behave correctly (in the way I'm advocating) with BTRFS subvolumes, because they share the same inode and are treated as such by this function, so it shouldn't be hard to implement. Here's a post that supports my case further: https://unix.stackexchange.com/questions/345471/btrfs-same-inode-number. Both of these rather suggest that Borg's behaviour is illogical, not that your way of working or your comparison between snapshots and backups is correct.

ThomasWaldmann commented 6 years ago

Are you saying os.path.ismount(your_btrfs_subvolume_mountpoint) returns False?

ThomasWaldmann commented 6 years ago

https://github.com/python/cpython/blob/v3.7.0/Lib/posixpath.py#L190
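
Condensed, the linked function works like this (a paraphrase, with the error handling trimmed):

import os
import stat

def ismount(path):
    try:
        s1 = os.lstat(path)
    except OSError:
        return False
    if stat.S_ISLNK(s1.st_mode):   # a symlink is never a mount point
        return False
    s2 = os.lstat(os.path.join(path, ".."))
    if s1.st_dev != s2.st_dev:     # btrfs subvolumes take this branch
        return True
    return s1.st_ino == s2.st_ino  # path == parent: filesystem root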

level323 commented 6 years ago

@IsaakGroepT

But comparing behaviour of BTRFS snapshots to backups is where you are incorrectly mixing two different things together.

Either you misunderstood me or I was unclear, because I'm absolutely aware of the fundamental difference between snapshots and backups. I think this confusion started because you weren't clear in your original post that you were using nested subvolumes. I assumed that you were using a subvolume structure like Ubuntu's (subvols not nested, hanging off the btrfs root directory), that you were therefore backing up a live/running system that also had various mount points within the subtrees of each btrfs subvolume, and that you were using --one-file-system to prevent borg traversing other non-btrfs mounts (such as /proc, /dev, and perhaps network shares). Hopefully that helps you understand my reasoning behind suggesting you back up temporary snapshots of your subvolumes: I thought you were backing up a live/running system.

Now you've clarified that you're using nested subvolumes, I better understand the issue you're wanting to discuss.

In any case, my central point stands: you can list each subvolume individually on the borg create command line and borg will dutifully back up all the files in each subvolume. If you want to automate this further, it shouldn't be difficult at all, because btrfs subv list ..... is there to help you script an automated way to scoop up backups across all your subvolumes, even if you're regularly creating/moving/renaming them. See also man btrfs-subvolume.

So via btrfs subvol list .... the functionality you desire (backing up all subvolumes) is already possible and can be fully automated with a few extra lines of scripting.
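
One possible shape of such a script, in Python; the mountpoint, repository path and the parsing of the list output are assumptions to adapt:

import subprocess

MOUNTPOINT = "/"         # where the btrfs top level is mounted (assumed)
REPO = "/backup/repo"    # hypothetical repository path

out = subprocess.run(["btrfs", "subvolume", "list", MOUNTPOINT],
                     capture_output=True, text=True, check=True).stdout
# Each output line looks like: "ID 257 gen 42 top level 5 path home"
paths = [MOUNTPOINT] + [MOUNTPOINT + line.split(" path ")[-1]
                        for line in out.splitlines()]
subprocess.run(["borg", "create", "--one-file-system",
                REPO + "::{now}"] + paths, check=True)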

In any event, if the present borg docs are unclear, then by all means let's clarify them so that no one else misses this important point.

All the best

aardbol commented 6 years ago

Are you saying os.path.ismount(your_btrfs_subvolume_mountpoint) returns False?

Unfortunately it doesn't seem to be so simple. I tested it, and my subvolume was detected as a mount point by that function.

plattrap commented 6 years ago

An observation: BTRFS subvolumes show a device major number of 0. Whether this is guaranteed or just a coincidence, I have no idea.

Perhaps add a flag --risky-treat-btrfs-as-one-device and change:

# risky case: both the directory being considered and the backup root live
# on an anonymous device (old-style 8-bit major number 0, extracted by the
# 0xff00 mask); guard against restrict_dev being None (no --one-file-system)
btrfs_risky = (btrfs_risky_flag and restrict_dev is not None
               and (st.st_dev & 0xff00) == 0 and (restrict_dev & 0xff00) == 0)
# recurse as before, or when the flag says major-0 devices count as one filesystem
recurse = restrict_dev is None or st.st_dev == restrict_dev or btrfs_risky

This would treat all subvolumes as if they are the same filesystem as the start.
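
The observation is easy to test from Python. Note that major number 0 (the kernel's "unnamed" devices) is also used by tmpfs, NFS and other virtual filesystems, hence the "risky" in the proposed flag name:

import os

def on_anonymous_device(path):
    # btrfs subvolumes sit on anonymous devices with major number 0,
    # but so do tmpfs and friends; this test alone is not conclusive.
    return os.major(os.lstat(path).st_dev) == 0

print(on_anonymous_device("/home"))  # hypothetical subvolume path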

cmurf commented 6 years ago

Btrfs subvolumes are separate file trees, they aren't really a separate file system. Subvolumes share all the other trees: root, csum, extent, uuid, chunk, dev, etc. On the other hand, the inode numbering starts over in each subvolume, so a given inode number is not unique on a Btrfs volume where it is unique on a Btrfs subvolume.

Anyway, it's an understandable point of confusion for the user as well as for development. You probably wouldn't really want a backup to, by default, consider all subvolumes on the file system (which means including snapshots, as they are just pre-populated subvolumes), as it would lead to a lot of unnecessary duplication of data. The reality is that Btrfs volumes can be sufficiently more complicated that treating them like any other file system, without at least Btrfs-specific warnings to the user for potentially ambiguous situations, is going to end up in misaligned expectations.

RonnyPfannschmidt commented 6 years ago

how about something else - instead of trying to figure out a way to handle this, add 2 new tools:

a) a warning that prints the file-system boundaries that borg will not traverse into
b) a way to include such boundaries in the backup set (for smaller numbers of subvolumes the normal inclusion mechanism of listing paths may suffice; for other setups, different mechanisms may be required anyway)

aardbol commented 6 years ago

@cmurf I don't think duplication is an issue to take into account, because Borg has a proper deduplication system that handles this.

I think a warning would be a satisfying way to improve this situation, alongside improving the documentation to mention this behaviour specifically. An extra option that excludes external storage devices while including subvolumes of the same filesystem would also be very interesting, as an efficient approach for more complex btrfs filesystems.

Since the ismount() function turned out not to be useful, here's Python code that can tell you whether a path is a btrfs subvolume: https://github.com/efficiosoft/btrfs-backup/blob/master/btrfs_backup/util.py. The license is compatible.
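
The gist of the linked helper (paraphrased rather than copied) is the fixed inode number that btrfs assigns to every subvolume root, which also comes up later in this thread:

import os

BTRFS_ROOT_INO = 256   # every btrfs subvolume root has inode number 256

def is_btrfs_subvolume(path):
    # Only meaningful if `path` is known to be on btrfs; a complete check
    # would also verify the filesystem type (e.g. via the statfs magic).
    return os.lstat(path).st_ino == BTRFS_ROOT_INO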

level323 commented 6 years ago

@IsaakGroepT: @cmurf's point about files in snapshots being assigned different inode numbers means that, for a btrfs volume with multiple subvolume snapshots (as is the case for users of tools like snapper), a version of borg that traversed multiple subvols/snapshots would have a long initial run, re-reading large numbers of files that were actually identical but that borg couldn't match because the inode numbers differed. Subsequent runs would benefit from the cache, though, as expected.

cmurf commented 6 years ago

@IsaakGroepT: I don't know what "proper" means in the context of deduplication, and I don't know how Borg's deduplication works. If I point borg at a path omitting --one-file-system I expect borg will have to read the contents of every file in that path in order to know whether files are duplicates. If I have 100 GiB of data, and in that same path 10 snapshots of that data, then borg is going to end up reading 1000GiB. That borg deduplicates the backup has no bearing on whether it has to read duplicated data due to the existence of snapshots. Consider an even more extreme case 1000GiB of data, snapshot 10 times. Now borg (or pretty much any tool including rsync) is going to read 10TiB in order to know only 1TiB is unique. That's gonna take days. So yes duplicated data resulting from snapshotting is highly relevant.

And therefore I think it would be entirely sane for borg developers to use --one-file-system on Btrfs volumes by default, with a warning that it's doing so. Separate documentation can explain the rationale: due to snapshots, Btrfs volumes can appear to be huge even when 99% of the files point to shared extents. Why bother unnecessarily reading all of that data just to deduplicate it in the backup? That kind of volume replication is available with "Btrfs seed sprout" and, more selectively, with "Btrfs send receive" using the Btrfs user-space tools, but of course it requires a Btrfs destination. It's an open question whether libbtrfs offers some API that would help borgbackup deduplicate on read, where shared extents in snapshots are turned into hardlinks on the destination, and if not, what that would look like; I'd think both rsync and borg would benefit.

@level323 You definitely cannot infer much of anything about files and their inode numbers on Btrfs. The inode numbers in a snapshot created with the btrfs sub snap command will match those of the original subvolume (this is why snapshot creation is so fast; there are almost no metadata writes); whereas if I use cp --reflink, the inode number assigned depends on the destination subvolume and whatever the next available inode number is. If I use find -inum for an inode on a Btrfs volume, I may get back a lot of different files and even dirs, and maybe some identical files that share the same data extents. It's just not knowable without looking at the extent addresses to see whether they're shared or not.

With one exception, you can assume that an inode number is not used more than once in a given subvolume. The exception is subvolumes themselves always have inode number 256.

Another gotcha pertains to mounting subvolume with the -o subvol, or -o subvolid mount option. Behind the scenes these are bind mounts. So, depending on how things are assembled, that might pose curious behaviors, not least of which is it's possible snapshots are entirely invisible (not in any mount path).

aardbol commented 6 years ago

@cmurf Didn't read lol

J/k. That was interesting information, and it also shows the limits of the use cases I'm used to. But I still have an issue with the way --one-file-system works by default. I think it doesn't correctly represent (or explain) the difference between a genuinely mounted foreign filesystem, where there is no debate about whether it's foreign, and a btrfs subvolume of the same filesystem (where there could be such a debate). So instead of tweaking the code to include subvolumes by default with that option, I think it's more interesting to rename the option and define its behaviour more precisely in the documentation, in relation to new-generation filesystems like btrfs (and zfs etc., if anyone is sufficiently experienced with them), so we can spare other people the mistake I recently experienced.

So we have already discussed this issue thoroughly and decided against adding btrfs tweaks to the general behaviour of Borg. Shall we now propose pull requests with improvements to the documentation regarding the potential issues/confusion around the btrfs filesystem?

ethchest commented 6 years ago

If I have 100 GiB of data, and in that same path 10 snapshots of that data, then borg is going to end up reading 1000GiB.

Afaik that is not how borg works. If it were, it would be much slower.

jdchristensen commented 6 years ago

If I have 100 GiB of data, and in that same path 10 snapshots of that data, then borg is going to end up reading 1000GiB.

Afaik that is not how borg works. If it was, it would be much slower.

Borg wouldn't know that the contents were the same, so it would have to read all 10 copies of the data. Because of deduplication, the repo wouldn't be 10 times larger, but it's still inefficient to read each file 10 times.

Subsequent runs would be fast, because of the files cache. But if a file changed in all 10 snapshots, then it would have to be read 10 times.
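
As a toy model of why that is (not borg's actual data structures): think of the files cache as a map from path to the metadata seen last time, where any mismatch forces a re-read.

# Toy model of a files cache: a file is re-read whenever its path is new
# or its (inode, mtime, size) triple changed since the previous run.
files_cache = {}   # path -> (st_ino, st_mtime_ns, st_size)

def needs_reread(path, st):
    return files_cache.get(path) != (st.st_ino, st.st_mtime_ns, st.st_size)

Each snapshot gives the same content a different path (and possibly different inodes), so a first run over 10 snapshots misses 10 times; later runs hit the cache unless the file changed.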

horihel commented 5 years ago

I agree with @cmurf and would like to add one thing: I always compare the behaviour to tar, because tar is a standard tool and its behaviour is typically what unix admins expect in other tools as well.

eike-fokken commented 5 years ago

I'm just familiarizing myself with btrfs and stumbled upon this issue.

I sympathize with @aardbol in finding the status quo confusing but am not sure how to really fix it.

Why is it confusing? The option is called --one-file-system and my first thought is "stay inside btrfs or ext4 or whatever filesystem I am in.".

I think the best solution would be to rename the option to --no-traverse-submounts and change behaviour such that nested subvolumes are traversed. But unfortunately I don't know how to make that happen without writing btrfs- and zfs-specific code. I find the current behaviour surprising because nested subvolumes are on the same file-system (hence my renaming proposition above) and they don't feel like mountpoints (you never mount them), although in Python os.path.ismount(subvolume-directory) returns True for them. Feeling is subjective, but what makes it really hard to find out that they are mounted is that findmnt -A doesn't list them, although it lists explicitly mounted subvolumes.

If the above change of behaviour is not implementable, maybe the option should be renamed to --no-traverse-submounts-and-volumes and the docs updated accordingly. I will do this and make a pull request if I find approval in this thread.

eike-fokken commented 5 years ago

I also think a warning as suggested by @horihel would be good although I have no idea how to do that.

eike-fokken commented 5 years ago

For your information: I opened this issue with Python itself, asking to change the behaviour of os.path.ismount for nested subvolumes because of the discrepancy with findmnt.

alajovic commented 3 years ago

I'd like to add my voice to the side claiming that the behavior of --one-file-system is surprising when btrfs subvolumes are involved. My use case is something like this:

I'd like to do automatic periodic backups of my system. The backups should contain the entire system, except for the contents of any USB keys or external disks that happen to be plugged in. If the root filesystem was, for example, ext4, I could accomplish this in a trivial way with --one-file-system. However, with btrfs, things are more complicated, and there seem to be two options:

None of the solutions seem ideal.

Note that the subvolumes are not explicitly mounted (such as with subvol=foo). To me, they simply appear as regular directories on the filesystem. If I did not know about btrfs subvolumes, I would never have guessed that machinectl creates subvolumes automatically, that the directories in /var/lib/machines are not regular directories, and that borg will treat them as external mounts and thus refuse to include them in the backup if --one-file-system is used.

cmurf commented 3 years ago

In the case of containers, restoring such a backup likely won't work, because it won't restore the subvolume/snapshot hierarchy. I think for this you need a separate, tar-based backup regime for containers. That's what podman/moby/docker all expect, and I see machinectl also has export-tar and import-tar commands; containers are also expected to be ephemeral, containing no data themselves.

While Btrfs subvolumes aren't separate file systems, they are each a dedicated btree with their own pool of inodes, and statfs() reports them as separate devices with a unique fsid. The borg behavior is also consistent with rsync -x.

alajovic commented 3 years ago

I'm quite confident that restoring from such a backup would work just fine. The subvolumes that machinectl creates are not vital to the operation of containers. Also, unlike docker images, containers executed with machinectl are in general not required or even expected to be ephemeral; they function more like a chrooted system with additional namespacing. So tar-based backup systems would not be entirely appropriate for them, and it would definitely be harder to ensure proper deduplication, especially since they can be backed up by borg just fine.

But anyway, I think that's an entirely separate topic. I was trying to make another point, but got it mixed up with considerations about my ideal backup scheme. Sorry about that. My main point was that the option --one-file-system is likely to result in behavior that is surprising to users of btrfs. They might not even be aware that their system contains subvolumes, since these get automatically created by utilities like machinectl. As far as I know, several GNU/Linux distributions have been considering switching to btrfs by default (see, e.g., this), so the number of people potentially affected by this surprise is only going to increase.

I understand that subvolumes are technically quite different from ordinary directories. As you said, own pool of inodes, statfs() reports them as separate devices, and so on. I'm not saying that --one-file-system behaves incorrectly. Indeed, it behaves entirely consistently under a certain set of presumptions. The problem is that this set of presumptions is not necessarily obvious to the user. The current description of the option in the documentation is:

-x, --one-file-system stay in the same file system and do not store mount points of other file systems

Traditionally, it was quite obvious what "same file system" meant and how it differed from "mount points of other file systems". But with newer filesystems such as btrfs coming into play, such a description is not so clear anymore. In my opinion, at least a few words in the documentation should be dedicated to this issue. Users need to be warned that subvolumes will not be included when this option is used. The original author of this issue got burned due to the misunderstanding, and I nearly did, too.

For those of us that would only like to avoid pulling in files from external mounts, a separate option such as --skip-mount-points could be added. But that's, again, an entirely separate discussion.

cmurf commented 3 years ago

I'm quite confident that restoring from such a backup would work just fine. The subvolumes that machinectl creates are not vital to the operation of containers.

No idea. If it expects them to be subvolumes, snapshotting them will fail. Maybe there's a fallback, but then you've lost snapshotting.

My main point was that the option --one-file-system is likely to result in behavior that is surprising to users of btrfs.

It'd be surprising in any case; there isn't a single correct expectation. I guess one way of answering this is: which is the worse outcome? Data that the user thought was backed up but isn't, because e.g. /home was excluded? Or data that the user thought was backed up once but wasn't, e.g. / was backed up 500 times because there are 500 snapshots of it? Possibly in either case there's data not backed up that was expected to be.

A possible refinement might be to default to backing up all subvolumes that are not snapshots. This information is exposed by the libbtrfsutil C API, and it should be possible to expose it in the libbtrfsutil Python API if it isn't already. This might match expectations more often, but there could still be edge cases leading to surprises. It's also not terribly discoverable what's going on: why are some subvolumes backed up and others aren't? Oh, because they're subvolume snapshots. Hmm, well, what if there is a "received" snapshot but it's standalone, with no other snapshots made from it? Back it up or not?

If we think the safest path is to back it up when in doubt, that's flawed logic because that might result in the destination (the backup) becoming full, preventing other things from being backed up. This sort of problem happens all the time in software design.

Still another possibility is to include mounted subvolumes, but not cross into nested subvolumes (or snapshots) that aren't mounted.
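
A sketch of that snapshot-detection idea. It assumes the libbtrfsutil Python bindings (module btrfsutil) expose subvolume_info() with a parent_uuid field; treat the exact names as assumptions rather than a confirmed API:

import btrfsutil  # Python bindings shipped with libbtrfsutil (assumed)

def is_snapshot(path):
    info = btrfsutil.subvolume_info(path)
    # A snapshot records the UUID of the subvolume it was taken from;
    # an independently created subvolume has an all-zero parent_uuid.
    return any(info.parent_uuid)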

-x, --one-file-system stay in the same file system and do not store mount points of other file systems

Traditionally, it was quite obvious what "same file system" meant, and how it was different from "mount points of other file systems". But with newer filesystems, such as btrfs, coming into play, such a description is not so clear anymore.

True. A subvolume is a dedicated btree, but not a separate file system.

In the Fedora Btrfs-by-default case, the previous default layout was LVM+ext4, with /home on a separate ext4 volume. So if the backup source is set to /, the treatment of /home remains the same: the backup doesn't include /home either way if --one-file-system is used.

alajovic commented 3 years ago

My main point was that the option --one-file-system is likely to result in behavior that is surprising to users of btrfs.

It'd be surprising in any case. There isn't a single correct expectation.

Well, that's a fatalistic claim! :) I don't think that things are so bad. Expectations are based on available information, and the main sources of information in this particular case are the name of the option and its description in the manual. The name --one-file-system might be misleading at first sight, but that's understandable. I suppose it was inherited from rsync or tar, and those two had it long before the philosophical question of what exactly constitutes a filesystem became relevant, so the name was entirely unambiguous back then. Since the behavior of borg's option is consistent with the behavior of rsync's and tar's, I find the name quite appropriate. There is another reason why the name is not really problematic: judging just from the name, it's hard to tell with full confidence what the option does, so people will look it up in the man page or read --help before using it. And that's where the description needs to be precise enough to correctly set expectations.

As a first step, I propose that the description be expanded with something like "also excludes filesystem subvolumes on systems that support them". The wording might be different; I'm just sharing the basic idea. This still requires users to know what "subvolumes" are and whether they have them on their system, so it does not fully cater to my example of a naïve user. But at least they have been pointed in the right direction, and in my opinion that's the game changer. After all, if people use a program's option without fully understanding what it does, they can't expect to fully understand the consequences of using it. If borg's man page happens to be the first place where a user encounters the word "subvolume", so be it. They are free to go read about it and come back more informed.

A clarification of the matter in the FAQ might also be appropriate, just to reaffirm what the option does in conjunction with a btrfs system.

As a second step, I propose that we research the possibility of implementing an option --skip-mount-points. Excluding the content of portable media and network mounts seems to be a common use case, and this would accomplish the goal. The added benefit is that when people notice both --one-file-system and --skip-mount-points in the man page, they will take even more care to find out the distinction between them, so the chances of misunderstandings and unpleasant surprises are reduced even further.

I guess one way of answering this is, which is the worse outcome? Data that the user thought was backed up, but isn't, because e.g. /home was excluded. Or data that the user thought was backed up once, but isn't, e.g. / was backed up 500 times because there are 500 snapshots of it.

I think that people tend to be quite careful to explicitly exclude the data they don't want backed up. If they have large snapshots on the system, they will be aware of them, and they will exclude the snapshot path(s) from the backup. At least in my mind, a snapshot is still perceived as a copy, and if I had 500 copies of something large, I would be quick to ensure that they are not burdening my backups.

eike-fokken commented 3 years ago

For all people reading this thread, but especially @alajovic , @aardbol, as you are using btrfs, could you look over pull request #5391 and comment whether you think the change in documentation may help prevent people from stumbling into this problem?

eike-fokken commented 3 years ago

For your convenience: These are the changes in that PR: In the short help at the top: '-x', '--one-file-system': 'stay in the same file system and do not store mount points of other file systems. This might behave different from your expectations, see the docs.'

And in the prose part below:

The -x or --one-file-system option excludes directories that are mountpoints (and everything in them). It detects mountpoints by comparing the device number from the output of stat of the directory and its parent. Be aware that on Linux there are directories with a device number different from their parent's which the kernel does not consider mountpoints, and also the other way around. Examples are bind mounts (possibly the same device number, but always a mountpoint) and ALL subvolumes of a btrfs (different device number from the parent, but not necessarily a mountpoint). Therefore on Linux one should make doubly sure that the backup works as intended, especially when using btrfs. This is even more important if the btrfs layout was created by someone else, e.g. the distribution installer.

alajovic commented 3 years ago

Haha, @eike-fokken, my mail client showed only the beginning of the first line in the preview,

For all people reading this threat, ...

and I was already holding my breath... until I realized that you meant to type thread instead of threat. So thanks for threatening us with a PR. :) I was planning on doing the same myself, but you beat me to it, so hey, less work for me!

I think that your addition to the documentation clarifies the behavior of --one-file-system and should prevent the kind of misunderstandings that have been lamented in this issue report. There is one statement in the text that I feel could be more precise:

It detects mountpoints by comparing the device number from the output of stat of the directory and its parent.

This only says that the device numbers are compared, but it might be unclear how borg acts on the result of the comparison. I assume that it excludes the directory if its device number is different from the parent's device number. Perhaps we should say "Specifically, it excludes directories for which stat() reports a device number different from the device number of their parent." or something like that.

Examples are bind mounts (possibly same device number, but always a mountpoint) and ALL subvolumes of a btrfs (different device number from parent but not necessarily a mountpoint).

So this means that --one-file-system will potentially descend into bind mounts?

Therefore in Linux one should make doubly sure that the backup works as intended

Is this really specific to Linux? I'm not familiar enough with other kernels to know whether it applies elsewhere. Anyway, this guideline of double checking the contents of the backups should hold universally.

eike-fokken commented 3 years ago

and I was already holding my breath... until I realized that you meant to type thread instead of threat.

Oh, sorry... you're right.

It does descend into bind mounts, if and only if the device numbers are equal; I just checked.

Is this really specific to Linux?

At least the BSDs don't have bind mounts. But of course it is plausible that someone somewhere has written a btrfs driver for some BSD. So maybe I should reword that.

"Specifically, it excludes directories for which stat() reports a device number different from the device number of their parent."

Nice, I'll write that into the PR.

aardbol commented 3 years ago

For all people reading this thread, but especially @alajovic , @aardbol, as you are using btrfs, could you look over pull request #5391 and comment whether you think the change in documentation may help prevent people from stumbling into this problem?

Seems good to me

bugsyb commented 2 years ago

It would be really great if Borg could treat btrfs subvolumes as being within one filesystem, or have another option that includes them and skips only genuinely different filesystems. As noted above, subvolumes are still the same filesystem (i.e. mkfs.btrfs creates one filesystem, and subvolume creation happens within it).

I get all the points about what ismount() etc. return; it's just that btrfs was created after ismount() etc. were invented. Obviously I came here because, having moved to btrfs and created subvolumes for other reasons, I discovered that these were not getting backed up, much to my surprise, since logically they should be included.

Thanks!

debuglevel commented 1 year ago

I nearly stumbled into this as well, but got skeptical when my backup was only 2 GB instead of several TB.

One of the few problems with borg is that it assumes the user actually knows what they are doing (which would be nice, but many users are probably not full-time sysadmins and just want to back up their home server). IMHO borg could warn the user a bit more when they are probably doing something unintended. Combining one_file_system: true with a btrfs that has sub-volumes might be such a case.

In my case, I'm using borg on a Synology NAS which uses magical btrfs stuff, and I just did not know that btrfs sub-volumes are different filesystems (it took me quite a while to figure that out, because they are not shown as mountpoints in mount, as opposed to ZFS or docker overlays etc.). I just got lucky.

Maybe it would also be a nice idea to include a "check whether I am doing the right thing" flag in borg. This could output things like:

Backupping /stuff/
Backupping /stuff/A
Skipping /stuff/B because it is a different filesystem and you've got one-filesystem enabled
Backupping /stuff/C