Meta: create "loss of data" label

Sure, if it ever comes up. As far as I know, bees has never been the root cause of any data loss event. Do you have one to report?

btrfs has occasionally had kernel releases with data-losing bugs. Running any software which modifies the filesystem on such a kernel can cause data loss. Some of these bugs are documented on the kernel bugs table, but they can affect many applications, not just bees. For example, one bug in that table carries a small but non-zero risk of total filesystem data loss for every write operation involving btrfs--it's the one with the big data corruption warnings at the top of the page.

It would be difficult to maintain the accuracy and integrity of such a label on a bees GitHub issue. Even when there is a kernel bug that is triggered by one of the fixed set of operations that bees does (tree search, extent backref lookup, inode name lookup, file open, file stat, data read, data write, data write with compression, and deduplicate), those operations are fundamental to what bees does. The data loss risk assessment would apply only to the combination of bees with specific kernel versions, and no change in bees would add or remove the data loss risk from that combination. Only changes to the kernel, not bees, can fix a kernel bug.

I don't think it's feasible or reasonable for the bees project to take on the responsibility of tracking all data-losing kernel bugs in btrfs--and especially not if the scope is expanded to cover related subsystems like SATA disks or LVM, which btrfs may depend on for data integrity. I make a nominal effort to test all mainline kernel releases with current bees versions (mostly to protect data directly in my care), and I update the published pages when I find new issues. I'm willing to republish bugs others have identified if I can confirm the issue. I'm never going to be a replacement for proper and timely kernel QA.

bees has some low-level knowledge of btrfs filesystem structure, but it only ever reads these structures. It is up to the kernel to perform all modifications recommended by bees--and the kernel can (and often does) reject these recommendations if they would alter data. There are a number of risk mitigation design features as well:

user data files are always opened O_RDONLY so that a bug causing a write to a wrong FD in the bees process will fail instead of overwriting user data
file stat is verified after opening to ensure the intended file was opened before performing any further operations
O_NOFOLLOW and O_TMPFILE are used to further reduce the symlink and TOCTTOU attack surface (bees is a bit behind here: since kernel 5.6 there is openat2, which has stronger cross-device and symlink controls)
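
To make the first three ideas concrete, here is a minimal sketch in Python. bees itself is written in C++, and this is not its actual code: `open_verified_readonly` and its `expected` parameter are hypothetical names for illustration, and the check shown (device and inode number) is only one plausible way to verify a stat result. O_TMPFILE is not covered.

```python
import os
import stat


def open_verified_readonly(path, expected):
    """Open `path` read-only and check it is the file described by `expected`.

    `expected` is an os.stat_result captured earlier (hypothetical input,
    e.g. from os.lstat() at the time the file was first looked up).
    Returns an open file descriptor; the caller must close it.
    """
    # O_RDONLY: a stray write() to this descriptor fails with EBADF
    # instead of modifying user data.
    # O_NOFOLLOW: the open fails if the final path component is a symlink.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW | os.O_CLOEXEC)
    try:
        # Stat the descriptor, not the path, so the check applies to the
        # file that was actually opened.
        st = os.fstat(fd)
        if not stat.S_ISREG(st.st_mode):
            raise OSError("not a regular file")
        if (st.st_dev, st.st_ino) != (expected.st_dev, expected.st_ino):
            raise OSError("file changed between lookup and open")
    except BaseException:
        os.close(fd)
        raise
    return fd
```

Checking with fstat on the descriptor rather than stat on the path matters here: a second path lookup could resolve to a different file than the one that was opened.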

Zygo / bees #295