Zygo / bees

Best-Effort Extent-Same, a btrfs dedupe agent
GNU General Public License v3.0

*** EXCEPTION *** No space left on device btrfs-extent-same: BtrfsExtentSame - how to resolve? #292

Closed by jamesfreeman959 1 month ago

jamesfreeman959 commented 1 month ago

Hi there

I have been having great success using bees on btrfs on Ubuntu 24.04.1 - I built beesd from source from the v0.10 commit tag.

My backup volume (shared over NFSv3) suddenly stopped working with "No space left on device" errors. However, at first glance it has free space:

Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb1       512G  426G   86G  84% /mnt/fmbackup

A more detailed query of btrfs usage shows:

$ sudo btrfs filesystem usage /mnt/fmbackup 
Overall:
    Device size:                 512.00GiB
    Device allocated:            512.00GiB
    Device unallocated:            1.00MiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                        425.26GiB
    Free (estimated):             85.72GiB      (min: 85.72GiB)
    Free (statfs, df):            85.72GiB
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 96.00KiB)
    Multiple profiles:                  no

Data,single: Size:500.01GiB, Used:414.29GiB (82.86%)
   /dev/vdb1     500.01GiB

Metadata,DUP: Size:5.99GiB, Used:5.49GiB (91.64%)
   /dev/vdb1      11.97GiB

System,DUP: Size:8.00MiB, Used:80.00KiB (0.98%)
   /dev/vdb1      16.00MiB

Unallocated:
   /dev/vdb1       1.00MiB

I can see metadata is approaching a problem level, but it's not 100% full yet. Looking in syslog at beesd entries, I see repeated entries like this:

2024-10-08T20:12:05.218065+03:00 backup1 beesd[818]: crawl_5_23008204[961]: *** EXCEPTION ***
2024-10-08T20:12:05.218087+03:00 backup1 beesd[818]: crawl_5_23008204[961]: #011exception type std::system_error: btrfs-extent-same: BtrfsExtentSame { .m_fd = 9 '/run/bees/mnt/a27cd0ad-fab2-4ead-b88b-652328d8a796/anotherfile.dat', .logical_offset = 0xc99a0000, .length = 0x20000, .info[] = { [0] = btrfs_ioctl_same_extent_info { .fd = 6 '/run/bees/mnt/a27cd0ad-fab2-4ead-b88b-652328d8a796/somefile.dat', .logical_offset = 0xc99a0000, .bytes_deduped = 0x0, .status = -28 (No space left on device), .reserved = 0 }, } at fs.cc:177: No space left on device
2024-10-08T20:12:05.218184+03:00 backup1 beesd[818]: crawl_5_23008204[961]: ***

I don't think this is a beesd problem, but I don't understand what's going wrong (i.e. why I'm getting "No space left on device" errors when I can see free space). I am able to increase the size of the disk: this is running on a VM, so I can easily extend it. However, assuming I do this, what would be the right steps to resize the filesystem and metadata and fix the issue?

Also can anyone help me understand what I'm missing - is there something I should be looking at to see what it is that is full?

Many thanks

James

jamesfreeman959 commented 1 month ago

Quick update. I noticed the error message referenced the /run filesystem, which was only 400 MB in size. I increased the RAM on the host and increased /run to 2 GB, but the error remains. A later error references trying to write just 131072 bytes of data, so I feel like there's another issue being presented as an out-of-space error.

Zygo commented 1 month ago

Metadata free space is extremely low - there's less than 512MB of allocated space left, and there is no unallocated space remaining to allocate more metadata.

When metadata free space is less than reserved space, the filesystem is effectively full. The space is reserved so that there will be enough free space to add or resize devices, or delete files or snapshots. If the global reserve space is ever completely filled, the filesystem will become permanently read-only. To prevent that, btrfs will abort transactions that fill the global reserve.
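
To make that comparison concrete, here is a minimal sketch that reads the human-readable output of btrfs filesystem usage on stdin and compares metadata free space (Size minus Used) with the global reserve. The function name metadata_headroom and its output format are hypothetical, not part of bees or btrfs-progs, and using "<=" rather than "<" is an assumption made because the displayed figures are rounded.

```shell
# Sketch only: metadata_headroom is a hypothetical helper, not part of bees
# or btrfs-progs. It reads `btrfs filesystem usage` output on stdin.
metadata_headroom() {
    awk '
    # Convert a size such as "5.99GiB" or "96.00KiB" to MiB.
    function to_mib(s,    n, u) {
        n = s + 0                       # leading numeric part
        u = s
        gsub(/[^A-Za-z]/, "", u)        # keep only the unit letters
        if (u == "B")   return n / 1048576
        if (u == "KiB") return n / 1024
        if (u == "MiB") return n
        if (u == "GiB") return n * 1024
        if (u == "TiB") return n * 1048576
        return n                        # unknown unit: treated as MiB (assumption)
    }
    /^Metadata,/ {
        size = $2; sub(/^Size:/, "", size)
        used = $3; sub(/^Used:/, "", used)
        free = to_mib(size) - to_mib(used)
    }
    /Global reserve:/     { reserve = to_mib($3) }
    /Device unallocated:/ { unalloc = to_mib($3) }
    END {
        # "<=" rather than "<" because the displayed figures are rounded.
        status = (free <= reserve) ? "effectively-full" : "ok"
        printf "metadata_free=%dMiB reserve=%dMiB unallocated=%dMiB status=%s\n",
               free, reserve, unalloc, status
    }'
}
```

It could be used as sudo btrfs filesystem usage /mnt/fmbackup | metadata_headroom. With the figures posted at the top of this thread it reports 512 MiB of metadata free against a 512 MiB reserve and 1 MiB unallocated, which is the "effectively full" case described above.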

If you can add a bit more space, you can balance data block groups (btrfs balance start -dusage=75 or btrfs-balance-least-used -u 75) to reclaim unused space from them so it can be allocated to metadata. You can run that command periodically to keep metadata space available, or you can set up a script to run echo 75 | tee /sys/fs/btrfs/*/allocation/data/bg_reclaim_threshold so the kernel does it automatically as needed.
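
The sysfs write mentioned in this comment could be wrapped in a small script along these lines. This is a sketch under assumptions: the function name set_bg_reclaim_threshold, the 0-100 range check, and the optional second argument (a sysfs root, there only so the function can be tried against a scratch directory) are illustrative and not part of bees or btrfs-progs.

```shell
# Sketch only: set_bg_reclaim_threshold is a hypothetical wrapper around the
# sysfs write quoted in the comment. The second argument exists only so the
# function can be pointed at a scratch directory instead of /sys/fs/btrfs.
set_bg_reclaim_threshold() {
    pct=$1
    sysfs_root=${2:-/sys/fs/btrfs}

    # Accept only whole numbers from 0 to 100 (range check is an assumption).
    case $pct in
        ''|*[!0-9]*) echo "threshold must be an integer 0-100" >&2; return 1 ;;
    esac
    [ "$pct" -le 100 ] || { echo "threshold must be an integer 0-100" >&2; return 1; }

    # One directory per mounted filesystem; entries without this file are skipped.
    for f in "$sysfs_root"/*/allocation/data/bg_reclaim_threshold; do
        [ -w "$f" ] || continue
        echo "$pct" > "$f"
    done
}
```

Run as root, set_bg_reclaim_threshold 75 writes 75 to every mounted btrfs filesystem that exposes the file. Running it from a boot-time or post-mount hook is one way to follow the "set up a script" suggestion.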

jamesfreeman959 commented 1 month ago

Thanks so much - really appreciate your detailed feedback. I did a bit more homework, as I clearly need to learn more about day-to-day management of BTRFS. Expanding the volume and then balancing it fixed the issue perfectly. Appreciate your help!
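
The thread does not record the exact commands used, so the following is only a sketch of what "expand, then balance" could look like once the virtual disk and the /dev/vdb1 partition have already been enlarged (the partitioning tool is not specified here). The function name grow_and_balance and the RUN variable are hypothetical; by default the function only prints the commands instead of running them.

```shell
# Sketch only: grow_and_balance is a hypothetical helper. By default it only
# prints the commands; set RUN=sudo to actually execute them.
grow_and_balance() {
    mnt=$1
    usage=${2:-75}
    run=${RUN:-echo}

    # Grow the filesystem to fill the already-enlarged partition.
    $run btrfs filesystem resize max "$mnt" || return 1
    # Compact data block groups that are less than $usage percent full,
    # returning their unused space to the unallocated pool.
    $run btrfs balance start "-dusage=$usage" "$mnt"
}
```

For example, grow_and_balance /mnt/fmbackup prints the two commands for review, and RUN=sudo grow_and_balance /mnt/fmbackup runs them. The resize comes first so that the balance has unallocated space to work with.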