Lakshmipathi / dduper

Fast block-level out-of-band BTRFS deduplication tool.
GNU General Public License v2.0

dump-csum output is empty, so dduper prints "has 0 chunks" for every file #41

Open eisnerd opened 4 years ago

eisnerd commented 4 years ago

Not sure which repo to report/ask this in, sorry.

I've tried the prebuilt btrfs.static and kdave/btrfs-progs.git#v5.6.1 with 0001-Print-csum-for-a-given-file-on-stdout.patch built from source. I'm pretty sure I have CRC32 csums (mount says Btrfs loaded, crc32c=crc32c-intel), but btrfs inspect-internal dump-csum just pauses and exits (code 0) without printing anything. No kernel/syslog messages occur while dump-csum is running. I've tried several files and all three devices in the set.

Any ideas as to how I diagnose this?
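
One way to narrow down a command that exits 0 but prints nothing is to capture stdout and stderr separately and report the byte counts alongside the exit status. The following is a minimal sketch; `probe_cmd` is a hypothetical helper name, not part of dduper or btrfs-progs.

```shell
# Run a command, keep stdout and stderr apart, and summarise what came back.
# Usage: probe_cmd <command> [args...]
probe_cmd() {
    _probe_out=$(mktemp) || return 1
    _probe_err=$(mktemp) || { rm -f "$_probe_out"; return 1; }
    "$@" >"$_probe_out" 2>"$_probe_err"
    _probe_status=$?
    # wc -c may pad its output with spaces on some systems, so strip them.
    echo "exit=$_probe_status stdout_bytes=$(wc -c <"$_probe_out" | tr -d ' ') stderr_bytes=$(wc -c <"$_probe_err" | tr -d ' ')"
    rm -f "$_probe_out" "$_probe_err"
    return "$_probe_status"
}
```

For example, `probe_cmd btrfs inspect-internal dump-csum /path/to/file /dev/sdX` (with the patched btrfs binary from this thread) reporting `exit=0 stdout_bytes=0 stderr_bytes=0` would confirm that the command fails silently rather than writing an error somewhere.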

Lakshmipathi commented 4 years ago

Can you share the output of the last four commands below?

mkdir -p /btrfs-mnt-pt/testfiles
dd if=/dev/urandom of=/tmp/f1 bs=1M count=50
cp -v /tmp/f1 /btrfs-mnt-pt/testfiles/f1
cp -v /tmp/f1 /btrfs-mnt-pt/testfiles/f2
sync

btrfs inspect-internal dump-csum /btrfs-mnt-pt/testfiles/f1 yourdevice_name > /tmp/c1
btrfs inspect-internal dump-csum /btrfs-mnt-pt/testfiles/f2 yourdevice_name > /tmp/c2

# Please share output for below commands

btrfs fi du /btrfs-mnt-pt/testfiles/f1 /btrfs-mnt-pt/testfiles/f2
ls -lh /tmp/c{1,2}
md5sum /tmp/c1 /tmp/c2 
dduper --device yourdevice --files /btrfs-mnt-pt/testfiles/f2  /btrfs-mnt-pt/testfiles/f1 --dry-run

Lakshmipathi commented 4 years ago

Are you using a RAID setup? Can you share some details about your BTRFS setup?

eisnerd commented 4 years ago

That's interesting. dump-csum paused for a long time, but it produced output on those files. One difference may be that they're in the root subvolume, or whatever it's called, whereas other files I've tried are in another subvolume or snapshot.
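
To check which subvolume a given file actually lives in, `btrfs inspect-internal rootid` prints the id of the containing tree; the top-level subvolume has id 5. The sketch below wraps that check. The function names are hypothetical, and it assumes root privileges and a mounted btrfs filesystem.

```shell
# Tree id of the top-level subvolume (BTRFS_FS_TREE_OBJECTID).
TOPLEVEL_ROOTID=5

# Pure comparison, kept separate so it can be tested without a btrfs mount.
is_toplevel_rootid() {
    [ "$1" = "$TOPLEVEL_ROOTID" ]
}

# Print whether a path sits in the top-level subvolume or in another one.
describe_subvol() {
    _rootid=$(btrfs inspect-internal rootid "$1") || return 1
    if is_toplevel_rootid "$_rootid"; then
        echo "$1: top-level subvolume (id $_rootid)"
    else
        echo "$1: other subvolume or snapshot (id $_rootid)"
    fi
}
```

Running `describe_subvol` on a file that produces csum output and on one that does not would show whether the two really differ by subvolume.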

The filesystem is three whole sata disks (8TiB, 8TiB, 6TiB); RAID1 Data, System and Metadata; no hardware RAID, mdadm or anything like that, just btrfs.

root@ravestar:~# btrfs fi du /btrfs-mnt-pt/testfiles/f1 /btrfs-mnt-pt/testfiles/f2
     Total   Exclusive  Set shared  Filename
  50.00MiB    50.00MiB       0.00B  /btrfs-mnt-pt/testfiles/f1
  50.00MiB    50.00MiB       0.00B  /btrfs-mnt-pt/testfiles/f2
root@ravestar:~# ls -lh /tmp/c{1,2}
-rw-r--r-- 1 root root 114K Sep 13 09:27 /tmp/c1
-rw-r--r-- 1 root root 114K Sep 13 09:28 /tmp/c2
root@ravestar:~# md5sum /tmp/c1 /tmp/c2
55293b62d58ff84907d557d7479d3baf  /tmp/c1
55293b62d58ff84907d557d7479d3baf  /tmp/c2
root@ravestar:~# dduper --device /dev/sda --files /btrfs-mnt-pt/testfiles/f2  /btrfs-mnt-pt/testfiles/f1 --dry-run
Perfect match :  /btrfs-mnt-pt/testfiles/f2 /btrfs-mnt-pt/testfiles/f1
Summary
blk_size : 4KB  chunksize : 16384KB
/btrfs-mnt-pt/testfiles/f2 has 4 chunks
/btrfs-mnt-pt/testfiles/f1 has 4 chunks
Matched chunks: 4
Unmatched chunks: 0
Total size(KB) available for dedupe: 65536
dduper took 117.84783290396444 seconds
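
The summary numbers are consistent with rounding the file size up to whole chunks: 50 MiB is 51200 KB, which needs 4 chunks of 16384 KB, and 4 x 16384 = 65536 KB. The sketch below reproduces that arithmetic. It is inferred from the output above, not taken from dduper's source, and the function names are hypothetical.

```shell
# Number of chunks when a trailing partial chunk counts as one (ceiling division).
chunk_count() {
    _size_kb=$1
    _chunk_kb=$2
    echo $(( (_size_kb + _chunk_kb - 1) / _chunk_kb ))
}

# Size in KB that a full match would report under the same assumption.
dedupe_kb() {
    echo $(( $(chunk_count "$1" "$2") * $2 ))
}
```

With these, `chunk_count 51200 16384` gives 4 and `dedupe_kb 51200 16384` gives 65536, so the reported "Total size(KB) available for dedupe" can exceed the real file size of 51200 KB.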
Lakshmipathi commented 4 years ago

Thanks for the details.

One difference may be that they're in the root subvolume, or whatever it's called, whereas other files I've tried are in another subvolume or snapshot.

I think that's exactly the issue. If I'm not wrong, this is also related to https://github.com/Lakshmipathi/dduper/issues/35#issuecomment-688051056

Too bad, dduper is hitting a lot of issues with subvolumes. Let me work on this bug and post an update.

eisnerd commented 4 years ago

Ah, thanks. I didn't see that issue. Here's the result of the same commands, but with /btrfs-mnt-pt pointing to a subvolume. This is not a read-only snapshot, just one made with a regular btrfs subvolume create. I do have quite a few subvolumes, if that might make any difference: 97 currently, and probably a few hundred backup snapshots have been made and deleted.

root@ravestar:~# btrfs fi du /btrfs-mnt-pt/testfiles/f1 /btrfs-mnt-pt/testfiles/f2
     Total   Exclusive  Set shared  Filename
  50.00MiB    50.00MiB       0.00B  /btrfs-mnt-pt/testfiles/f1
  50.00MiB    50.00MiB       0.00B  /btrfs-mnt-pt/testfiles/f2
root@ravestar:~# ls -lh /tmp/c{1,2}
-rw-r--r-- 1 root root 0 Sep 13 09:43 /tmp/c1
-rw-r--r-- 1 root root 0 Sep 13 10:14 /tmp/c2
root@ravestar:~# md5sum /tmp/c1 /tmp/c2
d41d8cd98f00b204e9800998ecf8427e  /tmp/c1
d41d8cd98f00b204e9800998ecf8427e  /tmp/c2
root@ravestar:~# dduper --device /dev/sda --files /btrfs-mnt-pt/testfiles/f2  /btrfs-mnt-pt/testfiles/f1 --dry-run
Perfect match :  /btrfs-mnt-pt/testfiles/f2 /btrfs-mnt-pt/testfiles/f1
Summary
blk_size : 4KB  chunksize : 16384KB
/btrfs-mnt-pt/testfiles/f2 has 0 chunks
/btrfs-mnt-pt/testfiles/f1 has 0 chunks
Matched chunks: 0
Unmatched chunks: 0
Total size(KB) available for dedupe: 0
dduper took 105.05558980401838 seconds
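
Note that d41d8cd98f00b204e9800998ecf8427e is the MD5 of zero bytes, so the "Perfect match" here only says that two empty dumps are equal. A small guard such as the following sketch could flag that case before a dry-run result is trusted. The function names are hypothetical and not part of dduper.

```shell
# MD5 of empty input; both /tmp/c1 and /tmp/c2 above hash to this value.
EMPTY_MD5="d41d8cd98f00b204e9800998ecf8427e"

# Succeeds when the given digest is the MD5 of empty input.
md5_is_empty() {
    [ "$1" = "$EMPTY_MD5" ]
}

# Succeeds when the dump file is missing or has no bytes.
csum_dump_is_empty() {
    [ ! -s "$1" ]
}

# Fails on the first empty dump, so a match between empty files is not trusted.
check_dumps() {
    for _dump in "$@"; do
        if csum_dump_is_empty "$_dump"; then
            echo "empty csum dump: $_dump" >&2
            return 1
        fi
    done
    return 0
}
```

For example, `check_dumps /tmp/c1 /tmp/c2 || echo "dump-csum produced no data"` would have caught the subvolume case above.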
eisnerd commented 4 years ago

I just saw on #39, too, a mention of switching from single to RAID1. I had to recover from a disk failure that caused a weird fs problem by creating a new btrfs on a new disk, copying from a recovery mounted set of disks, then adding the old still good disks back in. Anyway, the point is that it was "single data, dup metadata" for a day, then rebalanced to all RAID1. The subvolume tested above was, however, created after switching to RAID1.

Lakshmipathi commented 4 years ago

/btrfs-mnt-pt/testfiles/f2 has 0 chunks
/btrfs-mnt-pt/testfiles/f1 has 0 chunks

That's the subvolume issue. btrfs inspect-internal dump-csum doesn't print any data for files inside a subvolume, and that is what causes the 0-chunk result. I need to investigate and possibly rewrite the btrfs inspect-internal dump-csum option, which will take some effort on my end.

The subvolume tested above was, however, created after switching to RAID1.

Thanks for the additional details. I suspect that fixing the subvolume issue will also resolve any problems caused by the profile change from "single" to RAID1. This is becoming a major bug in dduper; I hope to resolve the subvolume issue sometime this week.

Lakshmipathi commented 4 years ago

Quick update on subvolumes: I spent 3 or 4 days trying to figure out the subvolume issue, and I can now dump the csums of a file in a subvolume through a different code path. I still need to do some work exploring the btrfs disk layout.