broetchenrackete36 opened this issue 4 years ago
@broetchenrackete36 thanks. Could you please try the steps below and tell me the results?
Let's first check whether the dump-csum option is working properly. If this fails, then dduper won't work.
btrfs inspect-internal dump-csum /btrfs/subvol/ddtest/sbd.img /dev/sda1 &> /tmp/subvol_csum1
btrfs inspect-internal dump-csum /btrfs/subvol/ddtest/sbd.img2 /dev/sda1 &> /tmp/subvol_csum2
btrfs inspect-internal dump-csum /btrfs/ddtest/sbd.img /dev/sda1 &> /tmp/root_csum1
btrfs inspect-internal dump-csum /btrfs/ddtest/sbd.img2 /dev/sda1 &> /tmp/root_csum2
Please confirm that the output files are non-empty and check that their md5sums match:
md5sum /tmp/subvol_csum{1,2}
md5sum /tmp/root_csum{1,2}
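A quick size check also covers the non-empty part, for example:
wc -c /tmp/subvol_csum{1,2} /tmp/root_csum{1,2}   # all four dumps should be non-zero if dump-csum worked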
If this works, then the issue is with the Python script, which should be easier to solve. If dump-csum fails, then I need to re-create your setup and examine what's going on.
> Also I'm not sure why the total size deduped is 0 on the actual dedupe...
Before you try the above steps (https://github.com/Lakshmipathi/dduper/issues/8#issuecomment-664772029), can you get the latest dduper file and check again in your environment? It's a one-line fix for the "total size deduped is 0" issue. dduper actually removed the duplicate data but printed the wrong info; now it should report the correct values.
diff --git a/dduper b/dduper
index 20dbde7..8bde512 100755
--- a/dduper
+++ b/dduper
@@ -276,6 +276,7 @@ def display_summary(blk_size, chunk_sz, perfect_match_chunk_sz, src_file,
global dst_file_sz
if perfect_match == 1:
chunk = perfect_match_chunk_sz
+ total_bytes_deduped = dst_file_sz
else:
chunk = chunk_sz
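If you'd rather apply just this hunk than pull the whole repo, saving the diff above to a file and applying it with git should also work (the patch file name here is only an example):
cd dduper                            # source checkout of dduper
git apply /tmp/total-deduped.patch   # the diff above, saved as a patch file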
Thanks for the response. I applied the fix but I still get 0 for total deduped size.
I also ran dump-csum on the files in the subvolume and in the root volume. It produces nothing (an empty file) on the subvolume and works fine on the root volume...
> Thanks for the response. I applied the fix but I still get 0 for total deduped size.
That's strange. If you run
sudo python2 ./dduper --device /dev/sda1 --dir /btrfs/ddtest/
and then check disk usage with sync && df, does it show any new free space or does it remain the same?
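For instance, a rough before/after comparison (assuming /btrfs is the mount point from your setup) could look like this:
df -h /btrfs                                                    # available space before deduping
sudo python2 ./dduper --device /dev/sda1 --dir /btrfs/ddtest/
sync
df -h /btrfs                                                    # available space after deduping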
> It produces nothing (empty file) on the subvolume and works fine on the root volume
I haven't really tested the tool with subvolumes, but I think it should work with the root volume since it reports csums from it.
> I am using blake2 as csum on a 6-drive raid5 data raid1 meta array.
How easy or hard is it to re-create your setup? Can you share sample RAID commands or a script? I can launch a cloud VM with the required devices and check.
I created the array like this:
sudo mkfs.btrfs -d raid5 -m raid1 -L BlueButter -f /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 --csum blake2
And then mounted like this:
sudo mount -t btrfs -o clear_cache,space_cache=v2,noatime /dev/sda1 /btrfs/
And then simply created a new subvolume:
sudo btrfs subv create /btrfs/subvol
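If it helps for recreating this on a VM, roughly the same layout can be scripted with loop devices; the scratch file paths and loop device names below are just assumptions, not my real disks:
# six sparse files standing in for the six drives
for i in 1 2 3 4 5 6; do
    truncate -s 5G /var/tmp/disk$i.img
    sudo losetup -f /var/tmp/disk$i.img
done
# raid5 data / raid1 metadata with blake2 checksums, same as the real array
# (assumes loop0..loop5 were the free devices losetup picked; adjust if not)
sudo mkfs.btrfs -d raid5 -m raid1 -L BlueButter -f --csum blake2 \
    /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4 /dev/loop5
sudo mkdir -p /btrfs
sudo mount -t btrfs -o clear_cache,space_cache=v2,noatime /dev/loop0 /btrfs/
sudo btrfs subvolume create /btrfs/subvol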
I checked whether dduper is freeing space, and it doesn't seem so when looking at the df output. I even cp'd one of the files to have two identical files, and df didn't show a difference in available space... This could be related to raid5 though; df with raid5 is not really reliable...
Thanks for the details. Let me check whether dduper can support a RAID setup.
Update: I tried the above setup and it gave me different errors:
bad tree block 22036480, bytenr mismatch, want=22036480, have=0
ERROR: cannot read chunk root
unable to open /dev/sda
bad tree block 22036480, bytenr mismatch, want=22036480, have=0
ERROR: cannot read chunk root
unable to open /dev/sda
Perfect match : /mnt/f1 /mnt/f2
Summary
blk_size : 4KB chunksize : 8192KB
/mnt/f1 has 1 chunks
/mnt/f2 has 1 chunks
Matched chunks: 1
Unmatched chunks: 0
Total size(KB) available for dedupe: 8192
dduper took 1.42327594757 seconds
If I'm not wrong, I was able to reproduce the issue with the command below, and I suspect it may be related to --csum blake2. The same command worked with the default crc32.
mkfs.btrfs -m raid1 /dev/sda /dev/sdb -f --csum blake2
Need to examine further.
The issue is related to the blake2 csum. I don't know exactly why the blake2 csums fetched for files with the same content differ. Here is a simple way to reproduce the issue:
mkfs.btrfs /dev/sda --csum blake2
Now mount it and run:
cp /tmp/a /mnt/f{1,2}
btrfs inspect-internal dump-csum /mnt/f1 /dev/sda &> /tmp/f1.csum
btrfs inspect-internal dump-csum /mnt/f2 /dev/sda &> /tmp/f2.csum
With the default crc32, the contents of /tmp/f1.csum and /tmp/f2.csum will match. But in this case, the csum files differ. I plan to explore this blake2 behavior soon; until then I'll add a limitation that dduper won't support --csum blake2.
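A quick way to see the mismatch is the same md5sum check as earlier:
md5sum /tmp/f1.csum /tmp/f2.csum   # matches with the default crc32, differs with blake2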
I added a fix for the new checksum types (xxhash64, blake2, sha256) in https://github.com/Lakshmipathi/dduper/pull/42 and tested it locally. If you installed dduper from source, you can do a git pull and try it again.
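For a source checkout, something along these lines should pick up the fix; the clone path and test directory are just examples:
cd dduper && git pull
sudo python2 ./dduper --device /dev/sda1 --dir /btrfs/ddtest/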
I still need to fix the issues related to sub-volumes.
Released dduper v0.04 with the new checksum support. It should be available via all installation methods.
Running dduper on a subvolume doesn't seem to work. Both directories have the same two files. Both files are canceled dd copies of my boot drive.
Output from the subvolume:
Output from the root volume:
Also I'm not sure why the total size deduped is 0 on the actual dedupe...
I am using blake2 as csum on a 6-drive raid5 data raid1 meta array.