g2p / bedup

Btrfs deduplication
http://pypi.python.org/pypi/bedup
GNU General Public License v2.0

Not deduplicating anything at all #64

Closed · dusanmsk closed this 9 years ago

dusanmsk commented 9 years ago

Hi,

I tried this to dedup my files on a btrfs volume, but didn't succeed. I tried on Debian jessie with kernels 3.16.0-4-amd64 and 4.1.0-0.bpo.2-amd64 to be sure. bedup was cloned from git master, at commit 7283da47c3a2ffc01336389dae84f510caa034d9.

Here is what I did to test it:

-- create 500 MB of random data in 500 files
mkdir -p /d/1; cd /d/1
for i in $(seq 1 500); do dd if=/dev/urandom of=file_${i} bs=1M count=1; done

-- now pack it inside a tar
tar cf ../s.tar *
cd ..

-- now extract some copies of the same files (I didn't want to use cp, to be sure no --reflink would take place)
for i in $(seq 2 6); do mkdir $i; cd $i; tar xf ../s.tar; cd ..; done
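
Aside: a sketch of another way to force real copies, assuming GNU coreutils cp, whose --reflink=never option forbids clones:

# assumes GNU coreutils cp; --reflink=never forces byte-for-byte copies, never clones
for i in $(seq 2 6); do cp -a --reflink=never /d/1 /d/$i; done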

-- see how much is used
btrfs f df /
Data, RAID1: total=5.82GiB, used=4.46GiB

bedup dedup /
Not scanning /, generation is still 154

-- still 4.5 GB used
btrfs f df /
Data, RAID1: total=5.82GiB, used=4.48GiB
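
Aside: btrfs f df reports chunk allocation rather than raw device usage; a fuller view is available in newer btrfs-progs, sketched here:

# assumes a btrfs-progs that ships the 'filesystem usage' subcommand;
# it shows allocated vs. used chunks plus unallocated raw space per device
btrfs filesystem usage /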

-- waited some time, retried: no change. Rebooted, ran bedup again: no change.
-- ok, maybe btrfs df lies about used/free space; let's write a file until the filesystem is full (I have 2x 8 GB drives in a virtual machine, btrfs in raid1 mode)
dd if=/dev/zero of=/d/bigfile bs=1M
dd: error writing ‘/d/delme’: No space left on device

-- ok, the disk is full; let's see how much data the whole directory contains
du -hs /d
6.2G /d

-- yes, the disk is really full: no deduplication at all
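
Aside: a sketch of how to check extent sharing directly, assuming a btrfs-progs recent enough to include the filesystem du subcommand:

# reports Total vs. Exclusive bytes per path, so shared extents become visible
btrfs filesystem du -s /d/1 /d/2
# before dedup, Exclusive should be close to Total for both copies;
# after a successful dedup, most data should move to 'Set shared'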

Am I doing something wrong?

Thx

g2p commented 9 years ago

That's because the default size cutoff is 8 MiB, I think (bedup show will display it). You can run bedup dedup --size-cutoff 65536 to catch small files.
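
For reference, a minimal sketch of that sequence (the cutoff is given in bytes; the volume argument matches the earlier commands):

# list tracked volumes, including the configured size cutoff
bedup show
# dedup, considering files as small as 64 KiB
bedup dedup --size-cutoff 65536 /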

dusanmsk commented 9 years ago

Oh, sry, I missed that feature (I tried bedup -h dedup instead of bedup dedup -h to get the doc). Thx, it's working ok now.