g2p / bedup

Btrfs deduplication
http://pypi.python.org/pypi/bedup
GNU General Public License v2.0

btrfs show estimate is low #47

Closed: ers81239 closed 10 years ago

ers81239 commented 10 years ago

Just wanted to pass on that I converted a filesystem to btrfs specifically to try out the deduplication feature. I knew there were a fair number of duplicates caused by my multiple checkouts of branches of a giant software project.

I had run `scan` several times, and it reported that there were no changes to the filesystem, so it wasn't going to scan any further.
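A minimal sketch of that step, assuming the same `python -m bedup` entry point used for `show` below and `/data` as the volume (both taken from the transcripts in this report; the exact `scan` arguments are an assumption):

```
# Assumed invocation: "scan" is named above; the entry point and
# mountpoint come from the show/df transcripts below.
sudo python -m bedup scan /data
```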

`show` reported that at least 8 MB would be freed; however, `dedup` managed to free ~24 GB.

I'm aware that my kernel may have some issue that interferes with bedup.

Here is the data, in case it is useful:

```
tomcat7@meatcar:/data/opengrok$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=12.04
DISTRIB_CODENAME=precise
DISTRIB_DESCRIPTION="Ubuntu 12.04.4 LTS"
tomcat7@meatcar:/data/opengrok$ uname -a
Linux meatcar 3.2.0-60-generic #91-Ubuntu SMP Wed Feb 19 03:54:44 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
```

```
esmith@meatcar:~/dev/bedup$ sudo python -m bedup show
Label: None UUID: f5cfdb58-19ad-4840-993a-2458eb032dac
Device: /dev/sde1
Volume 5
As of generation 1095, tracking 1240 inodes of size at least 8388608
Accessible at /data
```

df before:

```
Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/sde1      292967724 188674128  70485728  73% /data
```

dedup output:

```
06:11.0 Size group 170/170 (8390605) sampled 853 hashed 838 freed 23661522715
00.00 Committing tracking state
No handlers could be found for logger "sqlalchemy.pool.SingletonThreadPool"
00.02 Committing tracking state
```

df after:

```
Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/sde1      292967724 165535232  87075656  66% /data
```
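For reference, the df numbers are consistent with the dedup output: Used dropped from 188674128 to 165535232 1K-blocks, a difference of 23138896 KiB (~23.7 GB), closely matching the 23661522715 bytes reported freed.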

g2p commented 10 years ago

"inodes of size at least 8388608" refers to the minimum size of tracked inodes (8388608 bytes = 8 MiB); smaller files aren't tracked (see `--size-cutoff`). bedup doesn't compute an estimate of the space that will be freed; it just deduplicates on the fly.
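As an illustration of the cutoff (a sketch, not the project's documented usage: only `--size-cutoff` and the `python -m bedup` invocation appear in this thread, so the flag placement and value here are assumptions), lowering it would make smaller files eligible for tracking and deduplication:

```
# Sketch: pass a lower cutoff so files under 8388608 bytes (8 MiB,
# the cutoff in effect above) are tracked too; 1048576 bytes = 1 MiB.
sudo python -m bedup dedup --size-cutoff 1048576 /data
```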