g2p / bedup

Btrfs deduplication
http://pypi.python.org/pypi/bedup
GNU General Public License v2.0

parameters detection #16

Closed ytrezq closed 11 years ago

ytrezq commented 11 years ago

It seems that bedup doesn't check whether the paths it is given are valid.

Steps to reproduce: start bedup with an incorrect path to a volume. Examples:

Case where the directory is not on a btrfs filesystem:

root@sysresccd /root/bedup % bedup scan /mnt/windows
Traceback (most recent call last):
  File "/usr/bin/bedup", line 9, in <module>
    load_entry_point('bedup==0.0.8', 'console_scripts', 'bedup')()
  File "/usr/lib/python2.7/site-packages/bedup-0.0.8-py2.7-linux-x86_64.egg/bedup/__main__.py", line 470, in script_main
    sys.exit(main(sys.argv))
  File "/usr/lib/python2.7/site-packages/bedup-0.0.8-py2.7-linux-x86_64.egg/bedup/__main__.py", line 459, in main
    return args.action(args)
  File "/usr/lib/python2.7/site-packages/bedup-0.0.8-py2.7-linux-x86_64.egg/bedup/__main__.py", line 142, in vol_cmd
    [filt], tt, recurse=True)
  File "/usr/lib/python2.7/site-packages/bedup-0.0.8-py2.7-linux-x86_64.egg/bedup/filesystem.py", line 565, in load_vols
    vol = self._get_vol_by_path(volpath, desc=VolDesc(volpath, True))
  File "/usr/lib/python2.7/site-packages/bedup-0.0.8-py2.7-linux-x86_64.egg/bedup/filesystem.py", line 407, in _get_vol_by_path
    return self._get_vol(fd, desc)
  File "/usr/lib/python2.7/site-packages/bedup-0.0.8-py2.7-linux-x86_64.egg/bedup/filesystem.py", line 414, in _get_vol
    vol_id = Volume2.vol_id_of_fd(fd)
  File "/usr/lib/python2.7/site-packages/bedup-0.0.8-py2.7-linux-x86_64.egg/bedup/filesystem.py", line 328, in vol_id_of_fd
    return get_fsid(fd), get_root_id(fd)
  File "/usr/lib/python2.7/site-packages/bedup-0.0.8-py2.7-linux-x86_64.egg/bedup/platform/btrfs.py", line 440, in get_fsid
    ioctl_pybug(volume_fd, lib.BTRFS_IOC_FS_INFO, args_buf)
  File "/usr/lib/python2.7/site-packages/bedup-0.0.8-py2.7-linux-x86_64.egg/bedup/platform/btrfs.py", line 369, in ioctl_pybug
    return fcntl.ioctl(fd, ioc, arg, True)
IOError: [Errno 25] Inappropriate ioctl for device

Case of a normal directory inside a btrfs filesystem:

root@sysresccd /root/bedup % bedup dedup /mnt/gentoo/usr (the mount point is at /mnt/gentoo)
Scanning volume /mnt/gentoo/usr generations from 16170 to 16183, with size cutoff 8388608
00.27 Scanned 57957 retained 2
Deduplicating filesystem <FUNTOO>
00.01 Size group 1/14 sampled 0 hashed 0 freed 0
Traceback (most recent call last):
  File "/usr/bin/bedup", line 9, in <module>
    load_entry_point('bedup==0.0.8', 'console_scripts', 'bedup')()
  File "/usr/lib/python2.7/site-packages/bedup-0.0.8-py2.7-linux-x86_64.egg/bedup/__main__.py", line 470, in script_main
    sys.exit(main(sys.argv))
  File "/usr/lib/python2.7/site-packages/bedup-0.0.8-py2.7-linux-x86_64.egg/bedup/__main__.py", line 459, in main
    return args.action(args)
  File "/usr/lib/python2.7/site-packages/bedup-0.0.8-py2.7-linux-x86_64.egg/bedup/__main__.py", line 185, in vol_cmd
    dedup_tracked(sess, volset, tt, defrag=args.defrag)
  File "/usr/lib/python2.7/site-packages/bedup-0.0.8-py2.7-linux-x86_64.egg/bedup/tracking.py", line 381, in dedup_tracked
    dedup_tracked1(sess, tt, ofile_reserved, query, fs, defrag)
  File "/usr/lib/python2.7/site-packages/bedup-0.0.8-py2.7-linux-x86_64.egg/bedup/tracking.py", line 425, in dedup_tracked1
    with closing(fopenat(inode.vol.live.fd, pathb)) as rfile:
  File "/usr/lib/python2.7/site-packages/bedup-0.0.8-py2.7-linux-x86_64.egg/bedup/platform/openat.py", line 49, in fopenat
    return os.fdopen(openat(base_fd, path, os.O_RDONLY), 'rb')
  File "/usr/lib/python2.7/site-packages/bedup-0.0.8-py2.7-linux-x86_64.egg/bedup/platform/openat.py", line 40, in openat
    raise IOError(ffi.errno, os.strerror(ffi.errno), (base_fd, path))
IOError: [Errno 2] No such file or directory: (6, 'usr/portage/distfiles/jdk-7u10-linux-x64.tar.gz')

Expected behaviour: in the first case, the program should report that the path doesn't lead to a btrfs filesystem.
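A minimal sketch of how the first case could be reported, assuming the ioctl call is wrapped at some convenient point. Errno 25 in the traceback is `ENOTTY`. The names `NotBtrfsError` and `call_btrfs_ioctl` are hypothetical and not part of bedup; `do_ioctl` stands in for whatever function issues the btrfs ioctl (`get_fsid` in the traceback).

```python
import errno


class NotBtrfsError(Exception):
    """Hypothetical exception: the path is not on a btrfs filesystem."""


def call_btrfs_ioctl(path, do_ioctl):
    """Run do_ioctl() and turn ENOTTY into a readable error.

    do_ioctl is a zero-argument callable standing in for the code that
    issues the btrfs ioctl; it is passed in so this sketch stays
    self-contained and does not depend on bedup internals.
    """
    try:
        return do_ioctl()
    except (IOError, OSError) as e:
        # ENOTTY ("Inappropriate ioctl for device") is what the kernel
        # returns when the filesystem does not implement the ioctl.
        if e.errno == errno.ENOTTY:
            raise NotBtrfsError("%s is not on a btrfs filesystem" % path)
        # Any other error is unrelated to this check; re-raise it unchanged.
        raise
```

The caller could then catch `NotBtrfsError` at the top level and print its message instead of a traceback.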

In the second case, dedup could detect the root directory (the real one or a subvolume) and redirect the command to it. A second possibility would be to add a feature that deduplicates only the files inside the directory, whether it is a subvolume or not (copies whose originals are located outside of the directory wouldn't be deduplicated).
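Detecting the root directory could be sketched as below, assuming it is enough to walk up from the given path until `os.path.ismount` reports a mount point. The helper name `find_mount_point` is hypothetical. On btrfs, subvolume roots usually report their own device number, so this check should stop at them as well, but that is an assumption not verified against bedup.

```python
import os


def find_mount_point(path):
    """Walk up from path until a mount point is reached.

    Symlinks are resolved first so that the parent chain is the real one.
    The loop always terminates because "/" is a mount point.
    """
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path
```

With the layout from the report, `find_mount_point('/mnt/gentoo/usr')` would be expected to give `/mnt/gentoo`, provided `usr` is a plain directory and not a subvolume or a separate mount.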

g2p commented 11 years ago

The second possibility wouldn't work well in all cases: bedup scans at the volume level, and scanning a single directory might not work well if the directory is very large, like /usr. I'll probably deal with these by adding nicer error messages.

ytrezq commented 11 years ago

Thanks. I was also wondering whether it could have an impact that the volume in the example has the problem described in #15 (sorry for the cross-post). But the feature I was talking about would make the scanning happen at the directory level if the path on the command line leads to a directory.

This would be nice if you have a separate temporary directory and you don't want the backups (1000s) it contains to share the same sectors on disk.