g2p / bedup

Btrfs deduplication
http://pypi.python.org/pypi/bedup
GNU General Public License v2.0

RuntimeError: maximum recursion depth exceeded #29

Closed ghost closed 9 years ago

ghost commented 11 years ago

On Ubuntu 13.10, with a week-old btrfs raid10 with lzo compression on 4x2TB drives, holding ~2TB of data and ~10 snapshots. I was curious to see what the savings from deduplication would be, so I cloned master and ran:

sudo ./dedup /data

for about 24 hours and reclaimed ~80GB before hitting this error (omitting duplicate lines; sorry, I don't have the top of the trace):

File "~/.local/lib/python2.7/site-packages/contextlib2.py", line 244, in _invoke_next_callback suppress_exc = _invoke_next_callback(exc_details) File "~/.local/lib/python2.7/site-packages/contextlib2.py", line 244, in _invoke_next_callback suppress_exc = _invoke_next_callback(exc_details) File "~/.local/lib/python2.7/site-packages/contextlib2.py", line 246, in _invoke_next_callback suppress_exc = cb(sys.exc_info()) File "~/.local/lib/python2.7/site-packages/contextlib2.py", line 171, in _exit_wrapper return cm_exit(cm, exc_details) RuntimeError: maximum recursion depth exceeded

My curiosity is satiated for now, but let me know if you could use any more details or would like me to try to reproduce the error again.

g2p commented 11 years ago

Could be an error in the ExitStack backport I use on Python 2.7. Let me know if it also happens on Python 3.
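
For context, the recursion in the trace above comes from the way contextlib2 unwinds its callback stack: each _invoke_next_callback frame calls the next one, so one Python stack frame is needed per registered exit callback. A minimal sketch of that pattern (a hypothetical unwind helper, not contextlib2's actual code) shows how a deep enough stack exceeds the interpreter's recursion limit:

import sys

def unwind(callbacks, index=0):
    # Invoke callbacks recursively, one stack frame per callback,
    # mirroring the recursive unwinding visible in the traceback above.
    if index == len(callbacks):
        return
    try:
        unwind(callbacks, index + 1)
    finally:
        callbacks[index]()

# With more callbacks than the recursion limit allows (1000 by default),
# this raises "RuntimeError: maximum recursion depth exceeded" on
# Python 2.7 (RecursionError on modern Python 3).
unwind([lambda: None] * (sys.getrecursionlimit() + 100))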

ghost commented 11 years ago

I rebuilt with the Python 3.3 that is included in Ubuntu. The build worked, but it couldn't run because of issue #27.

~/dev/bedup$ sudo python3.3 ~/.local/bin/bedup dedup /data
Traceback (most recent call last):
  File "/home/weirdtalk/.local/bin/bedup", line 9, in <module>
    load_entry_point('bedup==0.9.0', 'console_scripts', 'bedup')()
  File "/home/weirdtalk/.local/lib/python3.3/site-packages/bedup-0.9.0-py3.3-linux-x86_64.egg/bedup/__main__.py", line 487, in script_main
    sys.exit(main(sys.argv))
  File "/home/weirdtalk/.local/lib/python3.3/site-packages/bedup-0.9.0-py3.3-linux-x86_64.egg/bedup/__main__.py", line 476, in main
    return args.action(args)
  File "/home/weirdtalk/.local/lib/python3.3/site-packages/bedup-0.9.0-py3.3-linux-x86_64.egg/bedup/__main__.py", line 147, in vol_cmd
    [volpath], tt, recurse=True)
  File "/home/weirdtalk/.local/lib/python3.3/site-packages/bedup-0.9.0-py3.3-linux-x86_64.egg/bedup/filesystem.py", line 590, in load_vols
    lo, sta = vol._fs._load_visible_vols([volpath], nest_desc=True)
  File "/home/weirdtalk/.local/lib/python3.3/site-packages/bedup-0.9.0-py3.3-linux-x86_64.egg/bedup/filesystem.py", line 274, in _load_visible_vols
    os.path.join(start_desc.description, relpath),
  File "/usr/lib/python3.3/posixpath.py", line 92, in join
    "components.") from None
TypeError: Can't mix strings and bytes in path components.
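
The error itself is just Python 3 refusing to join str and bytes path components; at that call site, os.path.join is presumably getting a str volume description mixed with a bytes relative path. A standalone illustration with made-up values (not bedup's actual data):

import os.path

description = "/data/subvol"    # str: stand-in for a volume description
relpath = b"snapshots/home"     # bytes: stand-in for a path read from the filesystem

# On Python 3 this raises:
#   TypeError: Can't mix strings and bytes in path components
os.path.join(description, relpath)
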
mgorny commented 10 years ago

I'm having a similar error with Python 3.4:

  File "/usr/lib64/python3.4/site-packages/contextlib2.py", line 244, in _invoke_next_callback
    suppress_exc = _invoke_next_callback(exc_details)
  File "/usr/lib64/python3.4/site-packages/contextlib2.py", line 244, in _invoke_next_callback
    suppress_exc = _invoke_next_callback(exc_details)
  File "/usr/lib64/python3.4/site-packages/contextlib2.py", line 244, in _invoke_next_callback
    suppress_exc = _invoke_next_callback(exc_details)
  File "/usr/lib64/python3.4/site-packages/contextlib2.py", line 244, in _invoke_next_callback
    suppress_exc = _invoke_next_callback(exc_details)
  File "/usr/lib64/python3.4/site-packages/contextlib2.py", line 244, in _invoke_next_callback
    suppress_exc = _invoke_next_callback(exc_details)
  File "/usr/lib64/python3.4/site-packages/contextlib2.py", line 244, in _invoke_next_callback
    suppress_exc = _invoke_next_callback(exc_details)
  File "/usr/lib64/python3.4/site-packages/contextlib2.py", line 244, in _invoke_next_callback
    suppress_exc = _invoke_next_callback(exc_details)
RuntimeError: maximum recursion depth exceeded
mgorny commented 9 years ago

Ping. Any chance of fixing this? Otherwise it's impossible to use bedup when you have too many files of the same size (say, with PostgreSQL installed).

sulaweyo commented 9 years ago

Yep, same here when deduplicating some SVN workspaces :(

lpirl commented 9 years ago

+1

mac-linux-free commented 9 years ago

Same for me ... maximum recursion depth exceeded ... on Debian and Ubuntu 14.04. What can we do?

lpirl commented 9 years ago

I contacted the developer of contextlib2. He is aware of the problem; there is a fix, but he hasn't found the time to package a new release so far.

mgorny commented 9 years ago

Do you mean https://bitbucket.org/ncoghlan/contextlib2/commits/170d5144455767dc39065f804d18df2104df1b0c? Sadly, that seems to be the only possibly relevant commit since the last release, and it was committed around 1.5 years ago. contextlib2 seems pretty much dead to me.

mac-linux-free commented 9 years ago

So bedup development is dead? Where is the light at the end of the tunnel...?

sulaweyo commented 9 years ago

I switched over to duperemove. It's not perfect on larger sets, but you can run it on folders, so I just do smaller chunks. At least I never had immutable files left over or OOM issues, as I did with bedup again and again.

mac-linux-free commented 9 years ago

Good, that is what I do now. I hope it works for my 80TB store.

sulaweyo commented 9 years ago

I am deduplicating source code branches, so I use a block size of 8K, while duperemove defaults to 128K. For really large storage you may want to increase that. With 8K, even my 200GB (deduped to 20GB) generates well over 2GB of hashes, and duperemove uses memory accordingly.
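
As a hedged illustration of that kind of run (duperemove's -r, -d and -b options exist, but check your version's man page; the path is a placeholder):

# -b 8192 uses an 8K block size instead of the 128K default;
# -r recurses into directories, -d actually submits the dedupe
# requests instead of only reporting duplicates.
sudo duperemove -r -d -b 8192 /path/to/branches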

mac-linux-free commented 9 years ago

Where are the hashes stored? Only in memory? If the hashes are generated once, are they reused with delta support? I checked out duperemove and it is at version 0.10-dev... I think I remember some OOMs with duperemove 0.08 on my 80TB store.

sulaweyo commented 9 years ago

I did not check the code, but it looks like the hashes are stored in memory. You can write them to a file, but they will still be loaded back into memory as a whole; at least that is what it looks like. From the issues there, it looks like they want to move that to SQLite, which I guess could reduce memory consumption a lot. It's probably better to ask there directly.

mac-linux-free commented 9 years ago

OK, I tested it now: my 80TB store consumed 128GB of RAM with duperemove in read-only mode. My server has 256GB of RAM, so that's no problem for me :) The really great thing about duperemove is the multithreaded hashing, so with 24 cores and 256GB of RAM it works.

g2p commented 9 years ago

bedup is now Python3-only (way overdue), and doesn't use contextlib2 anymore.