alecthomas / importmagic

A Python library for finding unresolved symbols in Python code, and the corresponding imports
BSD 2-Clause "Simplified" License

Segfault while indexing libtorrent #34

Open whirm opened 8 years ago

whirm commented 8 years ago

Some info from the debugger:

(gdb) bt
#0  __strcmp_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S:29
#1  0x00007f1a806f9d7b in ?? () from /usr/lib/x86_64-linux-gnu/libboost_python-py27.so.1.58.0
#2  0x00007f1a806fa39d in boost::python::converter::registry::insert(void* (*)(_object*), void (*)(_object*, boost::python::converter::rvalue_from_python_stage1_data*), boost::python::type_info, _typeobject const* (*)()) () from /usr/lib/x86_64-linux-gnu/libboost_python-py27.so.1.58.0
#3  0x00007f1a8070daf1 in boost::python::converter::initialize_builtin_converters() () from /usr/lib/x86_64-linux-gnu/libboost_python-py27.so.1.58.0
#4  0x00007f1a806f9e14 in ?? () from /usr/lib/x86_64-linux-gnu/libboost_python-py27.so.1.58.0
#5  0x00007f1a806ee2b4 in ?? () from /usr/lib/x86_64-linux-gnu/libboost_python-py27.so.1.58.0
#6  0x0000003ae140f2da in call_init (l=<optimized out>, argc=argc@entry=5, argv=argv@entry=0x7ffc8f270258, env=env@entry=0x7f1ab029b470) at dl-init.c:72
#7  0x0000003ae140f3eb in call_init (env=0x7f1ab029b470, argv=0x7ffc8f270258, argc=5, l=<optimized out>) at dl-init.c:30
#8  _dl_init (main_map=main_map@entry=0x7f1ab323c500, argc=5, argv=0x7ffc8f270258, env=0x7f1ab029b470) at dl-init.c:120
#9  0x0000003ae14138d0 in dl_open_worker (a=a@entry=0x7f1ab78c2818) at dl-open.c:575
#10 0x0000003ae140f184 in _dl_catch_error (objname=objname@entry=0x7f1ab78c2808, errstring=errstring@entry=0x7f1ab78c2810, mallocedp=mallocedp@entry=0x7f1ab78c2807, operate=operate@entry=0x3ae1413500 <dl_open_worker>, args=args@entry=0x7f1ab78c2818) at dl-error.c:187
#11 0x0000003ae1413081 in _dl_open (file=0x7f1ab3249100 "/usr/lib/python2.7/dist-packages/libtorrent.so", mode=-2147483646, caller_dlopen=0x522fd3 <_PyImport_GetDynLoadFunc+243>, nsid=-2, argc=5, argv=<optimized out>, env=0x7f1ab029b470) at dl-open.c:660
#12 0x0000003ae1c00f09 in dlopen_doit (a=a@entry=0x7f1ab78c2a30) at dlopen.c:66
#13 0x0000003ae140f184 in _dl_catch_error (objname=0x7f1ab01d8d00, errstring=0x7f1ab01d8d08, mallocedp=0x7f1ab01d8cf8, operate=0x3ae1c00eb0 <dlopen_doit>, args=0x7f1ab78c2a30) at dl-error.c:187
#14 0x0000003ae1c01521 in _dlerror_run (operate=operate@entry=0x3ae1c00eb0 <dlopen_doit>, args=args@entry=0x7f1ab78c2a30) at dlerror.c:163
#15 0x0000003ae1c00fa1 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#16 0x0000000000522fd3 in _PyImport_GetDynLoadFunc () at ../Python/dynload_shlib.c:140
#17 0x0000000000522baf in _PyImport_LoadDynamicModule () at ../Python/importdl.c:42
#18 0x00000000004af535 in import_submodule.lto_priv.1579 (fullname=0x7f1ab2cc6d20 "libtorrent", subname=0x7f1ab2cc6d20 "libtorrent", mod=<optimized out>) at ../Python/import.c:2722
#19 load_next (p_buflen=<synthetic pointer>, buf=0x7f1ab2cc6d20 "libtorrent", p_name=<synthetic pointer>, altmod=<optimized out>, mod=<optimized out>) at ../Python/import.c:2537
#20 import_module_level.isra.3 (level=<optimized out>, fromlist=['.'], globals=<optimized out>, name=<optimized out>) at ../Python/import.c:2246
#21 PyImport_ImportModuleLevel () at ../Python/import.c:2310
#22 0x00000000004b1408 in builtin___import__ () at ../Python/bltinmodule.c:49
#23 0x00000000004cb0a5 in do_call (nk=<optimized out>, na=1, pp_stack=0x7f1ab78c2f30, func=<built-in function __import__>) at ../Python/ceval.c:4564
#24 call_function (oparg=<optimized out>, pp_stack=0x7f1ab78c2f30) at ../Python/ceval.c:4372
#25 PyEval_EvalFrameEx () at ../Python/ceval.c:2987
#26 0x00000000004c2bd5 in PyEval_EvalCodeEx () at ../Python/ceval.c:3582
#27 0x00000000004cacac in fast_function (nk=1, na=<optimized out>, n=<optimized out>, pp_stack=0x7f1ab78c3140, func=<function at remote 0x7f1ab8536cf8>) at ../Python/ceval.c:4445
#28 call_function (oparg=<optimized out>, pp_stack=0x7f1ab78c3140) at ../Python/ceval.c:4370
#29 PyEval_EvalFrameEx () at ../Python/ceval.c:2987
#30 0x00000000004ca1af in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7f1ab78c3290, func=<function at remote 0x7f1ab8536c80>) at ../Python/ceval.c:4435
#31 call_function (oparg=<optimized out>, pp_stack=0x7f1ab78c3290) at ../Python/ceval.c:4370
#32 PyEval_EvalFrameEx () at ../Python/ceval.c:2987
#33 0x00000000004ca1af in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7f1ab78c33e0, func=<function at remote 0x7f1ab8536b90>) at ../Python/ceval.c:4435
#34 call_function (oparg=<optimized out>, pp_stack=0x7f1ab78c33e0) at ../Python/ceval.c:4370
#35 PyEval_EvalFrameEx () at ../Python/ceval.c:2987
#36 0x00000000004ca1af in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7f1ab78c3530, func=<function at remote 0x7f1ab8536d70>) at ../Python/ceval.c:4435
#37 call_function (oparg=<optimized out>, pp_stack=0x7f1ab78c3530) at ../Python/ceval.c:4370
#38 PyEval_EvalFrameEx () at ../Python/ceval.c:2987
#39 0x00000000004c2bd5 in PyEval_EvalCodeEx () at ../Python/ceval.c:3582
#40 0x00000000004def28 in function_call.lto_priv () at ../Objects/funcobject.c:523
#41 0x00000000004b1143 in PyObject_Call () at ../Objects/abstract.c:2546
#42 0x00000000004c715f in ext_do_call (nk=<optimized out>, na=<optimized out>, flags=<optimized out>, pp_stack=0x7f1ab78c37e8, func=<function at remote 0x7f1ab85450c8>) at ../Python/ceval.c:4664
#43 PyEval_EvalFrameEx () at ../Python/ceval.c:3026
#44 0x00000000004ca1af in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7f1ab78c3930, func=<function at remote 0x7f1ab855a668>) at ../Python/ceval.c:4435
#45 call_function (oparg=<optimized out>, pp_stack=0x7f1ab78c3930) at ../Python/ceval.c:4370
#46 PyEval_EvalFrameEx () at ../Python/ceval.c:2987
#47 0x00000000004ca1af in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7f1ab78c3a80, func=<function at remote 0x7f1ab855a7d0>) at ../Python/ceval.c:4435
#48 call_function (oparg=<optimized out>, pp_stack=0x7f1ab78c3a80) at ../Python/ceval.c:4370
#49 PyEval_EvalFrameEx () at ../Python/ceval.c:2987
#50 0x00000000004c2bd5 in PyEval_EvalCodeEx () at ../Python/ceval.c:3582
#51 0x00000000004ded6e in function_call.lto_priv () at ../Objects/funcobject.c:523
#52 0x00000000004b1143 in PyObject_Call () at ../Objects/abstract.c:2546
#53 0x00000000004f4d7e in instancemethod_call.lto_priv () at ../Objects/classobject.c:2602
#54 0x00000000004b1143 in PyObject_Call () at ../Objects/abstract.c:2546
#55 0x00000000004ce9e0 in PyEval_CallObjectWithKeywords () at ../Python/ceval.c:4219
#56 0x0000000000595722 in t_bootstrap () at ../Modules/threadmodule.c:620
#57 0x0000003ae2007454 in start_thread (arg=0x7f1ab78c4700) at pthread_create.c:334
#58 0x0000003ae18e8eed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) py-bt
Traceback (most recent call first):
  File "/home/whirm/.local/lib/python2.7/site-packages/importmagic/index.py", line 188, in index_builtin
    module = __import__(name, fromlist=['.'])
  File "/home/whirm/.local/lib/python2.7/site-packages/importmagic/index.py", line 180, in _index_module
    self.index_builtin(import_path, location=location)
  File "/home/whirm/.local/lib/python2.7/site-packages/importmagic/index.py", line 159, in index_path
    self._index_module(root, location)
  File "/home/whirm/.local/lib/python2.7/site-packages/importmagic/index.py", line 207, in build_index
    self.index_path(filename)
  File "/home/whirm/.emacs.d/var/el-get/elpy/elpy/impmagic.py", line 44, in _build_symbol_index
    index.build_index([project_root] + sys.path)
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 774, in __bootstrap
    self.__bootstrap_inner()
(gdb) py-print name
local 'name' = 'libtorrent'
(gdb) py-locals
self = <SymbolIndex(_exports={}, _lib_locations=[('/usr/local/lib/python2.7/dist-packages', '3'), ('/usr/lib/python2.7/dist-packages', '3'), ('/usr/local/lib/python2.7', 'S'), ('/usr/lib/python2.7', 'S')], _name=None, _parent=None, _blacklist_re=<_sre.SRE_Pattern at remote 0x7f1ab8542030>, score=<float at remote 0x1b71600>, location='L', _tree={'filecmp': <SymbolIndex(_exports={'cmpfiles': <float at remote 0x1b714b0>, 'dircmp': <float at remote 0x1b714b0>, 'cmp': <float at remote 0x1b714b0>}, _lib_locations=None, _name='filecmp', _parent=<...>, _blacklist_re=<_sre.SRE_Pattern at remote 0x7f1ab8542030>, score=<float at remote 0x1b71600>, location='S', _tree={'dircmp': <float at remote 0x1b714b0>, 'cmpfiles': <float at remote 0x1b714b0>, 'cmp': <float at remote 0x1b714b0>}) at remote 0x7f1ab6b17650>, 'Canvas': <SymbolIndex(_exports={}, _lib_locations=None, _name='Canvas', _parent=<...>, _blacklist_re=<_sre.SRE_Pattern at remote 0x7f1ab8542030>, score=<float at remote 0x1b71600>, location='S', _tree={'Canvas': <float a...(truncated)
name = 'libtorrent'
location = 'S'
basename = 'libtorrent'
(gdb) py-list
 183            basename = name.rsplit('.', 1)[-1]
 184            if basename.startswith('_'):
 185                return
 186            logger.debug('importing builtin module %s for indexing', name)
 187            try:
>188                module = __import__(name, fromlist=['.'])
 189            except Exception:
 190                logger.debug('failed to index builtin module %s', name)
 191                return
 192    
 193            with self.enter(basename, location=location) as subtree:
(gdb) 

Note that running module = __import__('libtorrent', fromlist=['.']) directly in IPython works fine.

(I originally created the issue at https://github.com/jorgenschaefer/elpy/issues/834)

Let me know if you need any more info.

If you don't think this can be fixed on the importmagic side, would you accept a PR that adds a module blacklist feature?

Thanks!

alecthomas commented 8 years ago

As discussed in that issue, there's basically no way this is a bug in ImportMagic itself. All it does is import modules. My guess is that one of the previously imported C extension modules is corrupting memory somewhere, and the corruption only manifests when libtorrent is imported. The bug may or may not be in libtorrent itself.

It does suck though, no denying that. I'd be fine with a blacklist feature for working around problems like this.
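A blacklist would slot in right before the __import__ call at line 188 of index_builtin. The py-locals dump shows SymbolIndex already carries a _blacklist_re attribute, but what it is matched against isn't visible in this issue, so the sketch below builds its own regex. compile_blacklist, import_for_indexing and DEFAULT_BLACKLIST are hypothetical names, and importlib.import_module stands in for __import__(name, fromlist=['.']) (both return the leaf module).

```python
import importlib
import re

# Hypothetical user-supplied blacklist: dotted module names whose import is
# known to crash the indexing process (e.g. 'libtorrent' from this issue).
DEFAULT_BLACKLIST = ["libtorrent"]


def compile_blacklist(names):
    """Build one regex matching each blacklisted module and its submodules."""
    alternatives = "|".join(re.escape(n) for n in names)
    # 'libtorrent' matches 'libtorrent' and 'libtorrent.foo', not 'libtorrentx'.
    return re.compile(r"^(?:%s)(?:\.|$)" % alternatives)


def import_for_indexing(name, blacklist_re):
    """Import a module for indexing unless it is blacklisted.

    Mirrors the guard in index_builtin: returns None instead of raising
    when the module is skipped or its import fails with a Python exception.
    """
    if blacklist_re.match(name):
        # Skip before importing, so a crashing extension is never loaded.
        return None
    basename = name.rsplit(".", 1)[-1]
    if basename.startswith("_"):
        return None
    try:
        return importlib.import_module(name)
    except Exception:
        return None
```

Checking the blacklist before the import is the whole point: a try/except around the import cannot catch a segfault, so the only in-process defense is never loading the offending extension.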

birkenfeld commented 8 years ago

A blacklist would still work around that. Another possibility would be indexing C modules in a subprocess; this would avoid both crashing the whole process and introducing incompatibilities between multiple C modules, which can cause such crashes in the first place.

Importing modules in separate interpreters is necessary in other cases as well, e.g. you can't import both PyQt4 and PyQt5 into one process; that doesn't crash, but raises a RuntimeError.
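Indexing in a subprocess could look roughly like the Python 3 sketch below: each module is imported by a fresh interpreter (sys.executable), which reports its names back as JSON, so a segfault only kills the child. index_in_subprocess and the "public names = dir() minus underscore names" rule are assumptions for illustration; importmagic's real export rules are not reproduced.

```python
import json
import subprocess
import sys

# Child program: import one module and print its public names as JSON.
# The JSON goes on its own last line so anything the import prints is skipped.
_CHILD = r"""
import importlib, json, sys
module = importlib.import_module(sys.argv[1])
names = sorted(n for n in dir(module) if not n.startswith('_'))
sys.stdout.write('\n' + json.dumps(names) + '\n')
"""


def index_in_subprocess(name, timeout=30):
    """Return the module's public names, or None if the child failed.

    A segfault in an extension module kills only the child process; on
    POSIX this shows up as a negative return code (-11 for SIGSEGV).
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _CHILD, name],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        return None  # ImportError, crash, or any other failure
    return json.loads(proc.stdout.strip().splitlines()[-1])
```

Because every module gets its own interpreter, this also sidesteps the PyQt4/PyQt5 clash: each import happens in a process that has never seen the other binding.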

alecthomas commented 8 years ago

Yeah I agree re. blacklist (my last sentence stated that). Indexing in a subprocess isn't a bad idea either, maybe that's a better option.

birkenfeld commented 8 years ago

> Yeah I agree re. blacklist (my last sentence stated that).

Sorry, I overlooked that.

whirm commented 8 years ago

Having several subprocesses index modules in parallel would also speed up indexing. I guess it would also lay the groundwork for index caching, since a serialization format would be needed anyway to transfer the collected data back from the subprocesses.
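A rough Python 3 sketch of both ideas together: modules are indexed in parallel child interpreters, and because the results already come back as JSON, the same data can be written to a cache file and reused on the next run. build_index_parallel, _index_one and the cache layout are hypothetical, not importmagic's API.

```python
import json
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Child program (assumption: public names = dir() minus '_'-prefixed names).
# The JSON goes on the last line so any output the import prints is skipped.
_CHILD = r"""
import importlib, json, sys
module = importlib.import_module(sys.argv[1])
names = sorted(n for n in dir(module) if not n.startswith('_'))
sys.stdout.write('\n' + json.dumps(names) + '\n')
"""


def _index_one(name, timeout):
    """Index one module in its own interpreter; None if it crashed or failed."""
    try:
        proc = subprocess.run([sys.executable, "-c", _CHILD, name],
                              stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                              universal_newlines=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        return None
    return json.loads(proc.stdout.strip().splitlines()[-1])


def build_index_parallel(names, cache_path, workers=4, timeout=30):
    """Index modules in parallel subprocesses, reusing a JSON cache on disk.

    Threads are enough here: each one only waits on its own child process.
    """
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    todo = [n for n in names if n not in cache]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda n: _index_one(n, timeout), todo)
        for name, exported in zip(todo, results):
            # None marks a module that failed or crashed; it is not retried
            # on later runs, which acts like an automatic blacklist.
            cache[name] = exported
    with open(cache_path, "w") as f:
        json.dump(cache, f, indent=2, sort_keys=True)
    return cache
```

Threads rather than a process pool are a deliberate choice here: a multiprocessing worker that segfaults breaks the whole pool, whereas a thread waiting on a dead child just gets a nonzero return code.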