GrahamDumpleton / wrapt

A Python module for decorators, wrappers and monkey patching.
BSD 2-Clause "Simplified" License
2.06k stars 231 forks source link

segfault during our app test during GC with wrapt in the traceback #98

Closed yarikoptic closed 3 years ago

yarikoptic commented 7 years ago

Not 100% sure that it is wrapt to blame, but we only recently decided to check out wrapt as a part of bigger refactoring PR. Unfortunately our tests started to segfault on some test setups (some buildbot docker environments, but not the other, travis). With wrapt being one of the few libraries we use which has a binary extension, and having not seen similar crash before, and having wrapt's functions within the stack at crash time it is likely to assume that the reason is somewhere within wrapt. Here is an extract from the 'gdb bt' and more detail (valgrind output etc) could be found at https://github.com/datalad/datalad/issues/1382

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
#1  0x00007ffff6f2d40a in __GI_abort () at abort.c:89
#2  0x00007ffff6f24e47 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x5555557fae87 "gc->gc.gc_refs != 0", file=file@entry=0x5555557fae40 "../Modules/gcmodule.c", line=line@entry=351, 
    function=function@entry=0x5555557fb2d8 <__PRETTY_FUNCTION__.9755> "update_refs") at assert.c:92
#3  0x00007ffff6f24ef2 in __GI___assert_fail (assertion=0x5555557fae87 "gc->gc.gc_refs != 0", file=0x5555557fae40 "../Modules/gcmodule.c", line=351, function=0x5555557fb2d8 <__PRETTY_FUNCTION__.9755> "update_refs")
    at assert.c:101
#4  0x0000555555712c3b in update_refs (containers=0x555555a7cd20 <generations+96>) at ../Modules/gcmodule.c:351
#5  0x0000555555713e48 in collect (generation=2) at ../Modules/gcmodule.c:924
#6  0x000055555571432b in gc_collect (self=0x0, args=(), kws=0x0) at ../Modules/gcmodule.c:1121
#7  0x00005555556187a8 in PyCFunction_Call (func=<built-in function collect>, arg=(), kw=0x0) at ../Objects/methodobject.c:85
...
#29 0x000055555561f0bb in _Py_Dealloc (
    op=<Dataset(_cfg=None, _repo=<GitRepo(realpath=<unknown at remote 0x7ffff354d9d0>, _GIT_COMMON_OPTIONS=['-c', 'receive.autogc=0', '-c', 'gc.auto=0'], cmd_call_wrapper=<unknown at remote 0x7ffff3580ec0>, repo=<Repo(git=<Git at remote 0x7ffff3548cd8>, working_dir='/tmp/datalad_temp_test_install_crcnsrR0VqX/all-nonrecursive', _bare=False, git_dir='/tmp/datalad_temp_test_install_crcnsrR0VqX/all-nonrecursive/.git', odb=<GitCmdObjectDB(_ostream=None, _hexsha_to_file={}, _fd_open_flags=262144, _root_path='/tmp/datalad_temp_test_install_crcnsrR0VqX/all-nonrecursive/.git/objects', _git=<...>) at remote 0x7ffff3a8bfb0>, _working_tree_dir='/tmp/datalad_temp_test_install_crcnsrR0VqX/all-nonrecursive') at remote 0x7ffff3a8bd10>, _cfg=None, path='/tmp/datalad_temp_test_install_crcnsrR0VqX/all-nonrecursive') at remote 0x7ffff373c370>, _id=None, _path='/tmp/datalad_temp_test_install_crcnsrR0VqX/all-nonrecursive') at remote 0x7ffff4b8e8b0>) at ../Objects/object.c:2262
#30 0x00007ffff4e323e4 in WraptFunctionWrapperBase_clear (self=0x7ffff3578ba0) at src/wrapt/_wrappers.c:1898
#31 0x00007ffff4e32657 in WraptFunctionWrapperBase_dealloc (self=0x7ffff3578ba0) at src/wrapt/_wrappers.c:1911
#32 0x000055555561f0bb in _Py_Dealloc (op=<BoundFunctionWrapper at remote 0x7ffff3578ba0>) at ../Objects/object.c:2262
#33 0x00005555556c52fd in call_function (pp_stack=0x7fffffff6030, oparg=1) at ../Python/ceval.c:4385
...

does anything ring any bell? if desired, I could provide a reproducible Dockefile to bring you to the same point. This was tested using current wrapt master as of 1.10.10 release

GrahamDumpleton commented 7 years ago

What are you using wrapt for? Just normal decorators?

yarikoptic commented 7 years ago

AFAIK -- yes. You can skim for wrapt usage in that PR: https://github.com/datalad/datalad/pull/1350/files

yarikoptic commented 7 years ago

any further information I could provide to help you to help us? ;-)

GrahamDumpleton commented 7 years ago

Not sure quite yet.

The issue is arising because garbage collection is kicking in while there is an in progress call to delete a wrapt object.

The related code in wrapt is:

static void WraptObjectProxy_dealloc(WraptObjectProxyObject *self)
{
    PyObject_GC_UnTrack(self);

    if (self->weakreflist != NULL)
        PyObject_ClearWeakRefs((PyObject *)self);

    WraptObjectProxy_clear(self);

    Py_TYPE(self)->tp_free(self);
}

Need to look at whether the order things are being down here is correct and that fields in the wrapper object are being set to NULL to avoid a dangling pointer which something is attempting to access as a result of garbage collection kicking in. Also need to see whether order fields are cleared could also be an issue.

Because though the first call is PyObject_GC_UnTrack() then the garbage collector shouldn't really be trying to access back to the wrapper object.

yarikoptic commented 7 years ago

Thank you @GrahamDumpleton for the analysis!

because garbage collection is kicking in while there is an in progress call to delete a wrapt object

interesting! could you think of any workaround we could do on our side at Python level to avoid it (besides not calling gc.collect directly)?

GrahamDumpleton commented 7 years ago

Do you have any idea what package this object is from:

<Dataset(_cfg=None, _repo=<GitRepo(realpath=<unknown at remote 0x7ffff354d9d0>, _GIT_COMMON_OPTIONS=['-c', 'receive.autogc=0', '-c', 'gc.auto=0'], cmd_call_wrapper=<unknown at remote 0x7ffff3580ec0>, repo=<Repo(git=<Git at remote 0x7ffff3548cd8>, working_dir='/tmp/datalad_temp_test_install_crcnsrR0VqX/all-nonrecursive', _bare=False, git_dir='/tmp/datalad_temp_test_install_crcnsrR0VqX/all-nonrecursive/.git', odb=<GitCmdObjectDB(_ostream=None, _hexsha_to_file={}, _fd_open_flags=262144, _root_path='/tmp/datalad_temp_test_install_crcnsrR0VqX/all-nonrecursive/.git/objects', _git=<...>) at remote 0x7ffff3a8bfb0>, _working_tree_dir='/tmp/datalad_temp_test_install_crcnsrR0VqX/all-nonrecursive') at remote 0x7ffff3a8bd10>, _cfg=None, path='/tmp/datalad_temp_test_install_crcnsrR0VqX/all-nonrecursive') at remote 0x7ffff373c370>, _id=None, _path='/tmp/datalad_temp_test_install_crcnsrR0VqX/all-nonrecursive') at remote 0x7ffff4b8e8b0>

I am more inclined to think that that package may be the issue.

yarikoptic commented 7 years ago

Dataset is ours (datalad package, pure Python code, uses weakrefs for flywheel pattern) but has some attributes in it coming from git (GitPython) package

GrahamDumpleton commented 7 years ago

Current thinking is that running gc.collect() from within context of __del__(), which git package does via close() method, is just dodgy.

Would need to see the full gdb stack trace to determine if garbage collection was already running. It looks a bit like it may have been triggered when was already running. I can't seen any protections in Python GC code to guard against that.

yarikoptic commented 7 years ago

Hi @GrahamDumpleton ! Sorry for the super long delay -- we thought that we have managed to "work around" by just looking aside but not really -- issue came back and hits us hard. Since I was not diligent enough to capture enough information to reproduce that exactly environment, I had to chase it again and so far failed to make it reproducible using "python-dbg" build (so I could provide more info). But I do get it reproducibly using regular build of Python (so no handy info from python space). Here is the full traceback http://www.onerussian.com/tmp/wrapt-traceback.txt from running

gdb --args python -c 'import nose; nose.main()' -s -v datalad.distribution.tests.test_publish

on neurodebian stretch... I could try to reproduce it within docker and share an image if you think it would be of value

yarikoptic commented 7 years ago

It is hitting us more and more and unfortunately so far failed to reproduce under python-dbg. I wonder if you have any recommendations on how we could help to troubleshoot this issue

yarikoptic commented 7 years ago

whoohoo -- managed to reproduce using python3-dbg and current codebase (102d91de3c41500fc4f67bb19ac88978a77bdfad)

here is the top stack trace

#0  0x00007ffff6af5067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff6af6448 in __GI_abort () at abort.c:89                                   
#2  0x00007ffff6aee266 in __assert_fail_base (fmt=0x7ffff6c27238 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x6fdb40 "((gc)->gc.gc_refs >> (1)) != 0",                
    file=file@entry=0x6fdaf0 "../Modules/gcmodule.c", line=line@entry=364,              
    function=function@entry=0x6fe23f <__PRETTY_FUNCTION__.10467> "update_refs") at assert.c:92
#3  0x00007ffff6aee312 in __GI___assert_fail (assertion=0x6fdb40 "((gc)->gc.gc_refs >> (1)) != 0",
    file=0x6fdaf0 "../Modules/gcmodule.c", line=364,                                    
    function=0x6fe23f <__PRETTY_FUNCTION__.10467> "update_refs") at assert.c:101        
#4  0x0000000000440063 in update_refs (containers=0x9b0240 <generations+64>) at ../Modules/gcmodule.c:364
#5  0x0000000000441336 in collect (generation=2, n_collected=0x7ffffffc8970, n_uncollectable=0x7ffffffc8978,
    nofail=0) at ../Modules/gcmodule.c:968                                              
#6  0x00000000004419c1 in collect_with_callback (generation=2) at ../Modules/gcmodule.c:1140
#7  0x0000000000441bd5 in gc_collect (self=0x7ffff41e7a58, args=0x7ffff7e02058, kws=0x0)
    at ../Modules/gcmodule.c:1234                                                       
#8  0x00000000004c9576 in PyCFunction_Call (func=0x7ffff419b598, arg=0x7ffff7e02058, kw=0x0)
    at ../Objects/methodobject.c:99                                                     
#9  0x00000000005bd5b8 in call_function (pp_stack=0x7ffffffc8ea0, oparg=0) at ../Python/ceval.c:4234
#10 0x00000000005b59c0 in PyEval_EvalFrameEx (f=0x7fffee9ee7b8, throwflag=0) at ../Python/ceval.c:2836
#11 0x00000000005bdbac in fast_function (func=0x7ffff30ee110, pp_stack=0x7ffffffca720, n=1, na=1, nk=0)
    at ../Python/ceval.c:4331                                                           
#12 0x00000000005bd7eb in call_function (pp_stack=0x7ffffffca720, oparg=0) at ../Python/ceval.c:4259
#13 0x00000000005b59c0 in PyEval_EvalFrameEx (f=0x1aaf288, throwflag=0) at ../Python/ceval.c:2836
#14 0x00000000005bac18 in PyEval_EvalCodeEx (_co=0x7ffff31924c0, globals=0x7ffff318f598, locals=0x0,
    args=0x7fffee9b9908, argcount=1, kws=0x0, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0)
    at ../Python/ceval.c:3585                                                           
#15 0x00000000006f725e in function_call (func=0x7ffff30ee058, arg=0x7fffee9b98e0, kw=0x0)
    at ../Objects/funcobject.c:632                                                      
#16 0x00000000004678a8 in PyObject_Call (func=0x7ffff30ee058, arg=0x7fffee9b98e0, kw=0x0)
    at ../Objects/abstract.c:2067                                                       
#17 0x00000000004876b3 in method_call (func=0x7ffff30ee058, arg=0x7fffee9b98e0, kw=0x0) 
    at ../Objects/classobject.c:347                                                     
#18 0x00000000004678a8 in PyObject_Call (func=0x7fffef67ea68, arg=0x7ffff7e02058, kw=0x0)
    at ../Objects/abstract.c:2067                                                       
#19 0x00000000005bcc4a in PyEval_CallObjectWithKeywords (func=0x7fffef67ea68, arg=0x7ffff7e02058, kw=0x0)
    at ../Python/ceval.c:4111                                                           
#20 0x00000000004fb19b in slot_tp_finalize (self=0x7fffee9fcae8) at ../Objects/typeobject.c:6079
#21 0x00000000004cd668 in PyObject_CallFinalizer (self=0x7fffee9fcae8) at ../Objects/object.c:278
#22 0x00000000004cd6d6 in PyObject_CallFinalizerFromDealloc (self=0x7fffee9fcae8) at ../Objects/object.c:295
#23 0x00000000004e8322 in subtype_dealloc (self=0x7fffee9fcae8) at ../Objects/typeobject.c:1110
#24 0x00000000004d0eb6 in _Py_Dealloc (op=0x7fffee9fcae8) at ../Objects/object.c:1749   
#25 0x00000000004bb621 in dict_dealloc (mp=0x7fffed7dede8) at ../Objects/dictobject.c:1383
#26 0x00000000004d0eb6 in _Py_Dealloc (op=0x7fffed7dede8) at ../Objects/object.c:1749   
#27 0x00000000004e85b9 in subtype_dealloc (self=0x7fffeea072c8) at ../Objects/typeobject.c:1157
#28 0x00000000004d0eb6 in _Py_Dealloc (op=0x7fffeea072c8) at ../Objects/object.c:1749   
#29 0x00000000004bb621 in dict_dealloc (mp=0x7fffee9207c8) at ../Objects/dictobject.c:1383
#30 0x00000000004d0eb6 in _Py_Dealloc (op=0x7fffee9207c8) at ../Objects/object.c:1749   
#31 0x00000000004e85b9 in subtype_dealloc (self=0x7fffee9b9608) at ../Objects/typeobject.c:1157
#32 0x00000000004d0eb6 in _Py_Dealloc (op=0x7fffee9b9608) at ../Objects/object.c:1749   
#33 0x00007ffff3efc2d0 in WraptFunctionWrapperBase_clear (self=0x7fffed7ee480) at src/wrapt/_wrappers.c:1898
#34 0x00007ffff3efc54b in WraptFunctionWrapperBase_dealloc (self=0x7fffed7ee480) at src/wrapt/_wrappers.c:1911
#35 0x00000000004d0eb6 in _Py_Dealloc (op=0x7fffed7ee480) at ../Objects/object.c:1749
#36 0x00000000005bd994 in call_function (pp_stack=0x7ffffffcc800, oparg=257) at ../Python/ceval.c:4274
#37 0x00000000005b59c0 in PyEval_EvalFrameEx (f=0x1d02588, throwflag=0) at ../Python/ceval.c:2836
#38 0x00000000005bac18 in PyEval_EvalCodeEx (_co=0x7fffefa41940, globals=0x7fffefa70758, locals=0x0,
    args=0x7fffed7fef60, argcount=5, kws=0x7ffff7e02080, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0,
    closure=0x0) at ../Python/ceval.c:3585
#39 0x00000000006f725e in function_call (func=0x7fffefa78110, arg=0x7fffed7fef38, kw=0x7fffef6366e8)
    at ../Objects/funcobject.c:632
...

any particular information I could/should look up -- I am keeping that gdb available

GrahamDumpleton commented 7 years ago

I have been travelling for the last few weeks so only just catching up on various emails.

The issue still shows as being related to triggering of garbage collection, so nothing changed there.

I will have to refresh my memory again of wrapt code and what that third party package is doing to try understand the implications of that forced garbage collection.

yarikoptic commented 7 years ago

Thanks in advance @GrahamDumpleton ... btw -- tried some older wrapt version (1.5.0) and it gave me a different death sign:

[Level 5] Removing temp file: /tmp/datalad_temp_test_publish_recursivetL37vR 
Fatal Python error: deletion of interned string failed
Aborted (core dumped)

so decided to share in case it might give a clue

yarikoptic commented 7 years ago

Here is a repo with a Dockerfile https://github.com/yarikoptic/segfault-reproduce/blob/master/Dockerfile which leads to that segfault, happen you need a reproducible scenario. Let me know if you would like a full image to not wait for it to get generated. ATM I am trying to reproduce it while using python-dbg so you could have more information available

yarikoptic commented 7 years ago

FWIW:

$> git diff
diff --git a/datalad/__init__.py b/datalad/__init__.py
index c10b5a80..ed490479 100644
--- a/datalad/__init__.py
+++ b/datalad/__init__.py
@@ -40,6 +40,18 @@ atexit.register(lgr.log, 5, "Exiting")
 from .version import __version__

+from functools import wraps
+import gc
+_gc_collect = gc.collect
+
+@wraps(_gc_collect)
+def our_collect(*args, **kwargs):
+    lgr.warn("Called with {}, {}".format(args, kwargs))
+    out = _gc_collect(*args, **kwargs)
+    lgr.warn("Ready to return with {}".format(out))
+    return out
+gc.collect = our_collect

so it should overload I think any subsequent explicit call to gc.collect and segfault happens after "Ready to return",

[WARNING] ...>repo.base:189,196>datalad.__init__:49  Called with (), {} 
[WARNING] ...>datalad.__init__:51  Ready to return with 0 

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff4a564b5 in WraptFunctionWrapperBase_clear (self=self@entry=0x7fffec115180) at src/wrapt/_wrappers.c:1899
1899        Py_CLEAR(self->wrapper);

suggestive that it is not within explicit gc.collect call within __del__ (those with repo.base:189,186 traceback)...

unfortunately the segfault kept escaping my gdb if using "proper" python-dbg build so I could see more of the python traceback

yarikoptic commented 7 years ago

Dear @GrahamDumpleton , please advise on how we should proceed about this issue. It would be sad, but understood, if you have no time to work on this issue ATM. We just need to know to either

GrahamDumpleton commented 7 years ago

Can you confirm that this is the minimal test case by which it can be replicated.

@with_testrepos('submodule_annex', flavors=['local'])
@with_tempfile(mkdir=True)
@with_tempfile(mkdir=True)
@with_tempfile(mkdir=True)
@with_tempfile(mkdir=True)
def test_publish_segfault(origin, src_path, dst_path, sub1_pub, sub2_pub):
    source = install(src_path, source=origin, recursive=True)
    sub2_target = AnnexRepo(sub2_pub, create=True)
    sub2_target.config.set('receive.denyCurrentBranch', 'updateInstead', where='local')
    sub2 = GitRepo(opj(src_path, 'subm 2'), create=False)
    sub2.add_remote("target", sub2_pub)
    create_tree(sub2.path, {'file.dat': 'content'})
    Dataset(sub2.path).add('file.dat')

    sys.stderr.write('XXXXXXXXXXXXXXX enter\n')
    res_ = publish(dataset=source, recursive=True, on_failure='ignore')
    sys.stderr.write('XXXXXXXXXXXXXXX exit\n')

It perhaps changes the way GC behaves, but commenting out the add_remote() results in it then not occurring.

GrahamDumpleton commented 7 years ago

And can you explain where publish in:

from datalad.api import publish

actually comes from, where source code is, and how you are wrapping it.

yarikoptic commented 7 years ago

Thank you @GrahamDumpleton !!!

  1. confirming! (running 2nd one with WRAPT_DISABLE_EXTENSIONS=1 to make sure it is the same beast)
    
    (venv-regular) root@ae4a1f1452ef:/tmp/datalad# `which python` -m nose -s -v  datalad.distribution.tests.test_publish:test_publish_segfault
    datalad.distribution.tests.test_publish.test_publish_segfault ... XXXXXXXXXXXXXXX enter
    Segmentation fault (core dumped)

(venv-regular) root@ae4a1f1452ef:/tmp/datalad# WRAPT_DISABLE_EXTENSIONS=1 which python -m nose -s -v datalad.distribution.tests.test_publish:test_publish_segfault datalad.distribution.tests.test_publish.test_publish_segfault ... XXXXXXXXXXXXXXX enter XXXXXXXXXXXXXXX exit ok Versions: cmd:annex=6.20170525+gitge1cf095ae-1~ndall+1 cmd:git=2.11.0 cmd:system-git=2.11.0 cmd:system-ssh=7.4p1 git=2.1.3 tqdm=4.11.2


Ran 1 test in 30.255s

OK


2. publish source code is pretty much this `__call__`: https://github.com/datalad/datalad/blob/master/datalad/distribution/publish.py#L379 which later bound into a function at the module level within https://github.com/datalad/datalad/blob/master/datalad/api.py#L66
GrahamDumpleton commented 7 years ago

If you had ever had logging enabled, did it always suggest it crashed when calling:

ds.repo.push(remote=remote, refspec=things2push)
yarikoptic commented 7 years ago

I don't think we have log record to that amount of detail to answer this question :-/ FWIW


since it would have logged "Attempt to push".

And (probably unrelated) -- if I add also `DATALAD_LOG_TRACEBACK=collide` to include traceback -- I manage to not trigger this bug at all on this test
GrahamDumpleton commented 7 years ago

Can you provide output from running pldd on one of your processes before it crashes. I want to see a record of what Python modules with C extensions are being used, to see if there are any uncommon third party ones in use.

yarikoptic commented 7 years ago

Will do, thanks But didn't you use that dockerfile yet to reproduce the issue on your box?

yarikoptic commented 7 years ago

re pldd -- the issue I have observed (e.g. pldd $$ in the shell just shows the shell and then hangs although it is not statically linked and more stuff is present) is described in its manpage:

BUGS
       Since glibc 2.19, pldd is broken: it just hangs when executed.  It is unclear  if  it  will
       ever be fixed.

but here is output of /proc/PID/maps which should have all the linked goodness as well, so adjusted that test code to be

    sys.stderr.write('XXXXXXXXXXXXXXX enter\n')
    print(open("/proc/%d/maps" % os.getpid()).read())
    res_ = publish(dataset=source, recursive=True, on_failure='ignore')
    sys.stderr.write('XXXXXXXXXXXXXXX exit\n')                            
(venv-regular) root@ae4a1f1452ef:/tmp/datalad# `which python` -m nose -s -v  datalad.distribution.tests.test_publish:test_publish_segfault
datalad.distribution.tests.test_publish.test_publish_segfault ... XXXXXXXXXXXXXXX enter
559b57f43000-559b58266000 r-xp 00000000 00:26 626291                     /tmp/datalad/venv-regular/bin/python2
559b58466000-559b58468000 r--p 00323000 00:26 626291                     /tmp/datalad/venv-regular/bin/python2
559b58468000-559b584df000 rw-p 00325000 00:26 626291                     /tmp/datalad/venv-regular/bin/python2
559b584df000-559b58502000 rw-p 00000000 00:00 0 
559b5a354000-559b5cf37000 rw-p 00000000 00:00 0                          [heap]
7fcc6b8cc000-7fcc6b94c000 rw-p 00000000 00:00 0 
7fcc6b94c000-7fcc6b969000 r-xp 00000000 00:26 29909                      /usr/lib/x86_64-linux-gnu/libyaml-0.so.2.0.5
7fcc6b969000-7fcc6bb69000 ---p 0001d000 00:26 29909                      /usr/lib/x86_64-linux-gnu/libyaml-0.so.2.0.5
7fcc6bb69000-7fcc6bb6a000 r--p 0001d000 00:26 29909                      /usr/lib/x86_64-linux-gnu/libyaml-0.so.2.0.5
7fcc6bb6a000-7fcc6bb6b000 rw-p 0001e000 00:26 29909                      /usr/lib/x86_64-linux-gnu/libyaml-0.so.2.0.5
7fcc6bb6b000-7fcc6bb9b000 r-xp 00000000 00:26 12037                      /usr/lib/python2.7/dist-packages/_yaml.x86_64-linux-gnu.so
7fcc6bb9b000-7fcc6bd9a000 ---p 00030000 00:26 12037                      /usr/lib/python2.7/dist-packages/_yaml.x86_64-linux-gnu.so
7fcc6bd9a000-7fcc6bd9b000 r--p 0002f000 00:26 12037                      /usr/lib/python2.7/dist-packages/_yaml.x86_64-linux-gnu.so
7fcc6bd9b000-7fcc6bd9e000 rw-p 00030000 00:26 12037                      /usr/lib/python2.7/dist-packages/_yaml.x86_64-linux-gnu.so
7fcc6bd9e000-7fcc6bfdf000 rw-p 00000000 00:00 0 
7fcc6bfdf000-7fcc6bfe0000 r-xp 00000000 00:26 13180                      /usr/lib/python2.7/dist-packages/cryptography/hazmat/bindings/_constant_time.x86_64-linux-gnu.so
7fcc6bfe0000-7fcc6c1e0000 ---p 00001000 00:26 13180                      /usr/lib/python2.7/dist-packages/cryptography/hazmat/bindings/_constant_time.x86_64-linux-gnu.so
7fcc6c1e0000-7fcc6c1e1000 r--p 00001000 00:26 13180                      /usr/lib/python2.7/dist-packages/cryptography/hazmat/bindings/_constant_time.x86_64-linux-gnu.so
7fcc6c1e1000-7fcc6c1e2000 rw-p 00002000 00:26 13180                      /usr/lib/python2.7/dist-packages/cryptography/hazmat/bindings/_constant_time.x86_64-linux-gnu.so
7fcc6c1e2000-7fcc6c5e3000 rw-p 00000000 00:00 0 
7fcc6c5e3000-7fcc6c604000 r-xp 00000000 00:26 12029                      /usr/lib/python2.7/dist-packages/_cffi_backend.x86_64-linux-gnu.so
7fcc6c604000-7fcc6c804000 ---p 00021000 00:26 12029                      /usr/lib/python2.7/dist-packages/_cffi_backend.x86_64-linux-gnu.so
7fcc6c804000-7fcc6c805000 r--p 00021000 00:26 12029                      /usr/lib/python2.7/dist-packages/_cffi_backend.x86_64-linux-gnu.so
7fcc6c805000-7fcc6c80a000 rw-p 00022000 00:26 12029                      /usr/lib/python2.7/dist-packages/_cffi_backend.x86_64-linux-gnu.so
7fcc6c80a000-7fcc6c80d000 rw-p 00000000 00:00 0 
7fcc6c80d000-7fcc6c8ce000 r-xp 00000000 00:26 13181                      /usr/lib/python2.7/dist-packages/cryptography/hazmat/bindings/_openssl.x86_64-linux-gnu.so
7fcc6c8ce000-7fcc6cacd000 ---p 000c1000 00:26 13181                      /usr/lib/python2.7/dist-packages/cryptography/hazmat/bindings/_openssl.x86_64-linux-gnu.so
7fcc6cacd000-7fcc6cae2000 r--p 000c0000 00:26 13181                      /usr/lib/python2.7/dist-packages/cryptography/hazmat/bindings/_openssl.x86_64-linux-gnu.so
7fcc6cae2000-7fcc6caea000 rw-p 000d5000 00:26 13181                      /usr/lib/python2.7/dist-packages/cryptography/hazmat/bindings/_openssl.x86_64-linux-gnu.so
7fcc6caea000-7fcc6cc2a000 rw-p 00000000 00:00 0 
7fcc6cc2a000-7fcc6cc51000 r-xp 00000000 00:26 9382                       /lib/x86_64-linux-gnu/libexpat.so.1.6.2
7fcc6cc51000-7fcc6ce51000 ---p 00027000 00:26 9382                       /lib/x86_64-linux-gnu/libexpat.so.1.6.2
7fcc6ce51000-7fcc6ce53000 r--p 00027000 00:26 9382                       /lib/x86_64-linux-gnu/libexpat.so.1.6.2
7fcc6ce53000-7fcc6ce54000 rw-p 00029000 00:26 9382                       /lib/x86_64-linux-gnu/libexpat.so.1.6.2
7fcc6ce54000-7fcc6ce64000 r-xp 00000000 00:26 26714                      /usr/lib/python2.7/lib-dynload/pyexpat.x86_64-linux-gnu.so
7fcc6ce64000-7fcc6d064000 ---p 00010000 00:26 26714                      /usr/lib/python2.7/lib-dynload/pyexpat.x86_64-linux-gnu.so
7fcc6d064000-7fcc6d065000 r--p 00010000 00:26 26714                      /usr/lib/python2.7/lib-dynload/pyexpat.x86_64-linux-gnu.so
7fcc6d065000-7fcc6d067000 rw-p 00011000 00:26 26714                      /usr/lib/python2.7/lib-dynload/pyexpat.x86_64-linux-gnu.so
7fcc6d067000-7fcc6d227000 rw-p 00000000 00:00 0 
7fcc6d227000-7fcc6d231000 r-xp 00000000 00:26 23770                      /usr/lib/python2.7/dist-packages/simplejson/_speedups.x86_64-linux-gnu.so
7fcc6d231000-7fcc6d430000 ---p 0000a000 00:26 23770                      /usr/lib/python2.7/dist-packages/simplejson/_speedups.x86_64-linux-gnu.so
7fcc6d430000-7fcc6d431000 r--p 00009000 00:26 23770                      /usr/lib/python2.7/dist-packages/simplejson/_speedups.x86_64-linux-gnu.so
7fcc6d431000-7fcc6d432000 rw-p 0000a000 00:26 23770                      /usr/lib/python2.7/dist-packages/simplejson/_speedups.x86_64-linux-gnu.so
7fcc6d432000-7fcc6d5f2000 rw-p 00000000 00:00 0 
7fcc6d5f2000-7fcc6d5f6000 r-xp 00000000 00:26 1258                       /lib/x86_64-linux-gnu/libuuid.so.1.3.0
7fcc6d5f6000-7fcc6d7f5000 ---p 00004000 00:26 1258                       /lib/x86_64-linux-gnu/libuuid.so.1.3.0
7fcc6d7f5000-7fcc6d7f6000 r--p 00003000 00:26 1258                       /lib/x86_64-linux-gnu/libuuid.so.1.3.0
7fcc6d7f6000-7fcc6d7f7000 rw-p 00004000 00:26 1258                       /lib/x86_64-linux-gnu/libuuid.so.1.3.0
7fcc6d7f7000-7fcc6d7fe000 r-xp 00000000 00:26 29675                      /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4
7fcc6d7fe000-7fcc6d9fe000 ---p 00007000 00:26 29675                      /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4
7fcc6d9fe000-7fcc6d9ff000 r--p 00007000 00:26 29675                      /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4
7fcc6d9ff000-7fcc6da00000 rw-p 00008000 00:26 29675                      /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4
7fcc6da00000-7fcc6da20000 r-xp 00000000 00:26 26664                      /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so
7fcc6da20000-7fcc6dc1f000 ---p 00020000 00:26 26664                      /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so
7fcc6dc1f000-7fcc6dc20000 r--p 0001f000 00:26 26664                      /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so
7fcc6dc20000-7fcc6dc24000 rw-p 00020000 00:26 26664                      /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so
7fcc6dc24000-7fcc6dca4000 rw-p 00000000 00:00 0 
7fcc6dca4000-7fcc6dca9000 r-xp 00000000 00:26 26706                      /usr/lib/python2.7/lib-dynload/mmap.x86_64-linux-gnu.so
7fcc6dca9000-7fcc6dea8000 ---p 00005000 00:26 26706                      /usr/lib/python2.7/lib-dynload/mmap.x86_64-linux-gnu.so
7fcc6dea8000-7fcc6dea9000 r--p 00004000 00:26 26706                      /usr/lib/python2.7/lib-dynload/mmap.x86_64-linux-gnu.so
7fcc6dea9000-7fcc6deaa000 rw-p 00005000 00:26 26706                      /usr/lib/python2.7/lib-dynload/mmap.x86_64-linux-gnu.so
7fcc6deaa000-7fcc6deba000 r-xp 00000000 00:26 26678                      /usr/lib/python2.7/lib-dynload/_json.x86_64-linux-gnu.so
7fcc6deba000-7fcc6e0b9000 ---p 00010000 00:26 26678                      /usr/lib/python2.7/lib-dynload/_json.x86_64-linux-gnu.so
7fcc6e0b9000-7fcc6e0ba000 r--p 0000f000 00:26 26678                      /usr/lib/python2.7/lib-dynload/_json.x86_64-linux-gnu.so
7fcc6e0ba000-7fcc6e0bb000 rw-p 00010000 00:26 26678                      /usr/lib/python2.7/lib-dynload/_json.x86_64-linux-gnu.so
7fcc6e0bb000-7fcc6e1bb000 rw-p 00000000 00:00 0 
7fcc6e1bb000-7fcc6e1bf000 r-xp 00000000 00:26 26720                      /usr/lib/python2.7/lib-dynload/termios.x86_64-linux-gnu.so
7fcc6e1bf000-7fcc6e3be000 ---p 00004000 00:26 26720                      /usr/lib/python2.7/lib-dynload/termios.x86_64-linux-gnu.so
7fcc6e3be000-7fcc6e3bf000 r--p 00003000 00:26 26720                      /usr/lib/python2.7/lib-dynload/termios.x86_64-linux-gnu.so
7fcc6e3bf000-7fcc6e3c1000 rw-p 00004000 00:26 26720                      /usr/lib/python2.7/lib-dynload/termios.x86_64-linux-gnu.so
7fcc6e3c1000-7fcc6e441000 rw-p 00000000 00:00 0 
7fcc6e441000-7fcc6e449000 r-xp 00000000 00:26 793159                     /tmp/datalad/venv-regular/lib/python2.7/site-packages/wrapt/_wrappers.so
7fcc6e449000-7fcc6e648000 ---p 00008000 00:26 793159                     /tmp/datalad/venv-regular/lib/python2.7/site-packages/wrapt/_wrappers.so
7fcc6e648000-7fcc6e649000 r--p 00007000 00:26 793159                     /tmp/datalad/venv-regular/lib/python2.7/site-packages/wrapt/_wrappers.so
7fcc6e649000-7fcc6e64b000 rw-p 00008000 00:26 793159                     /tmp/datalad/venv-regular/lib/python2.7/site-packages/wrapt/_wrappers.so
7fcc6e64b000-7fcc6e70b000 rw-p 00000000 00:00 0 
7fcc6e70b000-7fcc6e71e000 r-xp 00000000 00:26 26712                      /usr/lib/python2.7/lib-dynload/parser.x86_64-linux-gnu.so
7fcc6e71e000-7fcc6e91e000 ---p 00013000 00:26 26712                      /usr/lib/python2.7/lib-dynload/parser.x86_64-linux-gnu.so
7fcc6e91e000-7fcc6e91f000 r--p 00013000 00:26 26712                      /usr/lib/python2.7/lib-dynload/parser.x86_64-linux-gnu.so
7fcc6e91f000-7fcc6e920000 rw-p 00014000 00:26 26712                      /usr/lib/python2.7/lib-dynload/parser.x86_64-linux-gnu.so
7fcc6e920000-7fcc6e922000 r-xp 00000000 00:26 26718                      /usr/lib/python2.7/lib-dynload/resource.x86_64-linux-gnu.so
7fcc6e922000-7fcc6eb22000 ---p 00002000 00:26 26718                      /usr/lib/python2.7/lib-dynload/resource.x86_64-linux-gnu.so
7fcc6eb22000-7fcc6eb23000 r--p 00002000 00:26 26718                      /usr/lib/python2.7/lib-dynload/resource.x86_64-linux-gnu.so
7fcc6eb23000-7fcc6eb24000 rw-p 00003000 00:26 26718                      /usr/lib/python2.7/lib-dynload/resource.x86_64-linux-gnu.so
7fcc6eb24000-7fcc6eb2d000 r-xp 00000000 00:26 26676                      /usr/lib/python2.7/lib-dynload/_hotshot.x86_64-linux-gnu.so
7fcc6eb2d000-7fcc6ed2c000 ---p 00009000 00:26 26676                      /usr/lib/python2.7/lib-dynload/_hotshot.x86_64-linux-gnu.so
7fcc6ed2c000-7fcc6ed2d000 r--p 00008000 00:26 26676                      /usr/lib/python2.7/lib-dynload/_hotshot.x86_64-linux-gnu.so
7fcc6ed2d000-7fcc6ed2f000 rw-p 00009000 00:26 26676                      /usr/lib/python2.7/lib-dynload/_hotshot.x86_64-linux-gnu.so
7fcc6ed2f000-7fcc6edef000 rw-p 00000000 00:00 0 
7fcc6edef000-7fcc6edf5000 r-xp 00000000 00:26 26684                      /usr/lib/python2.7/lib-dynload/_multiprocessing.x86_64-linux-gnu.so
7fcc6edf5000-7fcc6eff4000 ---p 00006000 00:26 26684                      /usr/lib/python2.7/lib-dynload/_multiprocessing.x86_64-linux-gnu.so
7fcc6eff4000-7fcc6eff5000 r--p 00005000 00:26 26684                      /usr/lib/python2.7/lib-dynload/_multiprocessing.x86_64-linux-gnu.so
7fcc6eff5000-7fcc6eff6000 rw-p 00006000 00:26 26684                      /usr/lib/python2.7/lib-dynload/_multiprocessing.x86_64-linux-gnu.so
7fcc6eff6000-7fcc6f1b6000 rw-p 00000000 00:00 0 
7fcc6f1f0000-7fcc6f2f0000 rw-p 00000000 00:00 0 
7fcc6f2f0000-7fcc6f306000 r-xp 00000000 00:26 26688                      /usr/lib/python2.7/lib-dynload/_ssl.x86_64-linux-gnu.so
7fcc6f306000-7fcc6f505000 ---p 00016000 00:26 26688                      /usr/lib/python2.7/lib-dynload/_ssl.x86_64-linux-gnu.so
7fcc6f505000-7fcc6f506000 r--p 00015000 00:26 26688                      /usr/lib/python2.7/lib-dynload/_ssl.x86_64-linux-gnu.so
7fcc6f506000-7fcc6f50a000 rw-p 00016000 00:26 26688                      /usr/lib/python2.7/lib-dynload/_ssl.x86_64-linux-gnu.so
7fcc6f50a000-7fcc6f76f000 r-xp 00000000 00:26 29651                      /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
7fcc6f76f000-7fcc6f96e000 ---p 00265000 00:26 29651                      /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
7fcc6f96e000-7fcc6f98c000 r--p 00264000 00:26 29651                      /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
7fcc6f98c000-7fcc6f99a000 rw-p 00282000 00:26 29651                      /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
7fcc6f99a000-7fcc6f99d000 rw-p 00000000 00:00 0 
7fcc6f99d000-7fcc6f9ff000 r-xp 00000000 00:26 29889                      /usr/lib/x86_64-linux-gnu/libssl.so.1.1
7fcc6f9ff000-7fcc6fbff000 ---p 00062000 00:26 29889                      /usr/lib/x86_64-linux-gnu/libssl.so.1.1
7fcc6fbff000-7fcc6fc03000 r--p 00062000 00:26 29889                      /usr/lib/x86_64-linux-gnu/libssl.so.1.1
7fcc6fc03000-7fcc6fc09000 rw-p 00066000 00:26 29889                      /usr/lib/x86_64-linux-gnu/libssl.so.1.1
7fcc6fc09000-7fcc6fc0e000 r-xp 00000000 00:26 26674                      /usr/lib/python2.7/lib-dynload/_hashlib.x86_64-linux-gnu.so
7fcc6fc0e000-7fcc6fe0e000 ---p 00005000 00:26 26674                      /usr/lib/python2.7/lib-dynload/_hashlib.x86_64-linux-gnu.so
7fcc6fe0e000-7fcc6fe0f000 r--p 00005000 00:26 26674                      /usr/lib/python2.7/lib-dynload/_hashlib.x86_64-linux-gnu.so
7fcc6fe0f000-7fcc6fe10000 rw-p 00006000 00:26 26674                      /usr/lib/python2.7/lib-dynload/_hashlib.x86_64-linux-gnu.so
7fcc6fe10000-7fcc6ff50000 rw-p 00000000 00:00 0 
7fcc6ff51000-7fcc700d1000 rw-p 00000000 00:00 0 
7fcc700d1000-7fcc70266000 r-xp 00000000 00:26 9369                       /lib/x86_64-linux-gnu/libc-2.24.so
7fcc70266000-7fcc70466000 ---p 00195000 00:26 9369                       /lib/x86_64-linux-gnu/libc-2.24.so
7fcc70466000-7fcc7046a000 r--p 00195000 00:26 9369                       /lib/x86_64-linux-gnu/libc-2.24.so
7fcc7046a000-7fcc7046c000 rw-p 00199000 00:26 9369                       /lib/x86_64-linux-gnu/libc-2.24.so
7fcc7046c000-7fcc70470000 rw-p 00000000 00:00 0 
7fcc70470000-7fcc70573000 r-xp 00000000 00:26 9399                       /lib/x86_64-linux-gnu/libm-2.24.so
7fcc70573000-7fcc70772000 ---p 00103000 00:26 9399                       /lib/x86_64-linux-gnu/libm-2.24.so
7fcc70772000-7fcc70773000 r--p 00102000 00:26 9399                       /lib/x86_64-linux-gnu/libm-2.24.so
7fcc70773000-7fcc70774000 rw-p 00103000 00:26 9399                       /lib/x86_64-linux-gnu/libm-2.24.so
7fcc70774000-7fcc7078d000 r-xp 00000000 00:26 277047                     /lib/x86_64-linux-gnu/libz.so.1.2.8
7fcc7078d000-7fcc7098c000 ---p 00019000 00:26 277047                     /lib/x86_64-linux-gnu/libz.so.1.2.8
7fcc7098c000-7fcc7098d000 r--p 00018000 00:26 277047                     /lib/x86_64-linux-gnu/libz.so.1.2.8
7fcc7098d000-7fcc7098e000 rw-p 00019000 00:26 277047                     /lib/x86_64-linux-gnu/libz.so.1.2.8
7fcc7098e000-7fcc70990000 r-xp 00000000 00:26 9434                       /lib/x86_64-linux-gnu/libutil-2.24.so
7fcc70990000-7fcc70b8f000 ---p 00002000 00:26 9434                       /lib/x86_64-linux-gnu/libutil-2.24.so
7fcc70b8f000-7fcc70b90000 r--p 00001000 00:26 9434                       /lib/x86_64-linux-gnu/libutil-2.24.so
7fcc70b90000-7fcc70b91000 rw-p 00002000 00:26 9434                       /lib/x86_64-linux-gnu/libutil-2.24.so
7fcc70b91000-7fcc70b94000 r-xp 00000000 00:26 9379                       /lib/x86_64-linux-gnu/libdl-2.24.so
7fcc70b94000-7fcc70d93000 ---p 00003000 00:26 9379                       /lib/x86_64-linux-gnu/libdl-2.24.so
7fcc70d93000-7fcc70d94000 r--p 00002000 00:26 9379                       /lib/x86_64-linux-gnu/libdl-2.24.so
7fcc70d94000-7fcc70d95000 rw-p 00003000 00:26 9379                       /lib/x86_64-linux-gnu/libdl-2.24.so
7fcc70d95000-7fcc70dad000 r-xp 00000000 00:26 9421                       /lib/x86_64-linux-gnu/libpthread-2.24.so
7fcc70dad000-7fcc70fac000 ---p 00018000 00:26 9421                       /lib/x86_64-linux-gnu/libpthread-2.24.so
7fcc70fac000-7fcc70fad000 r--p 00017000 00:26 9421                       /lib/x86_64-linux-gnu/libpthread-2.24.so
7fcc70fad000-7fcc70fae000 rw-p 00018000 00:26 9421                       /lib/x86_64-linux-gnu/libpthread-2.24.so
7fcc70fae000-7fcc70fb2000 rw-p 00000000 00:00 0 
7fcc70fb2000-7fcc70fd5000 r-xp 00000000 00:26 9357                       /lib/x86_64-linux-gnu/ld-2.24.so
7fcc70fd5000-7fcc710d5000 rw-p 00000000 00:00 0 
7fcc71106000-7fcc711ca000 rw-p 00000000 00:00 0 
7fcc711d1000-7fcc711d2000 rwxp 00000000 00:00 0 
7fcc711d2000-7fcc711d5000 rw-p 00000000 00:00 0 
7fcc711d5000-7fcc711d6000 r--p 00023000 00:26 9357                       /lib/x86_64-linux-gnu/ld-2.24.so
7fcc711d6000-7fcc711d7000 rw-p 00024000 00:26 9357                       /lib/x86_64-linux-gnu/ld-2.24.so
7fcc711d7000-7fcc711d8000 rw-p 00000000 00:00 0 
7fff72b28000-7fff72b49000 rw-p 00000000 00:00 0                          [stack]
7fff72b64000-7fff72b66000 r--p 00000000 00:00 0                          [vvar]
7fff72b66000-7fff72b68000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Segmentation fault (core dumped)
GrahamDumpleton commented 7 years ago

FWIW. Disabling GC doesn't make any difference.

import gc
gc.set_threshold(0, 0, 0)

This puts it down to being a pure reference counting issue.

Given that wrapt is used in a significant number of production systems as part of a performance monitoring system, if there was an issue, would expect it to be affecting a lot of people.

yarikoptic commented 7 years ago

Could it be some interaction with weakref we are using as well (and many others don't)?

GrahamDumpleton commented 7 years ago

Am still digging into it, but can you try the reference-count-issues branch of this repo to see if that helps with the issue.

GrahamDumpleton commented 7 years ago

The incorrect decref in WraptObjectProxy_getattro, although wrong, doesn't break anything as it is a PyCFunction in tp_methods as adjusting refcount on those is a noop. Now to work out impact of others.

GrahamDumpleton commented 7 years ago

Crap. Been wasting my time. Docs weren't saying whether function returned new or borrowed reference and because function next to it in docs said it returned new reference, brain switched off and assumed that function I was using must not then since it doesn't say so.

yarikoptic commented 7 years ago

truly sorry. does it mean that I shouldn't bother testing that branch?

GrahamDumpleton commented 7 years ago

Yeah, don't bother. I will delete it soon. I only went down that path as when I went to merge environment variable stuff, I noticed that I had deleted one of the DECREFs months ago at PyCon but not committed it, so was trying to work out why.

yarikoptic commented 7 years ago

I wonder if there anything else I could do to make it possible/easier to pin point the issue?

GrahamDumpleton commented 7 years ago

Can you trying running with a modified version of wrapt which uses:

/* ------------------------------------------------------------------------- */

static int WraptObjectProxy_traverse(WraptObjectProxyObject *self,
        visitproc visit, void *arg)
{
    Py_VISIT(self->dict);
#if 0
    Py_VISIT(self->wrapped);
#endif

    return 0;
}

/* ------------------------------------------------------------------------- */

static int WraptObjectProxy_clear(WraptObjectProxyObject *self)
{
    Py_CLEAR(self->dict);
#if 0
    Py_CLEAR(self->wrapped);
#endif

    return 0;
}

I want to see if that just moves the problem, or totally eliminates it.

yarikoptic commented 7 years ago

According to all green of https://travis-ci.org/datalad/datalad/builds/270137906 SEGFAULT IS GONE . We do not do any memory tracking or anything to suggest if it introduced any memleak etc

GrahamDumpleton commented 7 years ago

It will result in a resource leak due to how the clear function was also being used when the object was being destroyed. Still trying to understand what the requirements are for adding GC support to C objects. The documentation is not very good on it.

yarikoptic commented 7 years ago

Thanks again for looking into it. If you let me know which branch to try I will adjust my travis script so we could just keep rerunning that build while installing wrapt from that branch

GrahamDumpleton commented 7 years ago

Problem I have at the moment is that when I reverted back to the original code, I can't replicate it any more so not sure what is going on.

yarikoptic commented 7 years ago

FWIW wild guess -- if I got it right -- may be you have "double" fixed the leak some time after you introduced this initial (which we reverted to avoid segfault) change? so this way leak is gone

yarikoptic commented 7 years ago

Hi @GrahamDumpleton , anyhow I could be of help to move it forward?

yarikoptic commented 7 years ago

FWIW, checked again with the current state of the develop and got the beastie segfaulting like a champ (FWIW: https://travis-ci.org/datalad/datalad/builds/287579064?utm_source=github_status&utm_medium=notification)

yarikoptic commented 7 years ago

oh how wonderful life will be when we get to the bottom of this issue -- ppl will stop asking about segfaulting tests, I will not need to type long environment variable names ;)

joshuahaertel commented 5 years ago

I'm getting this about every 20 seconds when I run my server under load. After some investigating, I'm pretty sure the problem is for objects that implement their own dealloc function, we're running PY_CLEAR before we run PyObject_GC_UnTrack (https://docs.python.org/3/c-api/gcsupport.html#c.PyObject_GC_UnTrack).

Perhaps what we can do is make a base dealloc function that looks just like the original WraptObjectProxy_dealloc, but instead of calling WraptObjectProxy_clear, it takes in a clear function pointer as an argument. Then all the different deallocator functions can call that with the appropriate clear function and take care of that after untracking occurs (I'm not sure how nicely the types will play with each other though). I've forked the repo and am trying to work on a fix in my spare time -- but I don't have that much free time right now, so if someone is able to beat me to it, it should be a quick fix :slightly_smiling_face:

GrahamDumpleton commented 5 years ago

@joshuahaertel Can you provide a code example of when you think the problem arises? Not sure what object you might be referring to when you talk about a custom dealloc function.

joshuahaertel commented 5 years ago

Yeah, so my stack trace specifically goes from WraptFunctionWrapperBase_dealloc to WraptFunctionWrapperBase_clear. Clear takes it back to python land, and, in my case, I'm actually switching threads right then. The new thread then starts garbage collection and the error occurs.

The code for dealloc, with an added comment:

static void WraptFunctionWrapperBase_dealloc(WraptFunctionWrapperObject *self)
{
    WraptFunctionWrapperBase_clear(self);

    // We start untracking here, even though we've already cleared our attributes
    WraptObjectProxy_dealloc((WraptObjectProxyObject *)self);
}

WraptPartialCallableObjectProxy_dealloc does the same and should hit the error as well.

joshuahaertel commented 5 years ago

Sorry I don't have a specific python code example, this is actually embedded somewhere in a library I'm using and I haven't looked into that code at all. I think my theory is reasonably sound though. If you need a code example for a unit test, I can dive deeper into that. I haven't looked at the previous unit test yarkioptic made, I assumed it would probably be good enough.

joshuahaertel commented 5 years ago

Here is some python code to reproduce the segfault. I'm running Python 3.6.3 on Lubuntu 17.10. I'm just trying to get my C environment setup to refactor the dealloc calls.

import threading

import gc

from src.wrapt import decorator

gc_event = threading.Event()

@decorator
def wrapper(*_, **__):
    pass

class Foo:
    @wrapper
    def bar(self):
        pass

    def __del__(self):
        # The CPython interpreter can do this automatically (at least during
        # a thread switch) without the user code explicitly calling collect
        gc.collect(1)
        gc_event.set()

def main():
    bar = Foo().bar
    del bar
    gc_event.wait()

if __name__ == '__main__':
    main()
GrahamDumpleton commented 5 years ago

Thanks for that. I tried the gc.collect() in a __del__ method when this originally came up but couldn’t replicate on my environment. So something perhaps a bit different with way you are doing it or it is my environment. Anyway, will try and look at it next week. I am on a trip this week for conferences so very little time and don’t have my normal development machine with me.

joshuahaertel commented 5 years ago

Interestingly enough, removing the event lets the script exit cleanly, so there's definitely something up with that extra bit of code.

I totally understand the struggle of developing while being out of office -- Lubuntu 17.10 is not my usual dev environment :laughing: Feel free to pursue this when you have time! Meanwhile, I made a PR for you to consider when implementing a fix: https://github.com/GrahamDumpleton/wrapt/pull/133

GrahamDumpleton commented 5 years ago

Yeah, I figured it must have been something to do with the event which is why I couldn’t replicate it, as I didn’t have that. Good to get confirmation on that.

GrahamDumpleton commented 5 years ago

Even newer test script to try and trigger this, doesn't replicate it on macOS with Python 3.6. :-(