CozySynthesizer / cozy

The collection synthesizer
https://cozy.uwplse.org
Apache License 2.0
209 stars 18 forks source link

Segfault in Z3 API for some inputs #120

Closed Calvin-L closed 4 years ago

Calvin-L commented 4 years ago

Some of the recent Travis build failures indicate failed synthesis jobs with no further explanation. It appears that the synthesis jobs are segfaulting on Linux. In a Fedora 32 virtual machine, cozy examples/lsort.ds exhibits the segfault consistently. From dmesg:

[ 4969.897438] cozy[5274]: segfault at 0 ip 00007f8f339a3830 sp 00007ffdd6047460 error 4 in libffi.so.6.0.2[7f8f3399f000+5000]
[ 4969.897447] Code: ff ff 48 89 c7 48 8d 04 c1 48 8b 4d 88 01 f7 4a 89 04 e9 89 7d 84 49 83 c5 01 4c 39 6d 90 74 69 66 2e 0f 1f 84 00 00 00 00 00 <4b> 8b 3c ec 31 d2 4c 89 f1 4c 89 fe 4c 89 45 98 e8 eb f5 ff ff 4c

The dmesg output indicates that this is a null pointer dereference (segfault at 0). More info about segfault error codes.

After a great deal of work, I was able to obtain a stacktrace:

#0  0x00007fffea605830 in ffi_closure_unix64_inner () from /lib64/libffi.so.6
#1  0x00007fffea605c58 in ffi_closure_unix64 () from /lib64/libffi.so.6
#2  0x00007fffe87e9677 in Z3_model_get_func_interp ()
#3  0x00007fffea605af0 in ffi_call_unix64 () from /lib64/libffi.so.6
#4  0x00007fffea6052ab in ffi_call () from /lib64/libffi.so.6
#5  0x00007fffea38e4bc in _ctypes_callproc.cold ()
#6  0x00007fffea397fd7 in PyCFuncPtr_call ()
#7  0x00007ffff7bc3823 in _PyObject_MakeTpCall () from /lib64/libpython3.8.so.1.0
#8  0x00007ffff7bc0948 in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#9  0x00007ffff7bba231 in _PyEval_EvalCodeWithName () from /lib64/libpython3.8.so.1.0
#10 0x00007ffff7bc8f59 in _PyFunction_Vectorcall () from /lib64/libpython3.8.so.1.0
#11 0x00007ffff7bbb744 in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#12 0x00007ffff7bc92b8 in function_code_fastcall () from /lib64/libpython3.8.so.1.0
#13 0x00007ffff7bbbb3b in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#14 0x00007ffff7bc92b8 in function_code_fastcall () from /lib64/libpython3.8.so.1.0
#15 0x00007ffff7bfff80 in call_method () from /lib64/libpython3.8.so.1.0
#16 0x00007ffff7bffd7c in slot_mp_subscript () from /lib64/libpython3.8.so.1.0
#17 0x00007ffff7bbbf1c in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#18 0x00007ffff7bba8d5 in _PyEval_EvalCodeWithName () from /lib64/libpython3.8.so.1.0
#19 0x00007ffff7bc8f59 in _PyFunction_Vectorcall () from /lib64/libpython3.8.so.1.0
#20 0x00007ffff7bbb744 in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#21 0x00007ffff7bba8d5 in _PyEval_EvalCodeWithName () from /lib64/libpython3.8.so.1.0
#22 0x00007ffff7bc8f59 in _PyFunction_Vectorcall () from /lib64/libpython3.8.so.1.0
#23 0x00007ffff7bbbb3b in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#24 0x00007ffff7bc92b8 in function_code_fastcall () from /lib64/libpython3.8.so.1.0
#25 0x00007ffff7bbbb3b in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#26 0x00007ffff7bc92b8 in function_code_fastcall () from /lib64/libpython3.8.so.1.0
#27 0x00007ffff7bbbb3b in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#28 0x00007ffff7bd8081 in gen_send_ex () from /lib64/libpython3.8.so.1.0
#29 0x00007ffff7c263c5 in chain_next () from /lib64/libpython3.8.so.1.0
#30 0x00007ffff7bbbd4b in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#31 0x00007ffff7bba8d5 in _PyEval_EvalCodeWithName () from /lib64/libpython3.8.so.1.0
#32 0x00007ffff7bc8f59 in _PyFunction_Vectorcall () from /lib64/libpython3.8.so.1.0
#33 0x00007ffff7bbbb3b in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#34 0x00007ffff7bc92b8 in function_code_fastcall () from /lib64/libpython3.8.so.1.0
#35 0x00007ffff7bd17ec in method_vectorcall () from /lib64/libpython3.8.so.1.0
#36 0x00007ffff7bd1a90 in PyVectorcall_Call () from /lib64/libpython3.8.so.1.0
#37 0x00007ffff7bbd736 in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#38 0x00007ffff7bc92b8 in function_code_fastcall () from /lib64/libpython3.8.so.1.0
#39 0x00007ffff7bbbb3b in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#40 0x00007ffff7bba231 in _PyEval_EvalCodeWithName () from /lib64/libpython3.8.so.1.0
#41 0x00007ffff7bc8f59 in _PyFunction_Vectorcall () from /lib64/libpython3.8.so.1.0
#42 0x00007ffff7bd1603 in method_vectorcall () from /lib64/libpython3.8.so.1.0
#43 0x00007ffff7bbc60a in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#44 0x00007ffff7bc92b8 in function_code_fastcall () from /lib64/libpython3.8.so.1.0
#45 0x00007ffff7bbbb3b in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#46 0x00007ffff7bc92b8 in function_code_fastcall () from /lib64/libpython3.8.so.1.0
#47 0x00007ffff7bc3247 in _PyObject_FastCallDict () from /lib64/libpython3.8.so.1.0
#48 0x00007ffff7bcf0bf in slot_tp_init () from /lib64/libpython3.8.so.1.0
#49 0x00007ffff7bc3afb in type_call () from /lib64/libpython3.8.so.1.0
#50 0x00007ffff7bc3823 in _PyObject_MakeTpCall () from /lib64/libpython3.8.so.1.0
#51 0x00007ffff7bc036c in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#52 0x00007ffff7bc92b8 in function_code_fastcall () from /lib64/libpython3.8.so.1.0
#53 0x00007ffff7bc04d7 in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#54 0x00007ffff7bc92b8 in function_code_fastcall () from /lib64/libpython3.8.so.1.0
#55 0x00007ffff7bc04d7 in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#56 0x00007ffff7bc92b8 in function_code_fastcall () from /lib64/libpython3.8.so.1.0
#57 0x00007ffff7bbbb3b in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#58 0x00007ffff7bc92b8 in function_code_fastcall () from /lib64/libpython3.8.so.1.0
#59 0x00007ffff7bbbb3b in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#60 0x00007ffff7bba8d5 in _PyEval_EvalCodeWithName () from /lib64/libpython3.8.so.1.0
#61 0x00007ffff7bc8f59 in _PyFunction_Vectorcall () from /lib64/libpython3.8.so.1.0
#62 0x00007ffff7bbb744 in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#63 0x00007ffff7bba8d5 in _PyEval_EvalCodeWithName () from /lib64/libpython3.8.so.1.0
#64 0x00007ffff7bc8f59 in _PyFunction_Vectorcall () from /lib64/libpython3.8.so.1.0
#65 0x00007ffff7bbc60a in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#66 0x00007ffff7bba8d5 in _PyEval_EvalCodeWithName () from /lib64/libpython3.8.so.1.0
#67 0x00007ffff7bc8f59 in _PyFunction_Vectorcall () from /lib64/libpython3.8.so.1.0
#68 0x00007ffff7bbb744 in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#69 0x00007ffff7bba231 in _PyEval_EvalCodeWithName () from /lib64/libpython3.8.so.1.0
#70 0x00007ffff7c351c9 in PyEval_EvalCodeEx () from /lib64/libpython3.8.so.1.0
#71 0x00007ffff7c3518b in PyEval_EvalCode () from /lib64/libpython3.8.so.1.0
#67 0x00007ffff7bc8f59 in _PyFunction_Vectorcall () f
#72 0x00007ffff7c38ddc in builtin_exec () from /lib64/libpython3.8.so.1.0
#73 0x00007ffff7bc956b in cfunction_vectorcall_FASTCALL () from /lib64/libpython3.8.so.1.0
#74 0x00007ffff7bbb744 in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#75 0x00007ffff7bba231 in _PyEval_EvalCodeWithName () from /lib64/libpython3.8.so.1.0
#76 0x00007ffff7bc8f59 in _PyFunction_Vectorcall () from /lib64/libpython3.8.so.1.0
#77 0x00007ffff7bbb744 in _PyEval_EvalFrameDefault () from /lib64/libpython3.8.so.1.0
#78 0x00007ffff7bba231 in _PyEval_EvalCodeWithName () from /lib64/libpython3.8.so.1.0
#79 0x00007ffff7bc8f59 in _PyFunction_Vectorcall () from /lib64/libpython3.8.so.1.0
#80 0x00007ffff7bd1a90 in PyVectorcall_Call () from /lib64/libpython3.8.so.1.0
#81 0x00007ffff7c52b90 in pymain_run_module () from /lib64/libpython3.8.so.1.0
#82 0x00007ffff7c51f24 in Py_RunMain () from /lib64/libpython3.8.so.1.0
#83 0x00007ffff7c28aa9 in Py_BytesMain () from /lib64/libpython3.8.so.1.0
#84 0x00007ffff7e0e042 in __libc_start_main () from /lib64/libc.so.6
#85 0x000055555555507e in _start ()

It looks like Z3 is the culprit (Z3_model_get_func_interp), but experience has shown me that it is much more likely to be Cozy's fault somehow.

Notes:

Calvin-L commented 4 years ago

Looks like the error actually happens while calling the (no-op) Z3 error callback:

Screen Shot 2020-07-03 at 2 44 11 PM

In the screenshot, the highlighted line is the next instruction to execute after api::context::context::set_error_code(...) (which is being called) returns. That procedure tail-calls api::context::invoke_error_handler(...) which tail-calls the m_error_handler function pointer which I suppose happens to be a libffi call. That makes sense since in the Z3 Python code:


def z3_error_handler(c, e):
    # Do nothing error handler, just avoid exit(0)
    # The wrappers in z3core.py will raise a Z3Exception if an error is detected
    return

class Context:
    # ...
    def __init__(self, *args, **kws):
        # ...
        self.eh = Z3_set_error_handler(self.ctx, z3_error_handler)
        # ...
Calvin-L commented 4 years ago

In Cozy, the offending line is:

https://github.com/CozySynthesizer/cozy/blob/60b8ab0ccafbeeb89c3597a598b1166f46a6f5de/cozy/solver.py#L1168

But the error isn't due to misuse of model or f; it happens when Z3 discovers that f has no binding in model. In this case model[f] is supposed to return None, but something goes wrong when Z3 invokes its error callback.

Calvin-L commented 4 years ago

Current hypothesis: the segfaults happen nondeterministically for Z3 "Context" objects that are passed from the Cozy parent process to the multiprocessing jobs. I don't fully understand why yet. However, there is at least one way that this happens:

Calvin-L commented 4 years ago

No, that doesn't seem to fix the problem. New hypothesis: libffi 3.1 is to blame. My mac (where Cozy works) runs libffi 3.3.

Calvin-L commented 4 years ago

No, that doesn't seem to fix the problem. I have no new hypothesis, but I do have new information. The Cozy line I mentioned earlier isn't the only place that can segfault: I am occasionally seeing one in exp_wf, as the procedure returns and the Z3 context gets deallocated. That could indicate any number of problems (double-free, memory corruption, ...), none of which are good.

Calvin-L commented 4 years ago

This problem is beyond my ability and patience to debug. Since it is due to memory corruption, the actual bug is probably very far from where the process eventually crashes.

Ultimately I think this is due to some subtlety of fork() that Cozy or its dependencies does not handle properly. #119 seems to fix this problem by moving away from fork(), so I am closing this issue.