Closed Calvin-L closed 4 years ago
Looks like the error actually happens while calling the (no-op) Z3 error callback:
In the screenshot, the highlighted line is the next instruction to execute after api::context::context::set_error_code(...)
(which is being called) returns. That procedure tail-calls api::context::invoke_error_handler(...)
which tail-calls the m_error_handler
function pointer which I suppose happens to be a libffi call. That makes sense since in the Z3 Python code:
def z3_error_handler(c, e):
# Do nothing error handler, just avoid exit(0)
# The wrappers in z3core.py will raise a Z3Exception if an error is detected
return
class Context:
# ...
def __init__(self, *args, **kws):
# ...
self.eh = Z3_set_error_handler(self.ctx, z3_error_handler)
# ...
In Cozy, the offending line is:
But the error isn't due to misuse of model
or f
; it happens when Z3 discovers that f
has no binding in model
. In this case model[f]
is supposed to return None
, but something goes wrong when Z3 invokes its error callback.
Current hypothesis: the segfaults happen nondeterministically for Z3 "Context" objects that are passed from the Cozy parent process to the multiprocessing jobs. I don't fully understand why yet. However, there is at least one way that this happens:
solver_for_context
pulls Cozy-solvers out of a cache that is implicitly copied from parent to child.No, that doesn't seem to fix the problem. New hypothesis: libffi 3.1 is to blame. My mac (where Cozy works) runs libffi 3.3.
No, that doesn't seem to fix the problem. I have no new hypothesis, but I do have new information. The Cozy line I mentioned earlier isn't the only place that can segfault: I am occasionally seeing one in exp_wf
, as the procedure returns and the Z3 context gets deallocated. That could indicate any number of problems (double-free, memory corruption, ...), none of which are good.
This problem is beyond my ability and patience to debug. Since it is due to memory corruption, the actual bug is probably very far from where the process eventually crashes.
Ultimately I think this is due to some subtlety of fork()
that Cozy or its dependencies does not handle properly. #119 seems to fix this problem by moving away from fork()
, so I am closing this issue.
Some of the recent Travis build failures indicate failed synthesis jobs with no further explanation. It appears that the synthesis jobs are segfaulting on Linux. In a Fedora 32 virtual machine,
cozy examples/lsort.ds
exhibits the segfault consistently. Fromdmesg
:The
dmesg
output indicates that this is a null pointer dereference (segfault at 0
). More info about segfault error codes.After a great deal of work, I was able to obtain a stacktrace:
It looks like Z3 is the culprit (
Z3_model_get_func_interp
), but experience has shown me that it is much more likely to be Cozy's fault somehow.Notes: