Wrong answers and segmentation faults with try-for and qe-sat tactic

mtappler commented 9 years ago

This issue is actually an adaptation of a stackoverflow question (http://stackoverflow.com/questions/31864004/z3-try-for-causes-segmentation-faults) so I will copy some parts of this question.

I am using the Java-API to communicate with Z3 on Ubuntu Linux 14.04 for the development of a Scala application. Regarding Z3, I use the current version of the master-branch found on github (compiled on the 6th of August).

The basic structure of the formulas, for which I want to check satisfiability, is as follows:

(and (exists ((x1 S1) ... (xn Sn)) p(x1,...,xn)) 
     (not (exists ((x1 S1) ... (xm Sm)) q(x1,...,xm))))

or simply

p(x1,...,xn)

without quantification

where the Si may be Bool, an enumeration Sort, Real or Integer and p and q are formulas containing:

inequalities between sums of numeric values and constants
equalities between enumeration values and constants
boolean values and constants.
combinations thereof (conjunctions, disjunctions, negations) However, the formulas may either contain integers or reals but not both.

In order to check satisfiability of quantified formulas, I create a solver using the tactic:

(or-else (try-for smt x) (then qe smt))

The reason for this choice of solver is that some formulas are simple and therefore fast to check and do not require quantifier elimination, but the plain smt solver fails for some without quantifier elimination. Since it may take very long for the smt solver to fail, I use try-for with a timeout of x.

Now the problem is that depending on the value of x, this tactic produces segmentation faults after a few hundred satisfiability checks (the actual number varies). I have also observed that larger values tend to be safer, but this depends on the complexity of the formulas p and q. Currently, I am using values in the range from 70 to 200.

Another problem is that the tactic returns wrong results after a number of checks. I was able to find a formula for which the solver produced the wrong result regardless of the value of x, i.e. it returned satisfiable instead of unsatisfiable. However, it only produces wrong results if other queries had been made before, but it works correctly if only a single check is performed (see here: http://rise4fun.com/Z3/YslqXo).

As requested on stackoverflow, I provide interaction logs for both of these two issues under the following links:

Unfortunately, they are pretty large, as I have not yet been able to find a smaller example, which shows similar behaviour.

I do not know if this is helpful, but the crash report provided by the JVM lists C [libz3.so+0x4aec70] smt::mf::quantifier_analyzer::process_literal(expr*, bool)+0x40 as the problematic frame after a segmentation fault.

As I did some more experiments with tactics, I tried using the tactic qe-sat, but a solver built using this tactic produces different results than e.g. a solver built using (then qe smt). I did not identify yet, why this happens, because the computation history is again relevant. This means that If I take some formula, for which the results of both solvers differ and I create "fresh" solvers both will produce the same result, but after a number of satisfiability check they will produce different results. I observed this behaviour while creating symbolic execution trees and the tree structure was different, when qe-sat was used.

As before, I recorded interaction logs using com.microsoft.z3.Log.

http://www.student.tugraz.at/martin.tappler/qe_sat.log.zip
http://www.student.tugraz.at/martin.tappler/then_qe_smt.log.zip Both logs were created using the exact same input data but different tactics for creating solvers.

I hope the provided information is helpful, otherwise I can provide more information or again try to find a simpler example.

wintersteiger commented 9 years ago

Do you also see these problems without the (try-for smt x) in the tactic? The hard timeout on the SMT tactic could perhaps leave some parts of the memory in an invalid state which it picks up later after QE is finished. If that's the case, it will be hell to debug, but if the problem is just in the QE tactic we have at least an idea of where to start.

mtappler commented 9 years ago

Yes, I see the problems without the try-for as well. I also tried using two other slightly different tactics (with par-or and without or-else), which did not show these problems. To summarize: (or-else smt (then qe smt)) shows the problems but (then qe smt) and (par-or smt (then qe smt)) do not show the same problems.

NikolajBjorner commented 9 years ago

Would you be able to reproduce the bug using the "unstable" branch? We do fix bugs in the unstable branch and the APIs are not compatible, so I am unable to run the log files on the unstable branch.

mtappler commented 9 years ago

Yes, I just tried using the unstable branch and both the segmentation fault and wrong result problem occured again. I recorded new logs using this branch (without using try-for):

NikolajBjorner commented 9 years ago

I found (and fixed) a bug exposed by the "wrong_result" log. The seg_fault log takes an eternity to run. There is a reasonable chance this fix takes care of both bug scenarios.

mtappler commented 9 years ago

Thank you for your bug fix! It seems to have solved the segmentation fault problem as well. Using the tactic with try-for, it is possible to run the scenario corresponding to the seg_fault log to completion without producing a segmentation fault.

Furthermore, I just checked the behaviour of the qe-sat using the unstable branch. I also the same problems as described above for this tactic. Shall I open another issue or post new interaction logs here?

NikolajBjorner commented 9 years ago

It is almost for sure having the same kind of bug with parameters being reset, but in a completely different module. Is there a log I can use for the unstable branch?

mtappler commented 9 years ago

I just recorded the following two logs:

The first one uses the qe-sat tactic, while the second one uses (then qe smt) for all checks of formulas containing quantifiers. Both logs are recorded on the same input data, but qe-sat returns unsatisfiable more often than the other tactic (as far as I have analysed it).

NikolajBjorner commented 9 years ago

Thanks, it suggests one source of bugs in qe-sat. It didn' reset its state correctly between calls. THis has been fixed.

NikolajBjorner commented 9 years ago

Can I close the bug?

mtappler commented 9 years ago

Yes, now qe-sat produces the same results as (then qe smt). Thank you for fixing the bug!

Z3Prover / z3

Wrong answers and segmentation faults with try-for and qe-sat tactic #196