Failing assertion (probably related to the use of QF_BV)

martin-neuhaeusser commented 8 years ago

In the attached log, the debug version of Z3 fails with an assertion violation in the SAT solver's clause handling. The release version of Z3 works (i.e. hides the error without a seg fault), but valgrind reports an out-of-memory error and aborts the execution.

This issue arises only if the Z3 solver is configured for QF_BV. If we feed it the same problem and configure it for QF_UFBV, the problem does not occur.

z3_assertion.zip

wintersteiger commented 8 years ago

I'm afraid I can't reproduce this problem. Could you confirm that it still exists? Also, which platform do you get this segfault on?

martin-neuhaeusser commented 8 years ago

I tried on the latest master (commit-id 67e49b4ad), and the problem persists. My platform is Debian 8.4 amd64.

With a debug build of Z3, I get the following:

marneu@obelix:~/verifier$ z3 --log z3_assertion.log 
ASSERTION VIOLATION
File: ../src/util/vector.h
Line: 238
idx < size()
(C)ontinue, (A)bort, (S)top, (T)hrow exception, Invoke (G)DB
c
Segmentation fault
marneu@obelix:~/verifier$

NikolajBjorner commented 8 years ago

Note: It doesn't repro on windows.

martin-neuhaeusser commented 8 years ago

Actually, we have another segfault in our benchmarks, which I guess is related to the same problem. The corresponding log file is larger, but replaying it with Z3 from the current https://github.com/wintersteiger/z3/tree/new-ml-api branch (commit 140f0bb79) produces a segfault on my machine. Here is the log: segfault.zip

martin-neuhaeusser commented 8 years ago

@NikolajBjorner: I cannot reproduce both issues on Windows, either.

NikolajBjorner commented 8 years ago

If you have a stack trace, it may be possible to identify the problem without a repro.

martin-neuhaeusser commented 8 years ago

The stack trace of gdb (for the segfault.log example) looks as follows:

Program received signal SIGSEGV, Segmentation fault.
0x000000000112206c in sat::clause_set::erase() ()
(gdb) where
#0  0x000000000112206c in sat::clause_set::erase() ()
#1  0x000000000110f754 in sat::simplifier::subsume() ()
#2  0x0000000001110288 in sat::simplifier::operator()(bool) ()
#3  0x0000000001100a65 in sat::solver::simplify_problem() [clone .part.155] ()
#4  0x00000000011033b8 in sat::solver::check(unsigned int, sat::literal const*, double const*, double) ()
#5  0x000000000053ed5e in inc_sat_solver::check_sat(unsigned int, expr* const*) ()
#6  0x0000000000d22f42 in combined_solver::check_sat(unsigned int, expr* const*) ()
#7  0x00000000004807d9 in Z3_solver_check ()
#8  0x000000000042d0fe in z3_replayer::imp::parse() ()
#9  0x00000000004208cd in solve(char const*, std::istream&) [clone .isra.3] ()
#10 0x0000000000420a4d in replay_z3_log(char const*) ()
#11 0x00000000004112f3 in main ()

NikolajBjorner commented 8 years ago

Status: Debug mode will now assert in a different place and bypass the segmentation fault in that location. The bug is timer dependent, so may be dependent on environment and possibly not on OS. I got it reproed on a mac, but not on my main dev machine (Win10). Unfortunately, I don't have a good debugging environment on the mac set up. It's possible that the state isn't clean in one of the cancel scenarios (when running the log in verbose mode = 1 you will see a number of cancellation events, so either internal or external timeouts are used).

wintersteiger commented 8 years ago

Does this problem still exist? Was this perhaps another symptom of the x64 clause allocator bug (see #436 and fix in #722)?

martin-neuhaeusser commented 8 years ago

I just checked with the current Z3 master; the segmentation faults still occur (more or less non-deterministically) when we operate with timeouts and use the QF_BV theory. The same instances work flawlessly, if we configure Z3 to use a "more complex" theory combination such as, e.g. QF_UFBV.

As it seems from our tests, it only occurs (a) in QF_BV (b) with heavy use of (very short) timeouts (10 - 100ms) (c) if the solver is used incrementally (but I am not 100% sure about that).

I will try to attach a log-file that sometimes produces a segmentation fault as soon as I have the corresponding built environment set up.

wintersteiger commented 8 years ago

That's good to know, thanks for checking again! I'll take a closer look later today.

martin-neuhaeusser commented 8 years ago

Here are some log-files (taken with 4.5.1) that sometimes reproduce the segmentation faults. Except for segfault_arctan_Pade which uses QF_FP, all of them use QF_BV; there are more instances of QF_FP which segfault, too, but their log-files are huge...

The logs have been created on Mac OS X. It takes several attempts to reproduce a segmentation fault with the log-files. segfaults.zip

wintersteiger commented 8 years ago

Alright, I found some issues with the SAT solver when it's used incrementally and when it's interrupted/cancelled. This was still with the old log file, I haven't look at the latest ones yet.

@NikolajBjorner : could you take a look at the two fixes above (f2803ff, f5124ec)? Either one of those two fixes fixes the problem for me. Upon interrupting, I think m_simplifier.free_memory() may be a bit too aggressive, perhaps m_simplifier.reset_todo() (the fixed one) would be better for performance. It may be a good idea to go over the sat::simplifier and check that each of the various todos is cleaned up correctly, I'm fairly sure my fixed reset_todo() is just the beginning.

(to avoid confusion: the fixes are in a new branch on my fork, not in master.)

NikolajBjorner commented 8 years ago

To me looks like root cause is that m_sub_todo and m_sub_bin_todo aren't cleared in the interrupt scenario. A way to enforce they are clean is to clear them before registering clauses.

  void simplifier::operator()(bool learned) {

    ...

    m_sub_todo.reset();
    m_sub_bin_todo.reset();
    s.m_cleaner(true);
    m_last_sub_trail_sz = s.m_trail.size();

wintersteiger commented 8 years ago

Yes, the fact that some data structures aren't cleaned up is definitely a problem, but so far I haven't decided what should be resumable/interruptable and what we can just clear on every call to the simplifier. I'll take a closer look to see whether what the implications for the performance are.

wintersteiger commented 8 years ago

@martin-neuhaeusser: Can you replay segfault_arctan_Pade.log? For me it reports 3 errors and then it crashes and the reason is that an attempt is made to access the 1st argument of an roundNearestTiesToEven term (which has no arguments), so the replayer reports an index-out-of-bound and later crashes because the return value is a null-pointer in those cases. Any chance that's a bug on your end or did I break something in the meantime?

martin-neuhaeusser commented 8 years ago

I checked the arctan_Pade example (which is a C file from the floating-point category of SVCOMP) again with z3 as in tag 4.5.0. On several identical runs, Z3 crashes with different errors:

First run:

Fatal error: exception Z3.Error("Argument (let ((a!1 (not (= ((_ extract 51 0) (bv_wrap nondet-var-901671338-1-2$0))\n                   #x0000000000000)))\n      (a!4 (not (= ((_ extract 62 52) (bv_wrap nondet-var-901671338-1-2$0))\n                   #b01111111111))))\n(let ((a!2 (and (= ((_ extract 62 52) (bv_wrap nondet-var-901671338-1-2$0))\n                   #b11111111111)\n                a!1))\n      (a!5 (and (bvule #b01111111111\n                       ((_ extract 62 52) (bv_wrap nondet-var-901671338-1-2$0)))\n                a!4))\n      (a!6 (and (= ((_ extract 62 52) (bv_wrap nondet-var-901671338-1-2$0))\n                   #b01111111111)\n                a!1)))\n(let ((a!3 (ite a!2\n                (= ((_ extract 63 63) (bv_wrap nondet-var-901671338-1-2$0)) #b1)\n                (= ((_ extract 63 63) (bv_wrap nondet-var-901671338-1-2$0)) #b0))))\n  (and (not a!2) (not a!3) (or a!5 a!6))))) at position 1 does not match declaration (declare-fun bvmul ((_ BitVec 10) (_ BitVec 10)) (_ BitVec 10))")

Second run: direct segmentation fault (see attached log - arctan_example.zip) When I replay the log, it reports index-out-of-bounds errors, as you describe. The same holds true for the former log file that I attached in my previous post. When I recorded it, the application simply crashed without any error indication.

Third run:

Fatal error: exception Z3.Error("Wrong number of arguments (5) passed to function (declare-fun ite (Bool (_ BitVec 6) (_ BitVec 6)) (_ BitVec 6))")

Fourth run:

Fatal error: exception Z3.Error("Wrong number of arguments (5) passed to function (declare-fun ite (Bool (_ BitVec 10) (_ BitVec 10)) (_ BitVec 10))")

Fifth run:

Fatal error: exception Z3.Error("Wrong number of arguments (5) passed to function (declare-fun ite (Bool (_ BitVec 11) (_ BitVec 11)) (_ BitVec 11))")

Sixth run:

Fatal error: exception Z3.Error("Wrong number of arguments (5) passed to function (declare-fun ite (Bool (_ BitVec 3) (_ BitVec 3)) (_ BitVec 3))")

Given that there are so many different errors on identical runs, I presume (but am not sure, of course) this is caused by a bug in Z3. Another (better?) indication in that direction is that the said example succeeds with Z3 and QF_FP when we use other verification engines that do not make use of timeouts.

martin-neuhaeusser commented 8 years ago

@wintersteiger I am re-running benchmarks with your gh570-att-1 branch. Many (but not all) segmentation faults are gone. I'll try to produce some log files of the remaining crashes tomorrow.

wintersteiger commented 8 years ago

Fantastic, thanks for the tests! I'm fairly confident that the FP issues is mostly unrelated (probably a reference counting problem in theory_fpa going by the non-deterministic behaviour).

I've cleaned up the SAT simplifier/solver a little bit more and I'm currently running experiments to see whether it has a significant impact on the performance, but so far it looks pretty good (will push the fixes once the experiments are done).

wintersteiger commented 8 years ago

I fixed another couple of interruption cleanup issues in the SAT solver and I'm fairly confident that I haven't introduced any new ones. Performance-wise the datapoints have moved around, but on average it's exactly the same (avg speedup 1.023 on QF_BV). It's currently in the gh570-att-2 branch on my fork. The only issue that remains for me to solve is the weird FP log that the replayer doesn't like.

wintersteiger commented 8 years ago

I'm one big closer: segfault_arctan_Pade_01.log contained multiple calls to Z3_fpa_get_numeral_*_uint64(..., n) with n==0 and the API didn't check for those null pointers. It does now, but that means it throws "invalid argument" exceptions.

martin-neuhaeusser commented 8 years ago

The check for NULL pointers (to tackle the QF_FP problem) does not avoid the crashes; in particular, we still get segmentation faults and not (as could have been expected) an "invalid argument" exception. Looking at our code, we use only the OCaml API; and the stub functions for the Z3_fpa_getnumeral*_{u}int64 functions always supply a valid pointer to a stack-allocated {u}int64. So I suspect that the NULL values that you see are only a consequence of something else that has gone wrong, before.

Another observation hinting in that direction is that we still crash with exceptions that complain about varying bluntly incorrect arguments that are claimed to be supplied to standard operators such as, e.g. ITE.

martin-neuhaeusser commented 8 years ago

The good news is: The segfaults on our QF_BV problems with timeouts have disappeared in our benchmarks on your gh570-att-2 branch :-)

wintersteiger commented 8 years ago

Sweet! That's at least one problem sorted then!

The FP crashes are still giving me headaches; so far I wasn't able to reproduce any of your errors, and it's in no way non-deterministic for me. The log file replays always have the same behavior. Just in case, would there a way for you to share a binary (non-publicly if necessary), so I can replay that myself?

martin-neuhaeusser commented 8 years ago

@wintersteiger I will have a look into Z3's OCaml bindings - maybe something is wrong in dealing with their {u}int64 handling: OCaml only has signed 63 bit integers that might require some special handling in the stubs. The 64th bit is used as a flag of OCaml's GC, in case you wonder ;-)

Ultimately, sharing a binary non-publicly is possible, I guess. However I have to check back with Siemens before I can supply it.

wintersteiger commented 8 years ago

Good point, the OCaml ints could indeed be an overlooked problem; I never did any checks for them being in range, etc.

Binary would be great, but of course, only if it's not sensitive.

martin-neuhaeusser commented 7 years ago

I ran our floating-point examples with a debug build of Z3 (the current gh570-att-2 branch) and obtain assertion violations from Z3 in some instances:

ASSERTION VIOLATION
File: ../src/ast/rewriter/rewriter.h
Line: 235
frame_stack().empty() || frame_stack().back().m_state != PROCESS_CHILDREN || this->is_child_of_top_frame(old_t)
(C)ontinue, (A)bort, (S)top, (T)hrow exception, Invoke (G)DB
T
Fatal error: exception Z3.Error("assertion violation")
Raised by primitive operation at file "../src/api/ml/z3.ml" (inlined), line 1735, characters 24-53

The corresponding stack trace is the following:

#0  0x00007f198b31cb89 in __libc_waitpid (pid=22009, stat_loc=stat_loc@entry=0x7ffff6ad9dc0, options=options@entry=0)
    at ../sysdeps/unix/sysv/linux/waitpid.c:40
#1  0x00007f198b2a412b in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:148
#2  0x00007f198cdd10f3 in invoke_gdb () at ../src/util/debug.cpp:100
#3  0x00007f198c721816 in rewriter_tpl<fpa2bv_rewriter_cfg>::not_rewriting (this=0x3306178) at ../src/ast/rewriter/rewriter_def.h:517
#4  0x00007f198c7226fd in rewriter_tpl<fpa2bv_rewriter_cfg>::main_loop<false> (this=0x3306178, t=0x320f6a8, result=..., result_pr=...)
    at ../src/ast/rewriter/rewriter_def.h:602
#5  0x00007f198c722088 in rewriter_tpl<fpa2bv_rewriter_cfg>::operator() (this=0x3306178, t=0x320f6a8, result=..., result_pr=...)
    at ../src/ast/rewriter/rewriter_def.h:688
#6  0x00007f198c5f17c1 in rewriter_tpl<fpa2bv_rewriter_cfg>::operator() (this=0x3306178, t=0x320f6a8, result=...)
    at ../src/ast/rewriter/rewriter.h:348
#7  0x00007f198c5e8ed7 in smt::theory_fpa::convert_atom (this=0x3305ed8, e=0x320f6a8) at ../src/smt/theory_fpa.cpp:298
#8  0x00007f198c5eaba2 in smt::theory_fpa::internalize_atom (this=0x3305ed8, atom=0x320f6a8, gate_ctx=true) at ../src/smt/theory_fpa.cpp:439
#9  0x00007f198c2d452f in smt::context::internalize_theory_atom (this=0x3212468, n=0x320f6a8, gate_ctx=true)
    at ../src/smt/smt_internalizer.cpp:462
#10 0x00007f198c2d3b9a in smt::context::internalize_formula (this=0x3212468, n=0x320f6a8, gate_ctx=true)
    at ../src/smt/smt_internalizer.cpp:405
#11 0x00007f198c2d3106 in smt::context::internalize (this=0x3212468, n=0x320f6a8, gate_ctx=true) at ../src/smt/smt_internalizer.cpp:329
#12 0x00007f198c2d2543 in smt::context::assert_default (this=0x3212468, n=0x320f6a8, pr=0x0) at ../src/smt/smt_internalizer.cpp:273
#13 0x00007f198c2d2497 in smt::context::internalize_assertion (this=0x3212468, n=0x320f6a8, pr=0x0, generation=0)
    at ../src/smt/smt_internalizer.cpp:268
#14 0x00007f198c493193 in smt::context::internalize_assertions (this=0x3212468) at ../src/smt/smt_context.cpp:2965
#15 0x00007f198c49248d in smt::context::push (this=0x3212468) at ../src/smt/smt_context.cpp:2871
#16 0x00007f198c300539 in smt::kernel::imp::push (this=0x3212468) at ../src/smt/smt_kernel.cpp:90
#17 0x00007f198c2ff7f3 in smt::kernel::push (this=0x445e0e8) at ../src/smt/smt_kernel.cpp:245
#18 0x00007f198c2ec7c2 in smt::solver::push_core (this=0x445de28) at ../src/smt/smt_solver.cpp:85
#19 0x00007f198c85c3ff in solver_na2as::push (this=0x445de28) at ../src/solver/solver_na2as.cpp:82
#20 0x00007f198c85dc4e in combined_solver::push (this=0x331ecb8) at ../src/solver/combined_solver.cpp:185
#21 0x00007f198be76287 in Z3_solver_push (c=0x311b328, s=0x312e628) at ../src/api/api_solver.cpp:206
#22 0x0000000000fe746c in n_solver_push (a0=139747680059560, a1=139747680613944) at ../src/api/ml/z3native_stubs.c:13358
#23 0x00000000009cf4b5 in camlZ3Solver__push_7617 () at ../src/api/ml/z3.ml:1735
#24 0x0000000000a5942d in camlInterface__push_4449 () at solver/interface.ml:775
#25 0x00000000008eb53e in camlTreeIC3___check_incremental_854 () at treeic3/treeIC3.ml:215
#26 0x00000000008eb8ca in camlTreeIC3__check_and_pop_activation_literals_1029 () at treeic3/treeIC3.ml:316
#27 0x00000000008ee35d in camlTreeIC3__check_relative_inductiveness_incremental_2424 () at treeic3/treeIC3.ml:686
#28 0x00000000008f0ad0 in camlTreeIC3__check_literal_3457 () at treeic3/treeIC3.ml:965
#29 0x0000000000a758d0 in camlHashedSet__fold_876 () at globals/hashedSet.ml:350
#30 0x00000000008f00c6 in camlTreeIC3__anon$2dfn$5btreeic3$2ftreeIC3$2eml$3a828$2c8$2d$2d7811$5d_3118 () at globals/hashedSet.ml:350
#31 0x00000000008f2365 in camlTreeIC3__block_4128 () at treeic3/treeIC3.ml:1128
#32 0x00000000008f554a in camlTreeIC3__outer_loop_5986 () at treeic3/treeIC3.ml:1374
#33 0x00000000008f6271 in camlTreeIC3__check_6491 () at treeic3/treeIC3.ml:1798
#34 0x00000000008e53d8 in camlVerifier__entry () at engines/consoleFrontend.ml:39
#35 0x0000000000883d29 in caml_program ()
#36 0x00000000010115ae in caml_start_program ()
#37 0x00000000010118e5 in caml_main (argv=0x7ffff6adb288) at startup.c:145
---Type <return> to continue, or q <return> to quit---
#38 0x0000000000882b0c in main (argc=<optimized out>, argv=<optimized out>) at main.c:37

I also attached the corresponding log file, but it replays without reproducing the assertion violation. assertion_violation.zip

wintersteiger commented 7 years ago

Thanks for the stacktrace and the new log, this does get me a bit closer!

In the logfile, multiple contexts are created and destroyed (and it ends in Z3_del_context for one of them). Is there any kind of concurrency involved, or are you just using multiple contexts to keep things separate?

wintersteiger commented 7 years ago

Subtle detail: the log file ends in

Z3_del_context(0x311b328);

while the stack trace tells us the segfault happens during

Z3_solver_push (c=0x311b328, s=0x312e628)

The context for both operations is 0x311b328. Could it be that the context is deleted too early or that the garbage collector interferes when it shouldn't?

martin-neuhaeusser commented 7 years ago

@wintersteiger Our verifier does not use multiple threads - we create different contexts sequentially during a typical verification run (e.g. one during simplification, one during model minimization, different context for, e.g. generalization and inductivity checks).

Regarding the order in which contexts are deleted: I checked with a debug output in Z3's OCaml bindings and if I interpret the output correctly, the context is freed only after I quit GDB and let the exception terminate the OCaml program; when doing so, OCaml's GC cleans up the context.

While re-running the example, I got a second assertion violation about non-empty frame stacks:

------------------- Tree-IC3 -------------------
ASSERTION VIOLATION
File: ../src/ast/rewriter/rewriter_def.h
Line: 517
frame_stack().empty()
(C)ontinue, (A)bort, (S)top, (T)hrow exception, Invoke (G)DB
T
try_to_delete_context: Deleting context 0x2d4cf20(0x2d39d68) with cnt=0.
Fatal error: exception Z3.Error("assertion violation")
Raised by primitive operation at file "../src/api/ml/z3.ml" (inlined), line 1735, characters 24-53
Called from file "solver/z3/z3Solver.ml", line 1538, characters 17-36
Called from file "solver/interface.ml", line 775, characters 6-45
Called from file "treeic3/treeIC3.ml", line 215, characters 17-35
Called from file "treeic3/treeIC3.ml", line 316, characters 19-72
Called from file "treeic3/treeIC3.ml", line 686, characters 17-74
Called from file "treeic3/treeIC3.ml", line 965, characters 18-95
Called from file "globals/hashedSet.ml", line 350, characters 42-61
Called from file "globals/hashedSet.ml" (inlined), line 350, characters 47-60
Called from file "treeic3/treeIC3.ml", line 977, characters 10-69
Called from file "treeic3/treeIC3.ml", line 1128, characters 18-74
Called from file "treeic3/treeIC3.ml" (inlined), line 1374, characters 4-77
Called from file "treeic3/treeIC3.ml", line 1723, characters 26-100
Called from file "treeic3/treeIC3.ml", line 1798, characters 8-98
Called from file "engines/consoleFrontend.ml" (inlined), line 39, characters 19-35
Called from file "verifier.ml", line 921, characters 6-140
Re-raised at file "verifier.ml", line 921, characters 6-140

Again, the context is freed only after the assertion failure has occurred (the line reading "try_to_delete_context" is the log message I introduced for debugging purposes into Z3's OCaml bindings).

martin-neuhaeusser commented 7 years ago

Finally I got a log-file (invalid_ptr.zip) that reports an invalid pointer.

The corresponding stack-trace is:

(gdb) info stack
#0  0x00007f1acbcf7b89 in __libc_waitpid (pid=43772, stat_loc=stat_loc@entry=0x7fff85e4adf0, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:40
#1  0x00007f1acbc7f12b in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:148
#2  0x00007f1acd7ac0f3 in invoke_gdb () at ../src/util/debug.cpp:100
#3  0x00007f1acd0fc816 in rewriter_tpl<fpa2bv_rewriter_cfg>::not_rewriting (this=0x2e7d178) at ../src/ast/rewriter/rewriter_def.h:517
#4  0x00007f1acd0fd6fd in rewriter_tpl<fpa2bv_rewriter_cfg>::main_loop<false> (this=0x2e7d178, t=0x2d866a8, result=..., result_pr=...)
    at ../src/ast/rewriter/rewriter_def.h:602
#5  0x00007f1acd0fd088 in rewriter_tpl<fpa2bv_rewriter_cfg>::operator() (this=0x2e7d178, t=0x2d866a8, result=..., result_pr=...)
    at ../src/ast/rewriter/rewriter_def.h:688
#6  0x00007f1accfcc7c1 in rewriter_tpl<fpa2bv_rewriter_cfg>::operator() (this=0x2e7d178, t=0x2d866a8, result=...) at ../src/ast/rewriter/rewriter.h:348
#7  0x00007f1accfc3ed7 in smt::theory_fpa::convert_atom (this=0x2e7ced8, e=0x2d866a8) at ../src/smt/theory_fpa.cpp:298
#8  0x00007f1accfc5ba2 in smt::theory_fpa::internalize_atom (this=0x2e7ced8, atom=0x2d866a8, gate_ctx=true) at ../src/smt/theory_fpa.cpp:439
#9  0x00007f1acccaf52f in smt::context::internalize_theory_atom (this=0x2d89468, n=0x2d866a8, gate_ctx=true) at ../src/smt/smt_internalizer.cpp:462
#10 0x00007f1acccaeb9a in smt::context::internalize_formula (this=0x2d89468, n=0x2d866a8, gate_ctx=true) at ../src/smt/smt_internalizer.cpp:405
#11 0x00007f1acccae106 in smt::context::internalize (this=0x2d89468, n=0x2d866a8, gate_ctx=true) at ../src/smt/smt_internalizer.cpp:329
#12 0x00007f1acccad543 in smt::context::assert_default (this=0x2d89468, n=0x2d866a8, pr=0x0) at ../src/smt/smt_internalizer.cpp:273
#13 0x00007f1acccad497 in smt::context::internalize_assertion (this=0x2d89468, n=0x2d866a8, pr=0x0, generation=0) at ../src/smt/smt_internalizer.cpp:268
#14 0x00007f1acce6e193 in smt::context::internalize_assertions (this=0x2d89468) at ../src/smt/smt_context.cpp:2965
#15 0x00007f1acce6d48d in smt::context::push (this=0x2d89468) at ../src/smt/smt_context.cpp:2871
#16 0x00007f1acccdb539 in smt::kernel::imp::push (this=0x2d89468) at ../src/smt/smt_kernel.cpp:90
#17 0x00007f1acccda7f3 in smt::kernel::push (this=0x3fd50e8) at ../src/smt/smt_kernel.cpp:245
#18 0x00007f1acccc77c2 in smt::solver::push_core (this=0x3fd4e28) at ../src/smt/smt_solver.cpp:85
#19 0x00007f1acd2373ff in solver_na2as::push (this=0x3fd4e28) at ../src/solver/solver_na2as.cpp:82
#20 0x00007f1acd238c4e in combined_solver::push (this=0x2e95cb8) at ../src/solver/combined_solver.cpp:185
#21 0x00007f1acc851287 in Z3_solver_push (c=0x2c92328, s=0x2ca5628) at ../src/api/api_solver.cpp:206
#22 0x0000000000ea584c in n_solver_push (a0=139753059035696, a1=139753059617008) at ../src/api/ml/z3native_stubs.c:13358
#23 0x0000000000a65830 in camlZ3__push_10543 () at ../src/api/ml/z3.ml:1735
#24 0x000000000092d54d in camlInterface__push_4449 () at solver/interface.ml:775
#25 0x00000000008595fe in camlTreeIC3___check_incremental_854 () at treeic3/treeIC3.ml:215
#26 0x000000000085998a in camlTreeIC3__check_and_pop_activation_literals_1029 () at treeic3/treeIC3.ml:316
#27 0x000000000085bb37 in camlTreeIC3__check_relative_inductiveness_incremental_2424 () at treeic3/treeIC3.ml:686
#28 0x000000000085d680 in camlTreeIC3__check_literal_3457 () at treeic3/treeIC3.ml:965
#29 0x000000000093e880 in camlHashedSet__fold_876 () at globals/hashedSet.ml:350
#30 0x000000000093e86b in camlHashedSet__fold_876 () at globals/hashedSet.ml:350
#31 0x000000000085d012 in camlTreeIC3__anon$2dfn$5btreeic3$2ftreeIC3$2eml$3a828$2c8$2d$2d7811$5d_3118 () at treeic3/treeIC3.ml:977
#32 0x000000000085e971 in camlTreeIC3__block_4128 () at treeic3/treeIC3.ml:1128
#33 0x0000000000861441 in camlTreeIC3__outer_loop_5986 () at treeic3/treeIC3.ml:1723
#34 0x0000000000861ee1 in camlTreeIC3__check_6491 () at treeic3/treeIC3.ml:1798
#35 0x000000000082a8ec in camlVerifier__execute_20191 () at engines/consoleFrontend.ml:39
#36 0x0000000000855b94 in camlVerifier__entry () at verifier.ml:921
#37 0x0000000000827399 in caml_program ()
#38 0x0000000000ecf98e in caml_start_program ()
#39 0x0000000000ecfcc5 in caml_main (argv=0x7fff85e4c3e8) at startup.c:145
#40 0x000000000082617c in main (argc=<optimized out>, argv=<optimized out>) at main.c:37

NikolajBjorner commented 7 years ago

The log file seems to be corrupted somehow. It contains a bunch of zeroed bytes.

wintersteiger commented 7 years ago

@NikolajBjorner: not sure what you're talking about, how do those zeroed bytes manifest themselves?

I just fixed a problem in theory_fpa that could well have been the source of the current issues. It uses two rewriters and when they were interrupted (e.g. timeout) then they would not clean up after themselves properly. My current fix is just local to theory_fpa, but we may want to go over the rewriter_tpl at some point and fix those problems more generally. (Now in master and in the rebased gh570-att-2.)

wintersteiger commented 7 years ago

Oh, I didn't see the latest log file yet. Could you confirm that line 18050 of the log file contains "CR"? That indicates something has gone wrong while writing the log; either parallelism or some operation doesn't report its function number correctly in log files.

martin-neuhaeusser commented 7 years ago

Maybe I made a mistake with the last log; could be that accidentally there was another instance of the tool running in the background. I try to produce a valid one...

martin-neuhaeusser commented 7 years ago

I tried to reproduce the log-file from my previous post, but things are different (and better!) with the latest version (bf2ceacd) of your gh570-att-2 branch. Your fixes in theory_fpa.cpp have reduced the number of exceptions; only two are left in our set of floating-point benchmarks. They seem to be (a bit) different:

Type-checking.
Inlining.
Model simplification.
Building memory model.
Building CFA from IVL sources.
Applying large-block encoding.
Using SMT theory QF_FP.
------------------- Tree-IC3 -------------------
ASSERTION VIOLATION
File: ../src/smt/theory_fpa.cpp
Line: 747
UNREACHABLE CODE WAS REACHED.

And this time, I (hope to) have a log (unreachable_code.zip) that reproduces the situation.

Here is the stack trace:

#0  0x00007f364c0b9b89 in __libc_waitpid (pid=28458, stat_loc=stat_loc@entry=0x7ffd7545bab0, options=options@entry=0)
    at ../sysdeps/unix/sysv/linux/waitpid.c:40
#1  0x00007f364c04112b in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:148
#2  0x00007f364db6e637 in invoke_gdb () at ../src/util/debug.cpp:100
#3  0x00007f364d38ae2b in smt::theory_fpa::relevant_eh (this=0x2779298, n=0x275d918) at ../src/smt/theory_fpa.cpp:747
#4  0x00007f364d2289e4 in smt::context::relevant_eh (this=0x27ee398, n=0x275d918) at ../src/smt/smt_context.cpp:1657
#5  0x00007f364d2436a7 in smt::relevancy_propagator_imp::set_relevant (this=0x2767d08, n=0x275d918) at ../src/smt/smt_relevancy.cpp:308
#6  0x00007f364d243734 in smt::relevancy_propagator_imp::mark_as_relevant (this=0x2767d08, n=0x275cab8) at ../src/smt/smt_relevancy.cpp:336
#7  0x00007f364d24380a in smt::relevancy_propagator_imp::propagate_relevant_app (this=0x2767d08, n=0x2783458)
    at ../src/smt/smt_relevancy.cpp:357
#8  0x00007f364d2444d5 in smt::relevancy_propagator_imp::propagate (this=0x2767d08) at ../src/smt/smt_relevancy.cpp:480
#9  0x00007f364d07bc85 in smt::context::mark_as_relevant (this=0x27ee398, n=0x27c5658) at ../src/smt/smt_context.h:1110
#10 0x00007f364d07bcba in smt::context::mark_as_relevant (this=0x27ee398, v=509) at ../src/smt/smt_context.h:1114
#11 0x00007f364d07bce7 in smt::context::mark_as_relevant (this=0x27ee398, l=...) at ../src/smt/smt_context.h:1116
#12 0x00007f364d3877dd in smt::theory_fpa::assert_cnstr (this=0x2779298, e=0x27c5658) at ../src/smt/theory_fpa.cpp:446
#13 0x00007f364d387e07 in smt::theory_fpa::internalize_atom (this=0x2779298, atom=0x2786aa8, gate_ctx=true) at ../src/smt/theory_fpa.cpp:487
#14 0x00007f364d07152f in smt::context::internalize_theory_atom (this=0x27ee398, n=0x2786aa8, gate_ctx=true)
    at ../src/smt/smt_internalizer.cpp:462
#15 0x00007f364d070b9a in smt::context::internalize_formula (this=0x27ee398, n=0x2786aa8, gate_ctx=true)
    at ../src/smt/smt_internalizer.cpp:405
#16 0x00007f364d070106 in smt::context::internalize (this=0x27ee398, n=0x2786aa8, gate_ctx=true) at ../src/smt/smt_internalizer.cpp:329
#17 0x00007f364d07063b in smt::context::internalize_formula (this=0x27ee398, n=0x27c89b8, gate_ctx=true)
    at ../src/smt/smt_internalizer.cpp:368
#18 0x00007f364d070106 in smt::context::internalize (this=0x27ee398, n=0x27c89b8, gate_ctx=true) at ../src/smt/smt_internalizer.cpp:329
#19 0x00007f364d06f543 in smt::context::assert_default (this=0x27ee398, n=0x27c89b8, pr=0x0) at ../src/smt/smt_internalizer.cpp:273
#20 0x00007f364d06f497 in smt::context::internalize_assertion (this=0x27ee398, n=0x27c89b8, pr=0x0, generation=0)
    at ../src/smt/smt_internalizer.cpp:268
#21 0x00007f364d230193 in smt::context::internalize_assertions (this=0x27ee398) at ../src/smt/smt_context.cpp:2965
#22 0x00007f364d231ef9 in smt::context::check (this=0x27ee398, num_assumptions=0, assumptions=0x0, reset_cancel=true)
    at ../src/smt/smt_context.cpp:3211
#23 0x00007f364d09d6b9 in smt::kernel::imp::check (this=0x27ee398, num_assumptions=0, assumptions=0x0) at ../src/smt/smt_kernel.cpp:111
#24 0x00007f364d09c975 in smt::kernel::check (this=0x27ee328, num_assumptions=0, assumptions=0x0) at ../src/smt/smt_kernel.cpp:276
#25 0x00007f364d089917 in smt::solver::check_sat_core (this=0x27ee068, num_assumptions=0, assumptions=0x0) at ../src/smt/smt_solver.cpp:94
#26 0x00007f364d5f950c in solver_na2as::check_sat (this=0x27ee068, num_assumptions=0, assumptions=0x0) at ../src/solver/solver_na2as.cpp:67
#27 0x00007f364d5fb22e in combined_solver::check_sat (this=0x27695e8, num_assumptions=0, assumptions=0x0)
    at ../src/solver/combined_solver.cpp:241
#28 0x00007f364cc13ea8 in _solver_check (c=0x2473578, s=0x2491898, num_assumptions=0, assumptions=0x0) at ../src/api/api_solver.cpp:296
#29 0x00007f364cc1406f in Z3_solver_check (c=0x2473578, s=0x2491898) at ../src/api/api_solver.cpp:311
#30 0x0000000000ea5efe in n_solver_check (a0=139871174423056, a1=139871174743968) at ../src/api/ml/z3native_stubs.c:13553
#31 0x0000000000a65bec in camlZ3__check_10680 () at ../src/api/ml/z3.ml:1759
#32 0x00000000008e100f in camlZ3Solver__check_7232 () at solver/z3/z3Solver.ml:1472
#33 0x000000000092d4ce in camlInterface__check_assumptions_4390 () at solver/interface.ml:766
#34 0x000000000085963c in camlTreeIC3___check_incremental_854 () at treeic3/treeIC3.ml:217
#35 0x000000000085998a in camlTreeIC3__check_and_pop_activation_literals_1029 () at treeic3/treeIC3.ml:316
#36 0x000000000085bb37 in camlTreeIC3__check_relative_inductiveness_incremental_2424 () at treeic3/treeIC3.ml:686
#37 0x000000000085d680 in camlTreeIC3__check_literal_3457 () at treeic3/treeIC3.ml:965
#38 0x000000000093e880 in camlHashedSet__fold_876 () at globals/hashedSet.ml:350
#39 0x000000000093e86b in camlHashedSet__fold_876 () at globals/hashedSet.ml:350
#40 0x000000000093e86b in camlHashedSet__fold_876 () at globals/hashedSet.ml:350
#41 0x000000000085d012 in camlTreeIC3__anon$2dfn$5btreeic3$2ftreeIC3$2eml$3a828$2c8$2d$2d7811$5d_3118 () at treeic3/treeIC3.ml:977
#42 0x000000000085e971 in camlTreeIC3__block_4128 () at treeic3/treeIC3.ml:1128
#43 0x0000000000861441 in camlTreeIC3__outer_loop_5986 () at treeic3/treeIC3.ml:1723
#44 0x0000000000861ee1 in camlTreeIC3__check_6491 () at treeic3/treeIC3.ml:1798
#45 0x000000000082a8ec in camlVerifier__execute_20191 () at engines/consoleFrontend.ml:39
#46 0x0000000000855b94 in camlVerifier__entry () at verifier.ml:921
#47 0x0000000000827399 in caml_program ()
#48 0x0000000000ecf98e in caml_start_program ()
#49 0x0000000000ecfcc5 in caml_main (argv=0x7ffd7545d728) at startup.c:145
#50 0x000000000082617c in main (argc=<optimized out>, argv=<optimized out>) at main.c:37

wintersteiger commented 7 years ago

Great, that looks like another problem fixed then. The behavior on the unreachable-code logfile is again non-deterministic, but I think I already know what the problem is (the internalizer is interrupted and doesn't clean up properly).

wintersteiger commented 7 years ago

Alright, it took a lot of source code reading, but finally I figured out that the unreachable assertion was simply a bit too conservative. This fix is already in master and I'll now also merge gh570-att-2 into master.

NikolajBjorner commented 7 years ago

seems several bugs were reported here and fixed. Perhaps we can close this?

Z3Prover / z3

Failing assertion (probably related to the use of QF_BV) #570