Z3Prover / z3

The Z3 Theorem Prover
Other
10.38k stars 1.48k forks source link

wasm build: satisfiable problem produces `unknown`, `Z3_solver_get_reason_unknown` returns empty string #5746

Closed bakkot closed 2 years ago

bakkot commented 2 years ago

I know the wasm build isn't fully supported yet, but I'm hoping someone can help here. I was hoping to start writing nice JS bindings for the wasm build along the lines of the Python bindings, but the first nontrivial problem I tried it on failed.

The problem is a standard sudoku. I'm loading and running it with


let conf = Module._Z3_mk_config();
let ctx = Module._Z3_mk_context(conf);

let solver = Module._Z3_mk_solver(ctx);
Module._Z3_solver_inc_ref(ctx, solver);
Module._Z3_solver_from_string(ctx, solver, stringToPointer(desc));

let checkResult = Module._Z3_solver_check(ctx, solver);
let reason = Module._Z3_solver_get_reason_unknown(ctx, solver);

which gives a checkResult of 0 (i.e. unknown), but the reason reported by Z3_solver_get_reason_unknown is just the empty string. Is there any way I can figure out what's going wrong here?

(It's not an issue with how I'm loading it, because I can remove a constraint from desc and it will then check as sat.)

I have a complete reproduction here. It already has the wasm build (and the command used to generate it) checked in, so you don't need to set emscripten.

NikolajBjorner commented 2 years ago

Thanks for preparing the repro. I tried it on my end but run into some linking issues that I would have to resolve first (likely as I am pulling libz3.a from the wasm build and using incompatible compiler or something along those lines). The issue would be resolved much easier by someone more apt at the wasm/js interface than I. One thing you could try (this is what I was going to do to poke around) is to set verbose configuration (Z3_set_global_param("verbose","10")) to get an idea if the solver actually does anything of use.

For real desperation mode: A wasm build with debug tracing enabled would furthermore allow tracing internally without depending on capability to set breakpoints. Z3_enable_trace("after_search"), Z3_enable_trace("asserted_formulas"), Z3_enable_trace("sat") should be a start.

Though, based on your description it seems the bug is really about calling conventions between the interfaces and therefore unrelated to z3 specifics.

bakkot commented 2 years ago

Here's output with "verbose" "10" as suggested:

(simplifier :num-exprs 303 :num-asts 551 :time 0.01 :before-memory 9.83 :after-memory 9.83)
(simplifier :num-exprs 1896 :num-asts 2252 :time 0.04 :before-memory 9.83 :after-memory 9.92)
(propagate-values :num-exprs 1646 :num-asts 1971 :time 0.15 :before-memory 9.90 :after-memory 9.91)
(ctx-simplify :num-steps 6500)
(ctx-simplify :num-exprs 1646 :num-asts 1971 :time 0.03 :before-memory 9.93 :after-memory 9.96)
(simplifier :num-exprs 1646 :num-asts 1971 :time 0.03 :before-memory 9.91 :after-memory 9.91)
(solve_eqs :num-exprs 1598 :num-asts 1971 :time 0.13 :before-memory 9.91 :after-memory 9.91)
(:num-elim-vars 24)
(elim-uncnstr :num-exprs 1598 :num-asts 1971 :time 0.00 :before-memory 9.91 :after-memory 9.91)
(simplifier :num-exprs 2016 :num-asts 2389 :time 0.07 :before-memory 9.91 :after-memory 9.92)
(ilp-model-finder-tactic start)
(ilp-model-finder-tactic done)
(pb-tactic start)
(pb-tactic done)
(bounded-tactic start)
(bounded-tactic done)
result: unknown
reason: ""

And then if I remove the (assert (<= c_8_8 9)) constraint, I get

(simplifier :num-exprs 302 :num-asts 550 :time 0.01 :before-memory 9.83 :after-memory 9.83)
(simplifier :num-exprs 1895 :num-asts 2251 :time 0.04 :before-memory 9.83 :after-memory 9.92)
(propagate-values :num-exprs 1645 :num-asts 1970 :time 0.14 :before-memory 9.90 :after-memory 9.91)
(ctx-simplify :num-steps 6494)
(ctx-simplify :num-exprs 1645 :num-asts 1970 :time 0.02 :before-memory 9.93 :after-memory 9.96)
(simplifier :num-exprs 1645 :num-asts 1970 :time 0.03 :before-memory 9.91 :after-memory 9.91)
(solve_eqs :num-exprs 1597 :num-asts 1970 :time 0.12 :before-memory 9.91 :after-memory 9.91)
(:num-elim-vars 24)
(elim-uncnstr :num-exprs 1597 :num-asts 1970 :time 0.00 :before-memory 9.91 :after-memory 9.91)
(simplifier :num-exprs 2015 :num-asts 2388 :time 0.06 :before-memory 9.91 :after-memory 9.92)
(ilp-model-finder-tactic start)
(ilp-model-finder-tactic done)
(pb-tactic start)
(pb-tactic done)
(bounded-tactic start)
(bounded-tactic done)
(smt.tactic start)
(smt.propagate-values)
(smt.reduce-asserted)
(smt.maximizing-bv-sharing)
(smt.reduce-asserted)
(smt.flatten-clauses)
(smt.simplifier-done)
(smt.searching)
(smt.stats :restarts    :decisions  :clauses/bin    :simplify   :memory)
(smt.stats        :conflicts    :propagations   :lemmas  :deletions   )
(smt.stats    0      8     29      0   627/627     7    0    0   11.26)
(smt :num-exprs 0 :num-asts 2390 :time 0.54 :before-memory 9.92 :after-memory 10.13)
result: sat

It looks like it's just not running the smt.tactic part at all, in the first case? I wonder if this is just down to how it's choosing the tactics - is any of that logic based in part on whether it's running in a multithreaded environment, or anything to that effect?

If I run the C++ equivalent, on the full problem, I get

(simplifier :num-exprs 303 :num-asts 551 :time 0.00 :before-memory 19.09 :after-memory 19.09)
(simplifier :num-exprs 1896 :num-asts 2252 :time 0.00 :before-memory 19.09 :after-memory 19.24)
(propagate-values :num-exprs 1646 :num-asts 1971 :time 0.00 :before-memory 19.19 :after-memory 19.21)
(ctx-simplify :num-steps 6500)
(ctx-simplify :num-exprs 1646 :num-asts 1971 :time 0.00 :before-memory 19.25 :after-memory 19.30)
(simplifier :num-exprs 1646 :num-asts 1971 :time 0.00 :before-memory 19.21 :after-memory 19.21)
(solve_eqs :num-exprs 1598 :num-asts 1971 :time 0.00 :before-memory 19.21 :after-memory 19.21)
(:num-elim-vars 24)
(elim-uncnstr :num-exprs 1598 :num-asts 1971 :time 0.00 :before-memory 19.21 :after-memory 19.21)
(simplifier :num-exprs 2016 :num-asts 2389 :time 0.00 :before-memory 19.21 :after-memory 19.22)
(ilp-model-finder-tactic start)
(ilp-model-finder-tactic done)
(pb-tactic start)
(pb-tactic done)
(bounded-tactic start)
(no-cut-smt-tactic start)
(smt.tactic start)
(smt.propagate-values)
(smt.nnf-cnf)
(smt.reduce-asserted)
(smt.maximizing-bv-sharing)
(smt.reduce-asserted)
(smt.flatten-clauses)
(smt.simplifier-done)
(smt.searching)
(smt.stats :restarts   :decisions  :clauses/bin           :simplify   :memory)
(smt.stats      :conflicts   :propagations    :lemmas        :deletions     )
(smt.stats    0    101    362   9516  3414/2961  1275/470     2  340   22.41)
(smt.stats    1    153    635  16545  3390/2961  1622/658     3  443   22.55)
(smt :num-exprs 0 :num-asts 2405 :time 0.08 :before-memory 19.22 :after-memory 19.40)
(no-cut-smt-tactic done)
(bounded-tactic done)
result: 1
NikolajBjorner commented 2 years ago

This looks like it is inside of z3.

NikolajBjorner commented 2 years ago

how about checking the error status after the call to check_sat?

NikolajBjorner commented 2 years ago
Z3_string Z3_API Z3_get_error_msg(Z3_context c, Z3_error_code err);
bakkot commented 2 years ago

Calling Z3_get_error_code after Z3_solver_check returns Z3_OK.

NikolajBjorner commented 2 years ago

Some additional comments: It looks like it is throwing internally, but then the reason_unknown should have been updated and presumably also the error code/message. So probing for error messages is likely not going to lead anywhere, but something to cover. I have no experience debugging such code. If it is possible to set breakpoints it could help or trap exceptions (some are benign and used for non-local transfer of control).

NikolajBjorner commented 2 years ago

The worst case seems that the wasm code is buggy and it is a compiler issue. It happened before.

NikolajBjorner commented 2 years ago

and I tried reproducing the issue locally from a Python wrapper around your example file.

bakkot commented 2 years ago

I can't seem to get trace output in wasm (I built with debug and enabled traces, but the .z3-trace file in memory stayed empty), and I also have no experience debugging wasm, unfortunately. I did confirm that it's not the JS code specifically which is the problem - if I build the c version with wasm, it prints the same thing as the JS version.

I'm willing to try some printf debugging if you can point me to where I ought to be looking. The divergence I see between wasm and native is that it's not trying the SMT tactic. Is there a place where I can try sticking some printfs to get insight into why that might be the case?

NikolajBjorner commented 2 years ago

it could take some time to debug this.

The code output indicate it is using this tactic: https://github.com/Z3Prover/z3/blob/5cd1fe31fdd12954d7c8d92daa420cd1d1a4fe0d/src/tactic/smtlogics/qflia_tactic.cpp#L217

The behavior of the tactic is different depending on whether you have the bound on c_8_8. Without the bound it is supposed to "fail" the bounded tactic and then proceed with the fallback SMT tactic.

When you remove the upper bound on c_8_8, the probe:

https://github.com/Z3Prover/z3/blob/5cd1fe31fdd12954d7c8d92daa420cd1d1a4fe0d/src/tactic/arith/add_bounds_tactic.cpp#L32

will alert that there is no bound on c_8_8. It then fails the tactic at that point.

With the bound on c_8_8, the problem is bounded and the probe succeeds. It then attempts three different tactics to finish the problem. The first one it tries seems to finish the problem.

On my instance it succeeds in both cases.

There were some issues with OCaml and static linking when z3 throws internal exceptions. So far, from what can glean there could be something going on with the exception handling code. So regarding printf / std::cout / IF_VERBOSE, in addition to tracing the place where the probe throws, trace the place where the test is supposed to return:

https://github.com/Z3Prover/z3/blob/5cd1fe31fdd12954d7c8d92daa420cd1d1a4fe0d/src/tactic/arith/add_bounds_tactic.cpp#L46

and then the tactic failure itself:

https://github.com/Z3Prover/z3/blob/5cd1fe31fdd12954d7c8d92daa420cd1d1a4fe0d/src/tactic/tactical.cpp#L1132

The part that the above discussion is likely missing out on is that the exception code path appears to work for you. The code path where it enters the bounded tactic work is supposed to be exercised but isn't. It never logs the '(no-cut-smt-tactic start)'

https://github.com/Z3Prover/z3/blob/5cd1fe31fdd12954d7c8d92daa420cd1d1a4fe0d/src/tactic/smtlogics/qflia_tactic.cpp#L173

which invokes

https://github.com/Z3Prover/z3/blob/5cd1fe31fdd12954d7c8d92daa420cd1d1a4fe0d/src/tactic/smtlogics/qflia_tactic.cpp#L61

This brings me to the point where the recent chance on how parameters are passed could be at fault. It has the risk of exercising code paths that haven't been covered. The risky change is: fc77345bec16c1aa622201af6c4d088ed5886c2f

I can look manually at this or if you get to some place where it enters the smt-tactic but throws something (such as miss-configured parameters) it would be a clue.

NikolajBjorner commented 2 years ago

Given that it prints:

(bounded-tactic start)
(bounded-tactic done)
result: unknown
reason: ""

and not

(bounded-tactic start)
(no-cut-smt-tactic start)

suggests the tactic is aborted during the probe (it shouldn't when all numbers are bounded). Hopefully the pointers give some location guide on what the circumstances could be for this.

bakkot commented 2 years ago

With https://github.com/Z3Prover/z3/commit/543c16c73e4a1da5f43a6a434f83bb137126819e, I'm now seeing unclassified exception in or-else somewhere between when the call to is_unbounded completes and when (bounded-tactic done) prints. I do not see the same exception when building natively.

I added

catch (const std::exception &exc)
{
    IF_VERBOSE(10, verbose_stream() << exc.what() << " in or-else\n");
    throw;
}

and got

thread constructor failed: Resource temporarily unavailable in or-else

so it looks like it's trying to use threads somewhere despite being built with -DZ3_SINGLE_THREADED=ON?

NikolajBjorner commented 2 years ago

Nice! It must be the scoped_timer. This has to use threads. The single-threaded mode is orthogonal to the scoped timer. So somehow it manages to link/compile but not include threads.

NikolajBjorner commented 2 years ago

These no_cut tactics are invoked with small timeouts. To manage the timeouts it has to track the time in a separate thread. We used to have platform specific implementations for the timeouts but moved to C++11 threads. So there is a hard dependency on having stl threads at the moment.

NikolajBjorner commented 2 years ago

https://github.com/Z3Prover/z3/blob/master/src/util/scoped_timer.cpp

bakkot commented 2 years ago

Aha, ok. Emscripten does not build in support for threads by default; I'll see if I can get it working with pthreads enabled.

NikolajBjorner commented 2 years ago

The GitHub Action that builds z3 introduces the DZ3_SINGLE_THREAD parameter. This may be where you got it from. It doesn't imply independence of pthread. Clearly not obvious, sorry. From what I can understand the pthread utilities are avialable for wasm when invoking the compiler with the option "-pthread". (https://web.dev/webassembly-threads/).

bakkot commented 2 years ago

OK, building with pthreads gets it to the point where it's running no-cut-smt-tactic, which it appears to work, but then it hangs before getting to (bounded-tactic done). Specifically it's hanging after printing

(simplifier :num-exprs 303 :num-asts 551 :time 0.03 :before-memory 9.45 :after-memory 9.45)
(simplifier :num-exprs 1896 :num-asts 2252 :time 0.13 :before-memory 9.45 :after-memory 9.55)
(propagate-values :num-exprs 1646 :num-asts 1971 :time 0.45 :before-memory 9.53 :after-memory 9.53)
(ctx-simplify :num-steps 6500)
(ctx-simplify :num-exprs 1646 :num-asts 1971 :time 0.09 :before-memory 9.55 :after-memory 9.58)
(simplifier :num-exprs 1646 :num-asts 1971 :time 0.10 :before-memory 9.53 :after-memory 9.53)
(solve_eqs :num-exprs 1598 :num-asts 1971 :time 0.37 :before-memory 9.53 :after-memory 9.53)
(:num-elim-vars 24)
(elim-uncnstr :num-exprs 1598 :num-asts 1971 :time 0.02 :before-memory 9.53 :after-memory 9.53)
(simplifier :num-exprs 2016 :num-asts 2389 :time 0.20 :before-memory 9.53 :after-memory 9.55)
(ilp-model-finder-tactic start)
(ilp-model-finder-tactic done)
(pb-tactic start)
(pb-tactic done)
(bounded-tactic start)
(no-cut-smt-tactic start)
(smt.tactic start)
(smt.propagate-values)
(smt.nnf-cnf)
(smt.reduce-asserted)
(smt.maximizing-bv-sharing)
(smt.reduce-asserted)
(smt.flatten-clauses)
(smt.simplifier-done)
(smt.searching)
(smt.stats :restarts   :decisions  :clauses/bin           :simplify   :memory)
(smt.stats      :conflicts   :propagations    :lemmas        :deletions     )
(smt.stats    0    101    362   9516  3414/2961  1275/470     2  340   11.80)
(smt.stats    1    153    635  16545  3390/2961  1622/658     3  443   11.93)
(smt :num-exprs 0 :num-asts 2405 :time 28.13 :before-memory 9.55 :after-memory 9.57)
(no-cut-smt-tactic done)

In the native build, the next line would be (bounded-tactic done) followed by the call to Z3_solver_check returning.

(I suppose it's possible this will finish but it seems unlikely. I've let it sit for a few minutes with no signs of progress, though it is maxing out a CPU core.)

Probably something to do with threading, possibly a bug in emscripten's support for threads. I have to get to other stuff for the moment but I'll come back to this later and see if I can figure out where it's hung.

NikolajBjorner commented 2 years ago

likely deadlocked. There were race condition bugs in scoped_timer @nunoplopes

bakkot commented 2 years ago

I've updated my repro (including the wasm artifacts) in case those are helpful to anyone.

nunoplopes commented 2 years ago

likely deadlocked. There were race condition bugs in scoped_timer @nunoplopes

Could be... @bakkot can you add some printfs in scoped_timer.cpp to print when a new thread is created, destroyed, timeout is triggered, etc.

bakkot commented 2 years ago

Added some printfs

diff ```diff diff --git a/src/util/scoped_timer.cpp b/src/util/scoped_timer.cpp index a36e762aa..3579f0e8d 100644 --- a/src/util/scoped_timer.cpp +++ b/src/util/scoped_timer.cpp @@ -48,9 +48,15 @@ static std::vector available_workers; static std::mutex workers; static atomic num_workers(0); +int id() { + return std::hash{}(std::this_thread::get_id()); +} + static void thread_func(scoped_timer_state *s) { + printf("thread_func start id %x\n", id()); workers.lock(); while (true) { + printf("thread_func loop body id %x\n", id()); s->cv.wait(workers, [=]{ return s->work > IDLE; }); workers.unlock(); @@ -61,6 +67,7 @@ static void thread_func(scoped_timer_state *s) { auto end = std::chrono::steady_clock::now() + std::chrono::milliseconds(s->ms); while (!s->m_mutex.try_lock_until(end)) { + printf("thread_func inner loop body id %x\n", id()); if (std::chrono::steady_clock::now() >= end) { s->eh->operator()(TIMEOUT_EH_CALLER); goto next; @@ -108,16 +115,22 @@ public: } ~imp() { + printf("scoped_timer::imp destructor start id %x\n", id()); s->m_mutex.unlock(); - while (s->work == WORKING) + while (s->work == WORKING) { + printf("scoped_timer::imp destructor loop body id %x\n", id()); std::this_thread::yield(); + } + printf("scoped_timer::imp destructor loop finish id %x\n", id()); workers.lock(); available_workers.push_back(s); workers.unlock(); + printf("scoped_timer::imp destructor finish id %x\n", id()); } }; scoped_timer::scoped_timer(unsigned ms, event_handler * eh) { + printf("scoped_timer::scoped_timer constructor id %x\n", id()); if (ms != UINT_MAX && ms != 0) m_imp = alloc(imp, ms, eh); else @@ -125,6 +138,7 @@ scoped_timer::scoped_timer(unsigned ms, event_handler * eh) { } scoped_timer::~scoped_timer() { + printf("scoped_timer::scoped_timer destructor id %x\n", id()); dealloc(m_imp); } @@ -139,8 +153,10 @@ void scoped_timer::initialize() { } void scoped_timer::finalize() { + printf("scoped_timer::finalize start id %x\n", id()); unsigned deleted = 0; while (deleted < num_workers) { + printf("scoped_timer::finalize loop body id %x\n", id()); workers.lock(); for (auto w : available_workers) { w->work = EXITING; @@ -158,4 +174,5 @@ void scoped_timer::finalize() { } num_workers = 0; available_workers.clear(); + printf("scoped_timer::finalize finish id %x\n", id()); } ```

and it seems to be stuck in the loop in the ~imp destructor:

scoped_timer::scoped_timer constructor id b2f6c
(simplifier :num-exprs 303 :num-asts 551 :time 0.03 :before-memory 9.45 :after-memory 9.45)
(simplifier :num-exprs 1896 :num-asts 2252 :time 0.13 :before-memory 9.45 :after-memory 9.55)
(propagate-values :num-exprs 1646 :num-asts 1971 :time 0.44 :before-memory 9.53 :after-memory 9.53)
(ctx-simplify :num-steps 6500)
(ctx-simplify :num-exprs 1646 :num-asts 1971 :time 0.10 :before-memory 9.55 :after-memory 9.58)
(simplifier :num-exprs 1646 :num-asts 1971 :time 0.10 :before-memory 9.53 :after-memory 9.53)
(solve_eqs :num-exprs 1598 :num-asts 1971 :time 0.37 :before-memory 9.53 :after-memory 9.53)
(:num-elim-vars 24)
(elim-uncnstr :num-exprs 1598 :num-asts 1971 :time 0.02 :before-memory 9.53 :after-memory 9.53)
(simplifier :num-exprs 2016 :num-asts 2389 :time 0.20 :before-memory 9.53 :after-memory 9.55)
(ilp-model-finder-tactic start)
(ilp-model-finder-tactic done)
(pb-tactic start)
(pb-tactic done)
(bounded-tactic start)
scoped_timer::scoped_timer constructor id b2f6c
(no-cut-smt-tactic start)
(smt.tactic start)
(smt.propagate-values)
(smt.nnf-cnf)
(smt.reduce-asserted)
(smt.maximizing-bv-sharing)
(smt.reduce-asserted)
(smt.flatten-clauses)
(smt.simplifier-done)
(smt.searching)
(smt.stats :restarts   :decisions  :clauses/bin           :simplify   :memory)
(smt.stats      :conflicts   :propagations    :lemmas        :deletions     )
(smt.stats    0    101    362   9516  3414/2961  1275/470     2  340   11.80)
(smt.stats    1    153    635  16545  3390/2961  1622/658     3  443   11.93)
(smt :num-exprs 0 :num-asts 2405 :time 26.42 :before-memory 9.55 :after-memory 9.57)
(no-cut-smt-tactic done)
scoped_timer::scoped_timer destructor id b2f6c
scoped_timer::imp destructor start id b2f6c
scoped_timer::imp destructor loop body id b2f6c
scoped_timer::imp destructor loop body id b2f6c
scoped_timer::imp destructor loop body id b2f6c
scoped_timer::imp destructor loop body id b2f6c
scoped_timer::imp destructor loop body id b2f6c
scoped_timer::imp destructor loop body id b2f6c
scoped_timer::imp destructor loop body id b2f6c
scoped_timer::imp destructor loop body id b2f6c
scoped_timer::imp destructor loop body id b2f6c
scoped_timer::imp destructor loop body id b2f6c
scoped_timer::imp destructor loop body id b2f6c
scoped_timer::imp destructor loop body id b2f6c
scoped_timer::imp destructor loop body id b2f6c
[etc forever]

I note that both of the scoped_timer::scoped_timer constructor lines have same id printed, so it's possible I'm not getting the id for the threads correctly and there's actually multiple threads here (it's been years since I thought about threads in C++, sorry).

bakkot commented 2 years ago

I see that it's calling std::thread(thread_func, s); but never actually entering thread_func. It might be that spawning threads is taking longer in wasm and that means the destructor is called before the the thread gets a chance to spawn, and that's... causing issues somehow? I would expect the thread to spawn eventually, but maybe wasm isn't able to schedule it for some reason. (Maybe the repeated calls to yield are blocking it?)

bakkot commented 2 years ago

OK, here's a patch which seems to make it work (EDIT: updated to work more consistently. It seems to work pretty well. Only problem now is that finalize is never getting called, so the worker thread sits around being idle forever - when running things in node it will wait for all threads to finish before quitting, so there needs to be some way to kill off threads which are done with.)

diff --git a/src/util/scoped_timer.cpp b/src/util/scoped_timer.cpp
index 6a0d1a4b3..cc52ab3ee 100644
--- a/src/util/scoped_timer.cpp
+++ b/src/util/scoped_timer.cpp
@@ -33,7 +33,7 @@ Revision History:
 #include <pthread.h>
 #endif

-enum scoped_timer_work_state { IDLE = 0, WORKING = 1, EXITING = 2 };
+enum scoped_timer_work_state { IDLE = 0, SPAWNING = 1, WORKING = 2, EXITING = 3 };

 struct scoped_timer_state {
     std::thread m_thread;
@@ -54,6 +54,9 @@ int id() {

 static void thread_func(scoped_timer_state *s) {
+    if (s->work == SPAWNING) {
+        s->work = WORKING;
+    }
     workers.lock();
     while (true) {
@@ -108,11 +113,15 @@ public:
         s->ms = ms;
         s->eh = eh;
         s->m_mutex.lock();
-        s->work = WORKING;
         if (new_worker) {
+            s->work = SPAWNING;
+            printf("scoped_timer::imp ctor calling std::thread\n");
             s->m_thread = std::thread(thread_func, s);
         } 
         else {
+            if (s->work != SPAWNING) {
+                s->work = WORKING;
+            }
             s->cv.notify_one();
         }
     }

Basically just, if the imp destructor is called before the thread spawns, don't go into the yield loop.

This definitely isn't threadsafe as written; I haven't thought through how the mutexes work and presumably they should at least be compare_exchanges rather than comparisons and exchanges (in case the thread spawns while the imp destructor is running). But my atomics are rusty enough that I don't want to try to get this right.

nunoplopes commented 2 years ago

I see that it's calling std::thread(thread_func, s); but never actually entering thread_func. It might be that spawning threads is taking longer in wasm and that means the destructor is called before the the thread gets a chance to spawn, and that's... causing issues somehow? I would expect the thread to spawn eventually, but maybe wasm isn't able to schedule it for some reason. (Maybe the repeated calls to yield are blocking it?)

That sounds like a bug in wasm., Maybe try to run a small c++ program that uses std::thread to spawn a thread that does a printf to check if that works. Thanks!

bakkot commented 2 years ago

Yeah, seems like a wasm bug (or I've misunderstood how threads work in wasm). I've filed https://github.com/emscripten-core/emscripten/issues/15868.

nunoplopes commented 2 years ago

Yeah, seems like a wasm bug (or I've misunderstood how threads work in wasm). I've filed emscripten-core/emscripten#15868.

Thanks! If wasm supports some sort of timers with signals or whatever, we could have a specific implementation for wasm. We don't use signals here because they may disrupt the host application and because C++ still doesn't have a portable API.

NikolajBjorner commented 2 years ago

Would the following work for wasm and other platforms: a condition variable (binary condition variable/mutex), that gets released when the thread callback starts and is waited on by the part that creates the thread? This would be to make the behavior less dependent on pre-emption properties of the thread library.

nunoplopes commented 2 years ago

Would the following work for wasm and other platforms: a condition variable (binary condition variable/mutex), that gets released when the thread callback starts and is waited on by the part that creates the thread? This would be to make the behavior less dependent on pre-emption properties of the thread library.

Uhm, that would slow down the main thread for no good reason. There's really no bug here on Z3's side. It's wasm that never executes the spawned thread (see the small test from @bakkot). There's no protocol we can implement to fix that 😅

NikolajBjorner commented 2 years ago

Not sure about your claim: the spawned thread is never executed or if the thread is not executed under certain conditions.

nunoplopes commented 2 years ago

Not sure about your claim: the spawned thread is never executed or if the thread is not executed under certain conditions.

I see what you mean. You are wondering if you could force the scheduler by hanging the main thread with a condition on the timer thread. Maybe that works, who knows. But I would rather not slow down 99.9% of the users to fix a problem for 1 user. Some #ifdefs are fine, of course. Once we know what needs to be done. Maybe wasm has a timer API that allows us to avoid threads altogether, I don't know.

bakkot commented 2 years ago

As far as I can tell, it's specifically that the thread is not spawned while the main thread is busy. So I think a mutex or something which puts the main thread to sleep until the thread spawns would fix this, yes.

But there's probably a way to do this with less invasive changes; I will see if I can get an answer from someone with more experience with wasm.

bakkot commented 2 years ago

OK, it seems like there's a bug in emscripten which prevents std::thread from working correctly on the main thread.

Z3_solver_check is expensive, so it's probably best not to call it on the main thread anyway. I can make a wrapper which deals with that. Are there other parts of the API which might spawn threads? If yes, is there any way to know which ones?

NikolajBjorner commented 2 years ago

These methods may all use timeouts directly if a timeout is set by the user and/or inside functions that use timeouts.

- Z3_simplify, Z3_simplify_ex
- Z3_solver_check, Z3_solver_check_assumptions, Z3_solver_cube, Z3_solver_get_consequences
- Z3_tactic_apply, Z3_tactic_apply_ex
- Z3_optimize_check
- Z3_algebraic_roots, Z3_algebraic_eval
- Z3_fixedpoint_query, Z3_fixedpoint_query_relation, Z3_fixedpoint_query_from_lvl
- Z3_polynomial_subresultants

So from the bug report std::thread is pretty broken. This suggests it is a reasonable priority to fix. I could speculate about ways to work around the behavior, but they don't seem too appealing (introduce a compile time flag or global parameter that when set makes the scoped timer code spawn a pthread directly) except they are friendlier than having to track all the APIs mentioned above and rely on that I didn't forget something.

bakkot commented 2 years ago

It sounds like emscripten might end up fixing std::thread, so it's probably not worth trying to work around it in this project for now.

For my purposes, I'm writing a JS wrapper which does not expose the underlying APIs directly, and I can have the wrapper deal with the complexity of ensuring those methods are only invoked off the main thread. (You don't want to invoke anything long-running on the main thread anyway - it blocks the UI.) So it's not a huge problem for me, assuming my plan for the wrapper ends up working.

bakkot commented 2 years ago

Got my wrapper working: https://github.com/bakkot/z3-js-bindings

Ensuring that Z3_solver_check doesn't get called from the main thread does indeed make it work. I haven't implemented any of the other long-running parts of the API yet, but I expect they'll work too.

I'll leave this open until emscripten lands their fix and I confirm it resolves the issue here.

NikolajBjorner commented 2 years ago

Maybe given the current state, the simplest way to wrap z3 is to have all interaction happen in a thread. For example, z3 sessions are centered around contexts. If the interface that creates a context spawns a thread, and the context is only available in that thread, it is somewhat more-straight forward to ensure the invariant that calls into z3 are not in the main thread. I am not suggesting that it is for you to sink bandwidth into, but @aestriplex in #5738 is discussing creating supported TS/JS bindings and is inclined to provide these bindings using wasm.

bakkot commented 2 years ago

If the interface that creates a context spawns a thread, and the context is only available in that thread

That's kind of tricky to do in JavaScript. If the interface for creating the context spawns a thread itself, there's no good way for the user to put their code in that thread. The best you can do is to proxy all calls to a thread, but that's annoying because it means every operation needs to be async (and adds a bunch of overhead).

I'm content with my solution, where specific methods have a custom wrapper which puts them in their own thread.

bakkot commented 2 years ago

Closing; I have things working to my satisfaction. The next release of emscripten will probably fix the issue described above, but I've worked around it.

I'm happy with my bindings and am ready to contribute them to this project; I'm continuing that discussion here.