Z3Prover / z3

The Z3 Theorem Prover

Spacer nondeterminism #4407

Open · leonardoalt opened this issue 4 years ago

leonardoalt commented 4 years ago

I have a case where Spacer returns unknown when I run all my regression tests together (heavy load), whereas it returns unsat if I run just the single test (or a smaller subset). In the heavy-load scenario, if I try to get spacer_trace or any other kind of info out, it returns unsat as well, so I'm guessing there's some timing/resource issue going on here. All the queries are run via the C++ API. The issue goes away if I don't set fp.weak_abs=false (that's the change that triggered it).

I'm using rlimit=40000000, and these further options:

rewriter.pull_cheap_ite=true
fp.spacer.q3.use_qgen=true
fp.spacer.mbqi=false
fp.spacer.ground_pobs=false
fp.weak_abs=false
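
For reference, a minimal sketch of how these options might be passed through the Z3 C++ API as global parameters. This is not the actual harness, and treating rlimit as a global parameter here is an assumption (it can also be set on the individual solver/engine):

```cpp
// Minimal sketch (not the actual harness): setting the options above as global
// parameters via the Z3 C++ API before creating the context.
// Assumption: "rlimit" is accepted as a global parameter; adjust if not.
#include <z3++.h>

int main() {
    z3::set_param("rlimit", 40000000);
    z3::set_param("rewriter.pull_cheap_ite", true);
    z3::set_param("fp.spacer.q3.use_qgen", true);
    z3::set_param("fp.spacer.mbqi", false);
    z3::set_param("fp.spacer.ground_pobs", false);
    z3::set_param("fp.weak_abs", false);

    z3::context ctx;
    z3::fixedpoint fp(ctx);
    // ... register relations, add CHC rules, then call fp.query(...) as usual ...
    return 0;
}
```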

I understand that without a trace from the unknown case it's virtually impossible to know what's going on; I just wanted to see if there's any chance of a hint given the info above. Please feel free to close right away otherwise. Here's a trace from the unsat case: https://gist.github.com/leonardoalt/7c533a743f0377c8a3694e654d743b7f

In the meantime I'm still trying to somehow get a trace from the unknown case.

agurfinkel commented 4 years ago

Check whether you can get the reason for unknown using get_reason_unknown.
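
A minimal sketch of what that could look like through the C++ API, assuming the query is run on a z3::fixedpoint object named fp, with goal standing in for the query expression built elsewhere:

```cpp
// Sketch: inspect why the query came back unknown (assuming a z3::fixedpoint fp).
z3::check_result r = fp.query(goal);   // goal: the query expression built elsewhere
if (r == z3::unknown) {
    std::cout << "reason unknown: " << fp.reason_unknown() << std::endl;
}
```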

These things are not easy to debug, and you are using quantifiers, which makes things less deterministic. However, I always worry whether this could be undefined behavior.

Can you reproduce this reliably? Does it only happen on one file? One other trick is to run one file with multiple random seeds (fp.spacer.random_seed).
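
A sketch of that seed-sweep idea via the C++ API; build_query below is a hypothetical stand-in for however the rules and goal are set up in the real harness:

```cpp
// Sketch: re-run the same problem with several Spacer random seeds, to try to
// turn load-dependent flakiness into something reproducible.
for (int seed = 0; seed < 10; ++seed) {
    z3::set_param("fp.spacer.random_seed", seed);  // global parameter, picked up by new contexts
    z3::context ctx;
    z3::fixedpoint fp(ctx);
    z3::expr goal = build_query(ctx, fp);          // hypothetical helper, not a Z3 API
    std::cout << "seed " << seed << ": " << fp.query(goal) << std::endl;
}
```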

agurfinkel commented 4 years ago

The trace seems cut off. It ends with expand-pob not inside propagate with add-lemma.

leonardoalt commented 4 years ago

> The trace seems cut off. It ends with expand-pob not inside propagate with add-lemma.

True. The trace I gave is actually for a SAT query that comes after the problematic one. So the case is actually three consecutive queries: UNSAT, UNSAT, SAT, and the problematic one is the second. Here's the trace for that UNSAT query: https://gist.github.com/leonardoalt/c839c96ad520b829b319a93b9f84287b

If I remove that last SAT query, the problem is gone.

leonardoalt commented 4 years ago

I can reproduce it reliably, yes. It happens on 3 of my benchmarks, but they are almost the same. I actually described it wrong: I don't get unknown, I get an exception, "Stuck on a lemma". If I add the fp.spacer.trace_file option I also get unsat.
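
For reference, a sketch of how this might be observed on the API side, assuming the error surfaces as a z3::exception (fp and goal as in the earlier sketches):

```cpp
// Sketch: distinguish the "Stuck on a lemma" exception from a plain unknown result.
try {
    z3::check_result r = fp.query(goal);
    // handle sat / unsat / unknown via r as usual
} catch (z3::exception const & e) {
    std::cerr << "spacer exception: " << e.msg() << std::endl;  // e.g. "Stuck on a lemma"
}
```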

leonardoalt commented 4 years ago

> One other trick is to run one file with multiple random seeds (fp.spacer.random_seed).

I don't really mind getting unknown or an exception, as long as it's consistent across all the platforms I run on.

agurfinkel commented 4 years ago

That suggestion was to help reproduce the error. If the problem is due to the randomness introduced by heavy load, it might be possible to reproduce it reliably with a different random seed, in which case I could debug it.

"Stuck on a lemma" is my protection against the same lemma being added many times; it usually indicates an infinite loop. The reason it happens sometimes and not others is likely non-determinism inside the solver. If we can reproduce it, I can debug it to see whether something can be done to make this behavior more deterministic.

leonardoalt commented 4 years ago

Right, ok. That's the tricky part. I can try to set up a server with everything plus a debug libz3 so you could debug it. Or did you mean the trace?

agurfinkel commented 4 years ago

If you can repro this reliably, then we can figure out how I can run it. My understanding is that right now it only happens under certain load on your machine.

If you can make it happen in a Docker container, that could be a start, but the random seed route is easier.

leonardoalt commented 4 years ago

@agurfinkel I managed to get consistent traces on another example, although the case is a little different:

On my local system (Arch Linux, z3 4.8.8 dynamically linked) I get SAT with this trace: https://gist.github.com/leonardoalt/31be0c87ba1f651134740ce21130b2cf

On my CI system (Ubuntu 20.04, z3 4.8.8 statically linked) I get "max. resource limit exceeded" with this trace: https://gist.github.com/leonardoalt/9ec2505cbead6a61e076ee24d6f417cd

I'm using the same rlimit on both.

Both runs have weak_abs=true, so this is unrelated to the original issue. The traces don't seem to give a lot of info in this case, since one just stops and the other goes on. Any idea what could be causing this?

leonardoalt commented 4 years ago

@agurfinkel I ran into another example of two runs of the same query giving different results, unsat and rlimit exceeded:

Query: https://gist.github.com/leonardoalt/adfdcbe02ae02e2742e79ef09144346a

If I run the z3 binary I only get the unsat case.

The issue only happens via the C++ API:

z3 trace for the unsat case (verbose 3): https://gist.github.com/leonardoalt/2f0696e365d999d853b608debd4ebec1
spacer trace for the unsat case: https://gist.github.com/leonardoalt/f91241f5182af9447c1a8e648aa0c397

z3 trace for the rlimit exceeded case (verbose 3): https://gist.github.com/leonardoalt/e1734856d99d621cc311510055d4ee1a
spacer trace for the rlimit exceeded case: https://gist.github.com/leonardoalt/dae8efb967c80dbeb74f04d1280dbda3

I also tried compiling z3 in debug mode to look for the divergence in the internal calls, but then the issue is gone. Are these traces of any help for this issue? I'm out of possible hints...

There are actually 2 queries, and the issue is in the first one. If I remove the second query the issue is gone; that's why I left it there.

leonardoalt commented 4 years ago

@agurfinkel I have some news here: the difference that triggers the different results is whether z3 is compiled with GMP or not. The Arch package comes with GMP, whereas the default build does not use GMP; that's why I've seen inconsistencies between Arch and Ubuntu.

leonardoalt commented 4 years ago

@agurfinkel sorry for taking so long on this, I finally built a docker image where you can replicate the nondeterminism: https://hub.docker.com/r/leonardoalt/z3solidity/tags

It is an Arch Linux base with

To replicate the test inside the docker image, you can run

cd /solidity/build
./test/tools/isoltest -t smtCheckerTests/loops/for_loop_array_assignment_storage_storage
LD_LIBRARY_PATH=/z3/build ./test/tools/isoltest -t smtCheckerTests/loops/for_loop_array_assignment_storage_storage

The first run of the test uses the system z3 built with GMP, and you will (hopefully) see that the last query returns "UNKNOWN: ok", whereas the second run (z3 without GMP) returns SAT.

Unfortunately, all of the following very minor changes influence the output, and both runs then return the same result:

I'm still trying to get the Spacer traces for the two runs, but hopefully this is a start.

leonardoalt commented 4 years ago

I ran into another nondeterminism case, where on one system I get unsat and on another I get rlimit exceeded. I managed to get the Spacer traces for both, but unfortunately (or fortunately?) this happens only via the C++ API, so I'm not sure it makes sense to provide the query (the z3 CLI answers unsat on both systems).

I'm not sure this helps give any hints, but posting for reference:

Unsat: https://gist.github.com/leonardoalt/ee467fcc8ede92db7ca1ac49f537904d
Rlimit exceeded: https://gist.github.com/leonardoalt/b5801e0e0b8f2cae72e68e841e5605da