leonardoalt opened this issue 4 years ago
Check whether you can get the reason for unknown using get_reason_unknown.
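(For illustration only, not code from this issue: a minimal sketch of what that could look like through the z3 C++ API, with an invented toy CHC system. It assumes fixedpoint::reason_unknown() from z3++.h, which wraps Z3_fixedpoint_get_reason_unknown, and that "engine" is the object-level spelling of the fp.engine option.)

```cpp
// Hypothetical toy example, not the benchmark discussed in this issue.
#include <iostream>
#include <z3++.h>

int main() {
    z3::context ctx;
    z3::fixedpoint fp(ctx);

    // Assumption: "engine" is accepted here as the name of fp.engine.
    z3::params p(ctx);
    p.set("engine", ctx.str_symbol("spacer"));
    fp.set(p);

    // Toy CHC system: Inv(0), Inv(x) -> Inv(x+1), Inv(x) && x < 0 -> Err(x).
    z3::sort I = ctx.int_sort();
    z3::sort B = ctx.bool_sort();
    z3::func_decl inv = ctx.function("Inv", I, B);
    z3::func_decl err = ctx.function("Err", I, B);
    fp.register_relation(inv);
    fp.register_relation(err);

    z3::expr x = ctx.int_const("x");
    z3::expr init = z3::forall(x, z3::implies(x == 0, inv(x)));
    z3::expr step = z3::forall(x, z3::implies(inv(x), inv(x + 1)));
    z3::expr bad  = z3::forall(x, z3::implies(inv(x) && x < 0, err(x)));
    fp.add_rule(init, ctx.str_symbol("init"));
    fp.add_rule(step, ctx.str_symbol("step"));
    fp.add_rule(bad, ctx.str_symbol("bad"));

    // Query: is Err reachable?
    z3::func_decl_vector queries(ctx);
    queries.push_back(err);

    switch (fp.query(queries)) {
    case z3::unsat:   std::cout << "unsat (Err unreachable)\n"; break;
    case z3::sat:     std::cout << "sat (Err reachable)\n";     break;
    case z3::unknown:
        // The interesting branch for this issue: ask Spacer why it gave up.
        std::cout << "unknown: " << fp.reason_unknown() << "\n";
        break;
    }
    return 0;
}
```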
These things are not easy to debug, and you are using quantifiers, which makes things less deterministic. However, I always worry whether this could be undefined behavior.
Can you reproduce this reliably? Does it only happen on one file? One other trick is to run one file with multiple random seeds (fp.spacer.random_seed).
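(A hedged sketch of that seed sweep from the C++ side, since the queries in this issue go through the C++ API: the seed is set as a global parameter before the context is created, under the assumption that module parameters passed to z3::set_param / Z3_global_param_set are honored the same way as on the z3 command line.)

```cpp
#include <cstdlib>
#include <iostream>
#include <z3++.h>

int main(int argc, char** argv) {
    // Pass the seed as argv[1] so the same benchmark can be replayed
    // under several seeds, e.g. from a small shell loop.
    int seed = argc > 1 ? std::atoi(argv[1]) : 0;
    z3::set_param("fp.spacer.random_seed", seed);

    z3::context ctx;
    z3::fixedpoint fp(ctx);
    // ... add the rules and run the problematic query as usual ...
    std::cout << "ran with fp.spacer.random_seed=" << seed << std::endl;
    return 0;
}
```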
The trace seems cut off. It ends with expand-pob not inside propagate with add-lemma.
True. The trace I gave is actually for a SAT query that comes after the problematic one.
So the case actually is: 3 consecutive queries, UNSAT, UNSAT, SAT. The problematic one is the second. Here's the UNSAT trace we need: https://gist.github.com/leonardoalt/c839c96ad520b829b319a93b9f84287b
If I remove the said last SAT query, then the problem is gone.
I can reproduce it reliably, yes. It happens on 3 of my benchmarks, but they are almost the same.
I actually described it wrong: I don't get unknown, I get an exception: Stuck on a lemma.
If I add the fp.spacer.trace_file option I also get unsat.
One other trick is to run one file with multiple random seeds (fp.spacer.random_seed).
I don't really mind unknown or an exception, as long as it's consistent on all the platforms I run on.
This was to help reproduce the error. If the problem is due to randomness under heavy load, it might be possible to reproduce it reliably with a different random seed, in which case I could debug it.
Stuck on a lemma is my protection when the same lemma is added many times. Usually indicates an infinite loop. The reason for it happening sometimes and not others is likely due to non-determinism inside the solver. If we can reproduce it, I can debug it to see if something can be done to make this behavior more deterministic.
Right, ok. That's the tricky part. I can try to set up a server with everything + debug libz3 so you could debug it. Or did you mean the trace?
If you can repro this reliably then we can figure out how I can run it. My understanding is that right now it only happens under certain load on your machine.
If you can make it happen in a docker container, that could be a start. But random seed is easier.
@agurfinkel I managed to get consistent traces on another example, although the case is a little different:
On my local system (Arch Linux, z3 4.8.8, dynamically linked) I get SAT with this trace: https://gist.github.com/leonardoalt/31be0c87ba1f651134740ce21130b2cf
On my CI system (Ubuntu 20.04, z3 4.8.8, statically linked) I get max. resource limit exceeded with this trace: https://gist.github.com/leonardoalt/9ec2505cbead6a61e076ee24d6f417cd
I'm using the same rlimit on both.
Both runs have weak_abs=true, so this is unrelated to the original issue.
The traces don't seem to give a lot of info in this case, since one just stops and the other goes on.
Any idea what could be causing this?
@agurfinkel I ran into another example of two runs of the same query giving different results, unsat and rlimit exceeded:
Query: https://gist.github.com/leonardoalt/adfdcbe02ae02e2742e79ef09144346a
If I run the z3 binary I only get the unsat case; the issue only happens via the C++ API:
z3 trace for the unsat case (verbose 3): https://gist.github.com/leonardoalt/2f0696e365d999d853b608debd4ebec1
Spacer trace for the unsat case: https://gist.github.com/leonardoalt/f91241f5182af9447c1a8e648aa0c397
z3 trace for the rlimit exceeded case (verbose 3): https://gist.github.com/leonardoalt/e1734856d99d621cc311510055d4ee1a
Spacer trace for the rlimit exceeded case: https://gist.github.com/leonardoalt/dae8efb967c80dbeb74f04d1280dbda3
I also tried compiling z3 in debug mode to look for the divergence in the internal calls, but then the issue is gone. Are these traces of any help for this issue? I'm out of possible hints...
There are actually 2 queries, and the issue is on the first one. If I remove the second query the issue is gone; that's why I left it there.
@agurfinkel I have some news here: the difference that triggers the different results is whether z3 is compiled with GMP or not. The Arch package comes with GMP, whereas the default is no GMP; that's why I've seen inconsistencies on Arch vs Ubuntu.
@agurfinkel sorry for taking so long on this; I finally built a Docker image where you can replicate the nondeterminism: https://hub.docker.com/r/leonardoalt/z3solidity/tags
It is an Arch Linux base with
To replicate the test inside the Docker image, you can run:
cd /solidity/build
./test/tools/isoltest -t smtCheckerTests/loops/for_loop_array_assignment_storage_storage
LD_LIBRARY_PATH=/z3/build ./test/tools/isoltest -t smtCheckerTests/loops/for_loop_array_assignment_storage_storage
The first run of the test uses z3+GMP from the system, and you will (hopefully) see that the last query returns UNKNOWN: ok, whereas in the second run (z3 without GMP) it returns SAT.
Unfortunately, very minor changes (at /solidity/libsmtutil/Z3CHCInterface.cpp:72 and :125, respectively) influence the output, and both runs then return the same result. The second one is especially useful to make sure that the queries are indeed the same on both runs of the test (they are).
I'm still trying to get the Spacer traces for the two runs, but hopefully this is a start.
I ran into another nondeterminism case, where on one system I get unsat and on another I get rlimit exceeded. I managed to get the Spacer traces for both, but unfortunately (or fortunately?) this happens only via the C++ API, so I'm not sure it makes sense to provide the query (the z3 CLI answers unsat on both systems).
I'm not sure this helps give any hints, but posting for reference:
Unsat: https://gist.github.com/leonardoalt/ee467fcc8ede92db7ca1ac49f537904d
Rlimit exceeded: https://gist.github.com/leonardoalt/b5801e0e0b8f2cae72e68e841e5605da
I have this one case where Spacer returns unknown when I run all my regression tests together (heavy load), whereas it returns unsat if I run just the single test (or a smaller subset). In the heavy-load scenario, if I try to get spacer_trace or any kind of info out, it becomes unsat as well. So I'm guessing there's some timing/resource issue going on here. All the queries are run via the C++ API. I don't see the issue when fp.weak_abs=false is not used (that's the change that triggered it).
I'm using rlimit=40000000, and these further options:
I understand that without the unknown trace it's virtually impossible to know what's going on; I just wanted to see if there's any possibility of a hint given the info above. Please feel free to close right away otherwise.
Here's a trace from the unsat case: https://gist.github.com/leonardoalt/7c533a743f0377c8a3694e654d743b7f
In the meantime I'm still trying to somehow get a trace from the unknown case.
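(For reference, a hedged sketch of how the limit above might be applied from the C++ API; this is not the actual setup from this issue. It assumes rlimit is accepted as a global parameter through z3::set_param, as it is on the z3 command line, and it omits the further Spacer options since they are not listed here. When the limit is hit, the query typically comes back unknown with a reason such as "max. resource limit exceeded".)

```cpp
#include <iostream>
#include <z3++.h>

int main() {
    // rlimit counts abstract solver resource units, not wall-clock time.
    z3::set_param("rlimit", 40000000);
    // ... the further Spacer options used in this setup would be set here ...

    z3::context ctx;
    z3::fixedpoint fp(ctx);
    // ... add rules and pose the CHC query as usual; if the result is
    // z3::unknown, fp.reason_unknown() reports the reason, e.g.
    // "max. resource limit exceeded".
    return 0;
}
```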