flyvy-verifier / flyvy

An experimental framework for temporal verification based on first-order linear-time temporal logic. Our goal is to express transition systems in first-order logic and verify temporal correctness properties, including safety and liveness.
BSD 2-Clause "Simplified" License
14 stars 1 forks source link

Improve query determinism #86

Closed wilcoxjay closed 1 year ago

wilcoxjay commented 1 year ago

As preparation for benchmarking flyvy, we will want results to be reproducible, which means that all sources of randomness need to be controlled.

As a first cut, this PR makes executions of UPDR (more?) deterministic by fixing several issues related to hash table iteration order.

My workflow was:

cargo build
for i in {1..10}; do 
  rm -rf .flyvy-log
  echo $i
  ./target/debug/temporal-verifier updr-verify examples/lockserver.fly --solver-seed 1 --smt > log-$i.txt 2>&1
  mv .flyvy-log/lockserver/ log-$i-smt-logs
done

which runs UPDR ten times with the same random seed and collects stdout/stderr logs as well as all smt2 "tee" logs for each run.

Then I diffed the stdout/stderr logs with

for i in {2..10}; do diff -uw log-1.txt log-$i.txt; done

and confirmed that there were no differences.

When there were differences, I used the smt2 "tee" logs to find the first query file name that differed (I added code to save_tee to print the query file names as they were saved, in order, so that I could grep for them easily in the log -- I did not push this code). I then diffed the first two differing queries to understand what was causing the divergence. In all cases I found, this boiled down to iterating over a hash table. (We may want to consider a policy of not iterating over hash tables, because these bugs may be difficult to keep out of the codebase otherwise.)

Along the way, I made a small improvement to the smt2 "tee" logs, so that they now include the answer from check-sat as a comment.

Can anyone think of a good way to make this into an integration test? Is there a way to run the tool over and over from inside a rust test? I'm not sure if we just call updr as a function that we will actually observe a different hash iteration order (I didn't try it).

Also, we should consider running similar investigations for all other subcommands.

tchajed commented 1 year ago

Adding the integration tests can wait for another PR but I would like to move the save_tee() calls to make this output more complete.

wilcoxjay commented 1 year ago

I added the integration test and moved the save_tee calls.

tchajed commented 1 year ago

Great! Can you rebase on top of main and fix the conflicts?