Closed wilcoxjay closed 1 year ago
Adding the integration tests can wait for another PR but I would like to move the save_tee()
calls to make this output more complete.
I added the integration test and moved the save_tee calls.
Great! Can you rebase on top of main and fix the conflicts?
As preparation for benchmarking flyvy, we will want results to be reproducible, which means that all sources of randomness need to be controlled.
As a first cut, this PR makes executions of UPDR (more?) deterministic by fixing several issues related to hash table iteration order.
My workflow was:
which runs UPDR ten times with the same random seed and collects stdout/stderr logs as well as all smt2 "tee" logs for each run.
Then I diffed the stdout/stderr logs with
and confirmed that there were no differences.
When there were differences, I used the smt2 "tee" logs to find the first query file name that differed (I added code to
save_tee
to print the query file names as they were saved, in order, so that I could grep for them easily in the log -- I did not push this code). I then diffed the first two differing queries to understand what was causing the divergence. In all cases I found, this boiled down to iterating over a hash table. (We may want to consider a policy of not iterating over hash tables, because these bugs may be difficult to keep out of the codebase otherwise.)Along the way, I made a small improvement to the smt2 "tee" logs, so that they now include the answer from check-sat as a comment.
Can anyone think of a good way to make this into an integration test? Is there a way to run the tool over and over from inside a rust test? I'm not sure if we just call
updr
as a function that we will actually observe a different hash iteration order (I didn't try it).Also, we should consider running similar investigations for all other subcommands.