Open nunoplopes opened 4 years ago
Would it be great if I build a small webpage that tracks this?
yes, I think that would be very useful, but it's not urgent 😀 It has a bit that is not obvious is that it needs to account for SMT random seeds. Z3 doesn't consider perf bugs if changing the seed makes it solve the problem. So we would need to sample a few seeds for each test and version.
If the average run-time over several random seeds is higher, then it is still a perf regression.
Okay, maybe I can pick a few random seeds (well, randomly) and show the individual results as well as the average result.
As Nuno wrote use a few samples. The seeds don't need to be random, just numbers 1-10 will do the trick.
for i in range(10):
os.system("z3.exe smt.random_seed=%d %s -st" % (i, file))
I made this page: http://147.46.242.54:15243
It tracks the performance of Alive2 w.r.t Z3 when running LLVM unit tests. The detailed page shows in which tests timeout ratios are high (http://147.46.242.54:15243/show/8985651-af3789a-6cc52e0).
Currently, it has only one slot filled but I'm going to run a batch of tests & fill the remaining slots. Also, there isn't diff two results
page yet; I'll make the page, eh, maybe after the preparation of Q&A for my talk :)
this is very cool!
Is there a way to (@nuno, I talking to you with the addition of Z3_open_log..)
that's the plan. We've already added support for dumping Z3 logs. So it's just a matter of running Alive2 with the right arguments and then spam your email while you are on vacations 😁
@aqjune I really hope that code is PHP 🙄 Would be nice to upstream that.
I can make the log dumps contain the generated VCs as well; the smt-verbose
option, in addition to the z3 open log, will work.
Or, to make the log dump more lightweight, I can add a page that lists the tests that has been timeout for a long period first and run on those only. It will make the log dump smaller (currently the log dump is already ~400 megabytes per configuration).
The webpage is written in Python using django. No special reason, just my PHP experience is almost nothing (to make a really simple homepage about >10yrs ago) :)
Seems like Alive2 has a performance boost between https://github.com/Z3Prover/z3/commit/6cc52e04c3ea7e2534644a285d231bdaaafd8714 and https://github.com/Z3Prover/z3/commit/ae7d76767b614dd8f77a692e00d6d157df31e03d ! Is this an expected change?
are you averaging over how many random seeds?
I'm currently giving 5 seeds.
I don't think there's any reason for a perf change judging from the Z3 commits msgs. One think to keep in mind is that Z3 is memory bound; it has a lot of cache misses. That means that if you have a lot of processes they start contending to access the memory and thus results won't be very reliable. You need to assess what's the max number of threads in your computer before perf starts to become unstable. No hyperthreading is a start, but you may need to go even lower depending on the memory BW of your CPU.
Thank you for the advice! I was using hyperthreading enabled because using more cores gave additional speedup. For performance comparison, I'll disable it.
Lowered the number of cores to 8 (from 24), and filling slots with the new configuration
Z3 4.8.9 regressed perf compared to 4.8.8 in a few benchmarks (e.g., just see CI bots, both linux & Mac). It would be nice to have an automated way of diffing new timeouts in unit tests & LLVM test suite and then produce Z3 logs using
Z3_open_log
and friends. Similarly, we could have an automated way of running John's perl script to detect score degradations with new Z3 releases.@NikolajBjorner says he'll fix any reported regression in 24 hours 😁