Closed jurajsic closed 4 months ago
For QF_SNIA
formulae:
Benchmark snia
# of formulae: 70
--------------------------------------------------
tool ✅ ❌ time avg med std sat unsat unknown TO MO+ERR other
-------------------------- ---- ---- ------ ------ ------ ------ ----- ------- --------- ---- -------- -------
z3-noodler-f552b5f-d95fe13 70 0 0.98 0.01 0.01 0.01 70 0 0 0 0 0
cvc5-1.1.2 70 0 0.00 0.00 0.00 0.00 70 0 0 0 0 0
z3-4.13.0 0 70 0.00 nan nan nan 0 0 0 0 0 70
ostrich-e386836db 70 0 119.74 1.71 1.70 0.20 70 0 0 0 0 0
--------------------------------------------------
Z3 returns correctly sat
for all 70 instances, but also prints some error, so it does not show up correctly in the table.
How come we have now less unknowns but more TOs than before?
Benchmark str_small_rw
# of formulae: 1880
--------------------------------------------------
tool ✅ ❌ time avg med std sat unsat unknown TO MO+ERR other
-------------------------- ---- ---- ------- ----- ----- ----- ----- ------- --------- ---- -------- -------
z3-noodler-f552b5f-d95fe13 1762 118 341.40 0.19 0.01 2.75 4 1758 92 17 9 0
z3-noodler-6802421-d95fe13 1752 128 399.87 0.23 0.01 3.38 5 1747 103 13 12 0
cvc5-1.1.2 1861 19 31.09 0.02 0.00 0.56 20 1841 2 17 0 0
z3-4.13.0 1821 59 214.88 0.12 0.01 2.86 25 1796 0 59 0 0
ostrich-e386836db 1709 171 7478.26 4.38 1.63 6.54 19 1690 0 171 0 0
--------------------------------------------------
Benchmark pyex
# of formulae: 23845
--------------------------------------------------
tool ✅ ❌ time avg med std sat unsat unknown TO MO+ERR other
-------------------------- ----- ----- --------- ----- ----- ----- ----- ------- --------- ----- -------- -------
z3-noodler-f552b5f-d95fe13 23747 98 4970.47 0.21 0.11 1.56 20377 3370 0 96 2 0
z3-noodler-6802421-d95fe13 23754 91 4734.25 0.20 0.10 1.84 20383 3371 0 89 2 0
cvc5-1.1.2 23826 19 6384.00 0.27 0.07 2.48 20406 3420 0 19 0 0
z3-4.13.0 22858 987 54817.37 2.40 0.12 10.23 19587 3271 0 987 0 0
ostrich-e386836db 11017 12828 353972.36 32.13 20.71 35.14 7801 3216 0 12828 0 0
--------------------------------------------------
Benchmark woorpje
# of formulae: 809
--------------------------------------------------
tool ✅ ❌ time avg med std sat unsat unknown TO MO+ERR other
-------------------------- ---- ---- ------- ----- ----- ----- ----- ------- --------- ---- -------- -------
z3-noodler-f552b5f-d95fe13 771 38 174.18 0.23 0.02 2.30 609 162 0 21 17 0
z3-noodler-6802421-d95fe13 774 35 252.34 0.33 0.01 4.49 612 162 0 17 18 0
cvc5-1.1.2 755 54 45.21 0.06 0.01 0.23 591 164 0 54 0 0
z3-4.13.0 782 27 300.36 0.38 0.02 4.80 618 164 0 27 0 0
ostrich-e386836db 755 54 6876.45 9.11 4.60 15.58 589 166 0 51 0 3
--------------------------------------------------
This three benchmarks have the most different number of solved instances, I would say it's just Z3 randomness. Here is also scatter plot:
You are comparing with a wrong version o Z3-Noodler (you are using the version with length decision procedure)
You are comparing with a wrong version o Z3-Noodler (you are using the version with length decision procedure)
Are you sure? According to list_of_tools.txt
I am comparing with the version from #140.
Hmm, it seems that 6802421
is older than 82db97c
in #140, here is comparison with the newer one:
# of formulae: 103318
--------------------------------------------------
tool ✅ ❌ time avg med std sat unsat unknown TO MO+ERR other
-------------------------- ------ ----- --------- ----- ----- ----- ----- ------- --------- ----- -------- -------
z3-noodler-f552b5f-d95fe13 102191 1127 14077.39 0.14 0.02 1.52 62776 39415 628 453 46 0
z3-noodler-82db97c-d95fe13 102200 1118 12913.53 0.13 0.01 1.64 62783 39417 632 434 52 0
cvc5-1.1.2 99699 3619 76478.24 0.77 0.00 6.77 60798 38901 2 3608 9 0
z3-4.13.0 97687 5631 132036.99 1.35 0.01 8.11 58874 38813 210 4960 461 0
ostrich-e386836db 84819 18499 674023.96 7.95 1.62 17.80 46667 38152 0 15197 0 3302
--------------------------------------------------
I turned off other (unneeded) theories again, like before, the results:
# of formulae: 103318
--------------------------------------------------
tool ✅ ❌ time avg med std sat unsat unknown TO MO+ERR other
-------------------------- ------ ---- --------- ----- ----- ----- ----- ------- --------- ---- -------- -------
z3-noodler-b7b8ad5-d95fe13 102204 1114 12937.44 0.13 0.01 1.63 62783 39421 628 435 51 0
z3-noodler-82db97c-d95fe13 102200 1118 12913.53 0.13 0.01 1.64 62783 39417 632 434 52 0
cvc5-1.1.2 99699 3619 76478.24 0.77 0.00 6.77 60798 38901 2 3608 9 0
z3-4.13.0 97687 5631 132036.99 1.35 0.01 8.11 58874 38813 210 4960 461 0
--------------------------------------------------
QF_SNIA
:
# of formulae: 70
--------------------------------------------------
tool ✅ ❌ time avg med std sat unsat unknown TO MO+ERR other
-------------------------- ---- ---- ------ ------ ------ ------ ----- ------- --------- ---- -------- -------
z3-noodler-b7b8ad5-d95fe13 70 0 0.82 0.01 0.01 0.01 70 0 0 0 0 0
cvc5-1.1.2 70 0 0.00 0.00 0.00 0.00 70 0 0 0 0 0
z3-4.13.0 0 70 0.00 nan nan nan 0 0 0 0 0 70
--------------------------------------------------
There is slight degradation for the time of solving, but otherwise it looks like there are no problems.