Closed vhavlena closed 2 months ago
Well, it seems there is a serious performance degradation, compared to devel
# of formulae: 103318
--------------------------------------------------
tool ✅ ❌ time avg med std sat unsat unknown TO MO+ERR other
----------------------------------------------- ------ ----- --------- ----- ----- ----- ----- ------- --------- ----- -------- -------
z3-noodler-1966907-d95fe13 101850 1468 16276.20 0.16 0.02 2.06 62516 39334 631 784 53 0
z3-noodler-6802421-d95fe13 102191 1127 12819.51 0.13 0.01 1.59 62783 39408 639 437 51 0
cvc5-1.1.2 99699 3619 76478.24 0.77 0.00 6.77 60798 38901 2 3608 9 0
z3-4.13.0 97687 5631 132036.99 1.35 0.01 8.11 58874 38813 210 4960 461 0
ostrich-5dd2e10ca 86344 16974 782639.00 9.06 1.75 19.20 47556 38788 0 16962 0 12
z3-4.13.0+cvc5-1.1.2 100756 2562 60604.78 0.60 0.00 6.00 61099 39657 2562 0 0 0
z3-noodler-1966907-d95fe13+cvc5-1.1.2 102836 482 13260.38 0.13 0.00 2.24 62997 39839 482 0 0 0
z3-noodler-1966907-d95fe13+z3-4.13.0 102441 877 13564.62 0.13 0.01 1.91 62689 39752 877 0 0 0
z3-noodler-1966907-d95fe13+z3-4.13.0+cvc5-1.1.2 102849 469 12718.41 0.12 0.00 2.26 63006 39843 469 0 0 0
--------------------------------------------------
There are also quite many (77) instances in stringfuzz
where the length-based procedure gives incorrect result.
The problem is caused by the redos
benchmark. Probably in the suitability function of the length-based procedure.
Results of the fixed version. It helps a bit.
# of formulae: 103318
--------------------------------------------------
tool ✅ ❌ time avg med std sat unsat unknown TO MO+ERR other
----------------------------------------------- ------ ----- --------- ----- ----- ----- ----- ------- --------- ----- -------- -------
z3-noodler-52a04e9-d95fe13 102201 1117 12606.31 0.12 0.01 1.53 62784 39417 632 434 51 0
z3-noodler-6802421-d95fe13 102191 1127 12819.51 0.13 0.01 1.59 62783 39408 639 437 51 0
cvc5-1.1.2 99699 3619 76478.24 0.77 0.00 6.77 60798 38901 2 3608 9 0
z3-4.13.0 97687 5631 132036.99 1.35 0.01 8.11 58874 38813 210 4960 461 0
ostrich-5dd2e10ca 86344 16974 782639.00 9.06 1.75 19.20 47556 38788 0 16962 0 12
z3-4.13.0+cvc5-1.1.2 100756 2562 60604.78 0.60 0.00 6.00 61099 39657 2562 0 0 0
z3-noodler-52a04e9-d95fe13+cvc5-1.1.2 103176 142 9180.41 0.09 0.00 1.35 63260 39916 142 0 0 0
z3-noodler-52a04e9-d95fe13+z3-4.13.0 102787 531 9548.52 0.09 0.01 0.96 62958 39829 531 0 0 0
z3-noodler-52a04e9-d95fe13+z3-4.13.0+cvc5-1.1.2 103188 130 8652.20 0.08 0.00 1.38 63268 39920 130 0 0 0
--------------------------------------------------
Are you comparing with the correct version? I think it should be z3-noodler-82db97c-d95fe13
. Maybe a good idea would be to rebase/merge with devel and run it again and compare then compare it with z3-noodler-b7b8ad5-d95fe13
.
I am comparing with the last commit before the SNIA support PR. There should not be much difference. But sure, I can rebase it.
I am comparing with the last commit before the SNIA support PR. There should not be much difference. But sure, I can rebase it.
Are you sure? According to #140, z3-noodler-82db97c-d95fe13
was the last commit. And comparing those two:
# of formulae: 103318
--------------------------------------------------
tool ✅ ❌ time avg med std sat unsat unknown TO MO+ERR other
-------------------------- ------ ---- --------- ----- ----- ----- ----- ------- --------- ---- -------- -------
z3-noodler-52a04e9-d95fe13 102201 1117 12606.31 0.12 0.01 1.53 62784 39417 632 434 51 0
z3-noodler-82db97c-d95fe13 102200 1118 12913.53 0.13 0.01 1.64 62783 39417 632 434 52 0
cvc5-1.1.2 99699 3619 76478.24 0.77 0.00 6.77 60798 38901 2 3608 9 0
z3-4.13.0 97687 5631 132036.99 1.35 0.01 8.11 58874 38813 210 4960 461 0
--------------------------------------------------
The results are not that interesting, it is a bit faster with one more solved instance.
I am comparing with the last commit before the SNIA support PR. There should not be much difference. But sure, I can rebase it.
Are you sure? According to #140,
z3-noodler-82db97c-d95fe13
was the last commit. And comparing those two:# of formulae: 103318 -------------------------------------------------- tool ✅ ❌ time avg med std sat unsat unknown TO MO+ERR other -------------------------- ------ ---- --------- ----- ----- ----- ----- ------- --------- ---- -------- ------- z3-noodler-52a04e9-d95fe13 102201 1117 12606.31 0.12 0.01 1.53 62784 39417 632 434 51 0 z3-noodler-82db97c-d95fe13 102200 1118 12913.53 0.13 0.01 1.64 62783 39417 632 434 52 0 cvc5-1.1.2 99699 3619 76478.24 0.77 0.00 6.77 60798 38901 2 3608 9 0 z3-4.13.0 97687 5631 132036.99 1.35 0.01 8.11 58874 38813 210 4960 461 0 --------------------------------------------------
The results are not that interesting, it is a bit faster with one more solved instance.
Yes, you are right; z3-noodler-82db97c-d95fe13
seems to be the relevant version.
Current results:
# of formulae: 103405
--------------------------------------------------
tool ✅ ❌ time avg med std sat unsat unknown TO MO+ERR other
--------------------------------- ------ ---- --------- ----- ----- ----- ----- ------- --------- ---- -------- -------
z3-noodler-length-de9f98c-d95fe13 102468 937 11627.25 0.11 0.02 1.58 63039 39429 651 230 56 0
z3-noodler-a8ca33a-d95fe13 102338 1067 12090.90 0.12 0.02 1.67 62907 39431 625 389 53 0
cvc5-1.1.2 99774 3631 76479.19 0.77 0.00 6.77 60870 38904 2 3620 9 0
z3-4.13.0 97694 5711 132044.66 1.35 0.01 8.11 58876 38818 210 4968 463 70
--------------------------------------------------
There is more unknowns (the procedure may introduce unknowns), but quite significantly reduces the number of timeouts.
The results on kaluza
# of formulae: 19432
--------------------------------------------------
tool ✅ ❌ time avg med std sat unsat unknown TO MO+ERR other
--------------------------------- ----- ---- ------- ----- ----- ----- ----- ------- --------- ---- -------- -------
z3-noodler-length-de9f98c-d95fe13 19341 91 820.31 0.04 0.01 1.50 16243 3098 26 48 17 0
z3-noodler-a8ca33a-d95fe13 19211 221 1063.08 0.06 0.01 1.44 16112 3099 0 205 16 0
cvc5-1.1.2 19431 1 1061.97 0.05 0.00 1.87 16302 3129 0 1 0 0
z3-4.13.0 19148 284 2019.12 0.11 0.01 2.02 16032 3116 0 284 0 0
--------------------------------------------------
@jurajsic Can you refresh your review? I have made some effort to make it more merge-ready.
This PR introduces the length decision procedure implemented by @xhrani03. I have made some improvements extending the possible fragment. The length-based procedure is off by default (it can be turned on by
smt.str.try_length_proc=true
.