VeriFIT / z3-noodler

The Z3-Noodler String Solver
Other
9 stars 5 forks source link

Update base Z3 to v4.13.0 #133

Closed jurajsic closed 6 months ago

jurajsic commented 7 months ago

Updates base Z3 to v4.13.0. I also updated README.md and README-Z3.md.

jurajsic commented 7 months ago
Benchmark sygus_qgen/to120.csv
# of formulae: 343
--------------------------------------------------
tool                          ✅    ❌    avg    med    time    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  -----  -----  ------  -----  -------  ---------  ----  --------  -------
z3-noodler-3524a85-2cddb2f   343     0   0.01   0.01    4.78    343        0          0     0         0        0
cvc5-1.1.2                   343     0   0.23   0.04   79.25    343        0          0     0         0        0
z3-4.12.6                    343     0   0.03   0.02   10.74    343        0          0     0         0        0
z3-4.13.0                    343     0   0.03   0.02   10.38    343        0          0     0         0        0
z3-noodler-0751e1e-2cddb2f   343     0   0.02   0.02    5.28    343        0          0     0         0        0
--------------------------------------------------

Benchmark denghang/to120.csv
# of formulae: 999
--------------------------------------------------
tool                          ✅    ❌    avg    med    time    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  -----  -----  ------  -----  -------  ---------  ----  --------  -------
z3-noodler-3524a85-2cddb2f   999     0   0.04   0.02   42.76    186      813          0     0         0        0
cvc5-1.1.2                   981    18   0.30   0.00  292.35    169      812          0    18         0        0
z3-4.12.6                    883   116   0.38   0.09  338.60    186      697          0   105        11        0
z3-4.13.0                    883   116   0.39   0.09  342.20    186      697          0   105        11        0
z3-noodler-0751e1e-2cddb2f   999     0   0.04   0.02   44.34    186      813          0     0         0        0
--------------------------------------------------

Benchmark automatark/to120.csv
# of formulae: 15995
--------------------------------------------------
tool                           ✅    ❌    avg    med     time    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  -----  ----  -----  -----  -------  -----  -------  ---------  ----  --------  -------
z3-noodler-3524a85-2cddb2f  15935    60   0.12   0.01  1986.14  10452     5483          0    60         0        0
cvc5-1.1.2                  15902    93   0.14   0.00  2216.97  10460     5442          0    85         8        0
z3-4.12.6                   15869   126   0.39   0.02  6250.34  10469     5400          0   125         1        0
z3-4.13.0                   15870   125   0.40   0.02  6425.06  10469     5401          0   124         1        0
z3-noodler-0751e1e-2cddb2f  15935    60   0.12   0.02  1975.62  10452     5483          0    60         0        0
--------------------------------------------------

Benchmark stringfuzz/to120.csv
# of formulae: 11618
--------------------------------------------------
tool                           ✅    ❌    avg    med      time    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  -----  ----  -----  -----  --------  -----  -------  ---------  ----  --------  -------
z3-noodler-3524a85-2cddb2f  11614     4   0.04   0.01    434.97   6353     5261          2     2         0        0
cvc5-1.1.2                  10915   703   2.81   0.00  30628.85   6158     4757          0   702         1        0
z3-4.12.6                   11078   540   4.09   0.01  45270.19   5992     5086          0   415       125        0
z3-4.13.0                   11081   537   4.14   0.01  45873.06   5995     5086          0   412       125        0
z3-noodler-0751e1e-2cddb2f  11616     2   0.03   0.01    350.71   6355     5261          2     0         0        0
--------------------------------------------------

Benchmark norn/to120.csv
# of formulae: 1027
--------------------------------------------------
tool                          ✅    ❌    avg    med    time    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  -----  -----  ------  -----  -------  ---------  ----  --------  -------
z3-noodler-3524a85-2cddb2f  1027     0   0.01   0.01   14.30    712      315          0     0         0        0
cvc5-1.1.2                   943    84   0.30   0.01  279.95    705      238          0    84         0        0
z3-4.12.6                    904   123   0.33   0.02  294.15    712      192          0   123         0        0
z3-4.13.0                    903   124   0.19   0.02  174.62    712      191          0   124         0        0
z3-noodler-0751e1e-2cddb2f  1027     0   0.02   0.01   15.69    712      315          0     0         0        0
--------------------------------------------------

Benchmark slog/to120.csv
# of formulae: 1976
--------------------------------------------------
tool                          ✅    ❌    avg    med    time    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  -----  -----  ------  -----  -------  ---------  ----  --------  -------
z3-noodler-3524a85-2cddb2f  1976     0   0.01   0.01   27.63    808     1168          0     0         0        0
cvc5-1.1.2                  1976     0   0.00   0.00    1.10    808     1168          0     0         0        0
z3-4.12.6                   1947    29   0.36   0.01  694.80    779     1168          0    29         0        0
z3-4.13.0                   1945    31   0.25   0.01  488.71    777     1168          0    31         0        0
z3-noodler-0751e1e-2cddb2f  1976     0   0.02   0.01   30.00    808     1168          0     0         0        0
--------------------------------------------------

Benchmark slent/to120.csv
# of formulae: 1128
--------------------------------------------------
tool                          ✅    ❌    avg    med     time    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  -----  -----  -------  -----  -------  ---------  ----  --------  -------
z3-noodler-3524a85-2cddb2f  1126     2   0.03   0.01    30.81    629      497          1     1         0        0
cvc5-1.1.2                  1104    24   0.98   0.00  1079.50    618      486          0    24         0        0
z3-4.12.6                   1055    73   0.36   0.01   382.24    571      484          0    73         0        0
z3-4.13.0                   1055    73   0.37   0.01   385.56    571      484          0    73         0        0
z3-noodler-0751e1e-2cddb2f  1127     1   0.05   0.01    54.87    629      498          1     0         0        0
--------------------------------------------------

Benchmark transducer_plus/to120.csv
# of formulae: 91
--------------------------------------------------
tool                          ✅    ❌     avg     med    time    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  ------  ------  ------  -----  -------  ---------  ----  --------  -------
z3-noodler-3524a85-2cddb2f     0    91  nan     nan       0.00      0        0         91     0         0        0
cvc5-1.1.2                    42    49   11.68    3.24  490.70     41        1          0    49         0        0
z3-4.12.6                      1    90    0.02    0.02    0.02      0        1         90     0         0        0
z3-4.13.0                      1    90    0.01    0.01    0.01      0        1         90     0         0        0
z3-noodler-0751e1e-2cddb2f     0    91  nan     nan       0.00      0        0         91     0         0        0
--------------------------------------------------

Benchmark kepler/to120.csv
# of formulae: 587
--------------------------------------------------
tool                          ✅    ❌    avg    med    time    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  -----  -----  ------  -----  -------  ---------  ----  --------  -------
z3-noodler-3524a85-2cddb2f   584     3   0.03   0.01   16.25    284      300          0     3         0        0
cvc5-1.1.2                   347   240   0.00   0.00    0.54     50      297          0   240         0        0
z3-4.12.6                    278   309   0.09   0.02   25.14    135      143          0   285        24        0
z3-4.13.0                    278   309   0.09   0.01   24.64    135      143          0   285        24        0
z3-noodler-0751e1e-2cddb2f   584     3   0.03   0.01   17.44    284      300          0     3         0        0
--------------------------------------------------

Benchmark woorpje/to120.csv
# of formulae: 809
--------------------------------------------------
tool                          ✅    ❌    avg    med    time    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  -----  -----  ------  -----  -------  ---------  ----  --------  -------
z3-noodler-3524a85-2cddb2f   748    61   0.43   0.02  322.01    587      161          0    20        41        0
cvc5-1.1.2                   755    54   0.06   0.01   45.21    591      164          0    54         0        0
z3-4.12.6                    782    27   0.38   0.02  296.29    618      164          0    27         0        0
z3-4.13.0                    782    27   0.38   0.02  300.36    618      164          0    27         0        0
z3-noodler-0751e1e-2cddb2f   750    59   0.58   0.02  438.50    589      161          0    20        39        0
--------------------------------------------------

Benchmark webapp/to120.csv
# of formulae: 681
--------------------------------------------------
tool                          ✅    ❌    avg    med    time    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  -----  -----  ------  -----  -------  ---------  ----  --------  -------
z3-noodler-3524a85-2cddb2f   355   326   0.19   0.02   68.26    130      225        316     6         4        0
cvc5-1.1.2                   588    93   1.47   0.01  864.69    165      423          0    93         0        0
z3-4.12.6                    449   232   0.46   0.02  207.29     35      414        120   112         0        0
z3-4.13.0                    450   231   0.69   0.02  312.63     36      414        120   111         0        0
z3-noodler-0751e1e-2cddb2f   357   324   0.20   0.02   69.90    132      225        316     4         4        0
--------------------------------------------------

Benchmark kaluza/to120.csv
# of formulae: 19432
--------------------------------------------------
tool                           ✅    ❌    avg    med     time    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  -----  ----  -----  -----  -------  -----  -------  ---------  ----  --------  -------
z3-noodler-3524a85-2cddb2f  19103   329   0.05   0.01  1041.16  15999     3104          0   323         6        0
cvc5-1.1.2                  19431     1   0.05   0.00  1061.97  16302     3129          0     1         0        0
z3-4.12.6                   19148   284   0.11   0.01  2014.35  16032     3116          0   284         0        0
z3-4.13.0                   19148   284   0.11   0.01  2019.12  16032     3116          0   284         0        0
z3-noodler-0751e1e-2cddb2f  19162   270   0.06   0.01  1138.96  16065     3097          0   262         8        0
--------------------------------------------------

Benchmark leetcode/to120.csv
# of formulae: 2652
--------------------------------------------------
tool                          ✅    ❌    avg    med    time    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  -----  -----  ------  -----  -------  ---------  ----  --------  -------
z3-noodler-3524a85-2cddb2f  2648     4   0.02   0.01   53.36    864     1784          0     0         4        0
cvc5-1.1.2                  2652     0   0.01   0.00   32.91    867     1785          0     0         0        0
z3-4.12.6                   2652     0   0.02   0.01   42.90    867     1785          0     0         0        0
z3-4.13.0                   2652     0   0.02   0.01   43.44    867     1785          0     0         0        0
z3-noodler-0751e1e-2cddb2f  2648     4   0.02   0.01   57.49    864     1784          0     0         4        0
--------------------------------------------------

Benchmark str_small_rw/to120.csv
# of formulae: 1880
--------------------------------------------------
tool                          ✅    ❌    avg    med    time    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  -----  -----  ------  -----  -------  ---------  ----  --------  -------
z3-noodler-3524a85-2cddb2f  1746   134   0.17   0.01  291.85      4     1742        102    22        10        0
cvc5-1.1.2                  1861    19   0.02   0.00   31.09     20     1841          2    17         0        0
z3-4.12.6                   1821    59   0.11   0.01  203.86     25     1796          0    59         0        0
z3-4.13.0                   1821    59   0.12   0.01  214.88     25     1796          0    59         0        0
z3-noodler-0751e1e-2cddb2f  1743   137   0.14   0.01  252.41      4     1739        100    23        14        0
--------------------------------------------------

Benchmark pyex/to120.csv
# of formulae: 23845
--------------------------------------------------
tool                           ✅    ❌    avg    med      time    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  -----  ----  -----  -----  --------  -----  -------  ---------  ----  --------  -------
z3-noodler-3524a85-2cddb2f  23705   140   0.20   0.11   4696.42  20331     3374          0   117        23        0
cvc5-1.1.2                  23826    19   0.27   0.07   6384.00  20406     3420          0    19         0        0
z3-4.12.6                   22859   986   2.37   0.12  54241.37  19588     3271          0   986         0        0
z3-4.13.0                   22858   987   2.40   0.12  54817.37  19587     3271          0   987         0        0
z3-noodler-0751e1e-2cddb2f  23751    94   0.24   0.11   5617.14  20380     3371          0    93         1        0
--------------------------------------------------

Benchmark full_str_int/to120.csv
# of formulae: 16968
--------------------------------------------------
tool                           ✅    ❌    avg    med      time    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  -----  ----  -----  -----  --------  -----  -------  ---------  ----  --------  -------
z3-noodler-3524a85-2cddb2f  16704   264   0.16   0.01   2626.29   2438    14266        126   137         1        0
cvc5-1.1.2                  16963     5   0.30   0.01   5095.10   2521    14442          0     5         0        0
z3-4.12.6                   16730   238   1.20   0.01  19997.34   2474    14256          0   238         0        0
z3-4.13.0                   16729   239   1.21   0.01  20278.49   2473    14256          0   239         0        0
z3-noodler-0751e1e-2cddb2f  16704   264   0.19   0.01   3215.44   2436    14268        126   138         0        0
--------------------------------------------------
jurajsic commented 7 months ago

There is degradation in pyex and kaluza. Possible culprit is the new rewriter rules, or just pure randomness.

vhavlena commented 7 months ago

Can you post the scatter plots as well?

jurajsic commented 6 months ago

I commented one of the new rewriter rules, it was adding some length constraints which probably made some vars length-aware. Pyex now:

Benchmark pyex/to120.csv
# of formulae: 23845
--------------------------------------------------
tool                           ✅    ❌    avg    med      time    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  -----  ----  -----  -----  --------  -----  -------  ---------  ----  --------  -------
z3-noodler-3233733-2cddb2f  23752    93   0.20   0.10   4805.20  20381     3371          0    91         2        0
cvc5-1.1.2                  23826    19   0.27   0.07   6384.00  20406     3420          0    19         0        0
z3-4.12.6                   22859   986   2.37   0.12  54241.37  19588     3271          0   986         0        0
z3-4.13.0                   22858   987   2.40   0.12  54817.37  19587     3271          0   987         0        0
z3-noodler-0751e1e-2cddb2f  23751    94   0.24   0.11   5617.14  20380     3371          0    93         1        0
--------------------------------------------------
jurajsic commented 6 months ago

There is one incorrect result in full_str_int now, on QF_SLIA/2019-full_str_int/py-conbyte_trauc/leetcode_int-numDecodings/130.smt2 we give sat, while cvc5 an Z3 give unsat.

jurajsic commented 6 months ago

image On the left is v1.1.

jurajsic commented 6 months ago

I updated mata, we do not give incorrect result anymore (instead we give unknown). It is impossible to debug, so I would merge it and hope that if the bug shows up again, it will be more debuggable.