Length decision procedure

vhavlena commented 4 months ago

This PR introduces the length decision procedure implemented by @xhrani03. I have made some improvements extending the possible fragment. The length-based procedure is off by default (it can be turned on by smt.str.try_length_proc=true.

vhavlena commented 4 months ago

Well, it seems there is a serious performance degradation, compared to devel

# of formulae: 103318
--------------------------------------------------
tool                                                 ✅     ❌       time    avg    med    std    sat    unsat    unknown     TO    MO+ERR    other
-----------------------------------------------  ------  -----  ---------  -----  -----  -----  -----  -------  ---------  -----  --------  -------
z3-noodler-1966907-d95fe13                       101850   1468   16276.20   0.16   0.02   2.06  62516    39334        631    784        53        0
z3-noodler-6802421-d95fe13                       102191   1127   12819.51   0.13   0.01   1.59  62783    39408        639    437        51        0
cvc5-1.1.2                                        99699   3619   76478.24   0.77   0.00   6.77  60798    38901          2   3608         9        0
z3-4.13.0                                         97687   5631  132036.99   1.35   0.01   8.11  58874    38813        210   4960       461        0
ostrich-5dd2e10ca                                 86344  16974  782639.00   9.06   1.75  19.20  47556    38788          0  16962         0       12
z3-4.13.0+cvc5-1.1.2                             100756   2562   60604.78   0.60   0.00   6.00  61099    39657       2562      0         0        0
z3-noodler-1966907-d95fe13+cvc5-1.1.2            102836    482   13260.38   0.13   0.00   2.24  62997    39839        482      0         0        0
z3-noodler-1966907-d95fe13+z3-4.13.0             102441    877   13564.62   0.13   0.01   1.91  62689    39752        877      0         0        0
z3-noodler-1966907-d95fe13+z3-4.13.0+cvc5-1.1.2  102849    469   12718.41   0.12   0.00   2.26  63006    39843        469      0         0        0
--------------------------------------------------

There are also quite many (77) instances in stringfuzz where the length-based procedure gives incorrect result.

vhavlena commented 4 months ago

The problem is caused by the redos benchmark. Probably in the suitability function of the length-based procedure.

vhavlena commented 4 months ago

Results of the fixed version. It helps a bit.

# of formulae: 103318
--------------------------------------------------
tool                                                 ✅     ❌       time    avg    med    std    sat    unsat    unknown     TO    MO+ERR    other
-----------------------------------------------  ------  -----  ---------  -----  -----  -----  -----  -------  ---------  -----  --------  -------
z3-noodler-52a04e9-d95fe13                       102201   1117   12606.31   0.12   0.01   1.53  62784    39417        632    434        51        0
z3-noodler-6802421-d95fe13                       102191   1127   12819.51   0.13   0.01   1.59  62783    39408        639    437        51        0
cvc5-1.1.2                                        99699   3619   76478.24   0.77   0.00   6.77  60798    38901          2   3608         9        0
z3-4.13.0                                         97687   5631  132036.99   1.35   0.01   8.11  58874    38813        210   4960       461        0
ostrich-5dd2e10ca                                 86344  16974  782639.00   9.06   1.75  19.20  47556    38788          0  16962         0       12
z3-4.13.0+cvc5-1.1.2                             100756   2562   60604.78   0.60   0.00   6.00  61099    39657       2562      0         0        0
z3-noodler-52a04e9-d95fe13+cvc5-1.1.2            103176    142    9180.41   0.09   0.00   1.35  63260    39916        142      0         0        0
z3-noodler-52a04e9-d95fe13+z3-4.13.0             102787    531    9548.52   0.09   0.01   0.96  62958    39829        531      0         0        0
z3-noodler-52a04e9-d95fe13+z3-4.13.0+cvc5-1.1.2  103188    130    8652.20   0.08   0.00   1.38  63268    39920        130      0         0        0
--------------------------------------------------

jurajsic commented 4 months ago

Are you comparing with the correct version? I think it should be z3-noodler-82db97c-d95fe13. Maybe a good idea would be to rebase/merge with devel and run it again and compare then compare it with z3-noodler-b7b8ad5-d95fe13.

vhavlena commented 4 months ago

I am comparing with the last commit before the SNIA support PR. There should not be much difference. But sure, I can rebase it.

jurajsic commented 4 months ago

I am comparing with the last commit before the SNIA support PR. There should not be much difference. But sure, I can rebase it.

Are you sure? According to #140, z3-noodler-82db97c-d95fe13 was the last commit. And comparing those two:

# of formulae: 103318
--------------------------------------------------
tool                            ✅    ❌       time    avg    med    std    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ------  ----  ---------  -----  -----  -----  -----  -------  ---------  ----  --------  -------
z3-noodler-52a04e9-d95fe13  102201  1117   12606.31   0.12   0.01   1.53  62784    39417        632   434        51        0
z3-noodler-82db97c-d95fe13  102200  1118   12913.53   0.13   0.01   1.64  62783    39417        632   434        52        0
cvc5-1.1.2                   99699  3619   76478.24   0.77   0.00   6.77  60798    38901          2  3608         9        0
z3-4.13.0                    97687  5631  132036.99   1.35   0.01   8.11  58874    38813        210  4960       461        0
--------------------------------------------------

The results are not that interesting, it is a bit faster with one more solved instance.

vhavlena commented 4 months ago

I am comparing with the last commit before the SNIA support PR. There should not be much difference. But sure, I can rebase it.

Are you sure? According to #140, z3-noodler-82db97c-d95fe13 was the last commit. And comparing those two:

# of formulae: 103318
--------------------------------------------------
tool                            ✅    ❌       time    avg    med    std    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ------  ----  ---------  -----  -----  -----  -----  -------  ---------  ----  --------  -------
z3-noodler-52a04e9-d95fe13  102201  1117   12606.31   0.12   0.01   1.53  62784    39417        632   434        51        0
z3-noodler-82db97c-d95fe13  102200  1118   12913.53   0.13   0.01   1.64  62783    39417        632   434        52        0
cvc5-1.1.2                   99699  3619   76478.24   0.77   0.00   6.77  60798    38901          2  3608         9        0
z3-4.13.0                    97687  5631  132036.99   1.35   0.01   8.11  58874    38813        210  4960       461        0
--------------------------------------------------

The results are not that interesting, it is a bit faster with one more solved instance.

Yes, you are right; z3-noodler-82db97c-d95fe13 seems to be the relevant version.

vhavlena commented 2 months ago

Current results:

# of formulae: 103405
--------------------------------------------------
tool                                   ✅    ❌       time    avg    med    std    sat    unsat    unknown    TO    MO+ERR    other
---------------------------------  ------  ----  ---------  -----  -----  -----  -----  -------  ---------  ----  --------  -------
z3-noodler-length-de9f98c-d95fe13  102468   937   11627.25   0.11   0.02   1.58  63039    39429        651   230        56        0
z3-noodler-a8ca33a-d95fe13         102338  1067   12090.90   0.12   0.02   1.67  62907    39431        625   389        53        0
cvc5-1.1.2                          99774  3631   76479.19   0.77   0.00   6.77  60870    38904          2  3620         9        0
z3-4.13.0                           97694  5711  132044.66   1.35   0.01   8.11  58876    38818        210  4968       463       70
--------------------------------------------------

There is more unknowns (the procedure may introduce unknowns), but quite significantly reduces the number of timeouts.

The results on kaluza

# of formulae: 19432
--------------------------------------------------
tool                                  ✅    ❌     time    avg    med    std    sat    unsat    unknown    TO    MO+ERR    other
---------------------------------  -----  ----  -------  -----  -----  -----  -----  -------  ---------  ----  --------  -------
z3-noodler-length-de9f98c-d95fe13  19341    91   820.31   0.04   0.01   1.50  16243     3098         26    48        17        0
z3-noodler-a8ca33a-d95fe13         19211   221  1063.08   0.06   0.01   1.44  16112     3099          0   205        16        0
cvc5-1.1.2                         19431     1  1061.97   0.05   0.00   1.87  16302     3129          0     1         0        0
z3-4.13.0                          19148   284  2019.12   0.11   0.01   2.02  16032     3116          0   284         0        0
--------------------------------------------------

vhavlena commented 2 months ago

@jurajsic Can you refresh your review? I have made some effort to make it more merge-ready.

VeriFIT / z3-noodler

Length decision procedure #143