Add support for QF_SNIA

jurajsic commented 4 months ago

# of formulae: 103318
--------------------------------------------------
tool                            ✅    ❌       time    avg    med    std    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ------  ----  ---------  -----  -----  -----  -----  -------  ---------  ----  --------  -------
z3-noodler-f552b5f-d95fe13  102191  1127   14077.39   0.14   0.02   1.52  62776    39415        628   453        46        0
z3-noodler-6802421-d95fe13  102191  1127   12819.51   0.13   0.01   1.59  62783    39408        639   437        51        0
cvc5-1.1.2                   99699  3619   76478.24   0.77   0.00   6.77  60798    38901          2  3608         9        0
z3-4.13.0                    97687  5631  132036.99   1.35   0.01   8.11  58874    38813        210  4960       461        0
--------------------------------------------------

There is slight degradation for the time of solving, but otherwise it looks like there are no problems.

jurajsic commented 4 months ago

For QF_SNIA formulae:

Benchmark snia
# of formulae: 70
--------------------------------------------------
tool                          ✅    ❌    time     avg     med     std    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  ------  ------  ------  ------  -----  -------  ---------  ----  --------  -------
z3-noodler-f552b5f-d95fe13    70     0    0.98    0.01    0.01    0.01     70        0          0     0         0        0
cvc5-1.1.2                    70     0    0.00    0.00    0.00    0.00     70        0          0     0         0        0
z3-4.13.0                      0    70    0.00  nan     nan     nan         0        0          0     0         0       70
ostrich-e386836db             70     0  119.74    1.71    1.70    0.20     70        0          0     0         0        0
--------------------------------------------------

Z3 returns correctly sat for all 70 instances, but also prints some error, so it does not show up correctly in the table.

vhavlena commented 4 months ago

How come we have now less unknowns but more TOs than before?

jurajsic commented 4 months ago

Benchmark str_small_rw
# of formulae: 1880
--------------------------------------------------
tool                          ✅    ❌     time    avg    med    std    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  -------  -----  -----  -----  -----  -------  ---------  ----  --------  -------
z3-noodler-f552b5f-d95fe13  1762   118   341.40   0.19   0.01   2.75      4     1758         92    17         9        0
z3-noodler-6802421-d95fe13  1752   128   399.87   0.23   0.01   3.38      5     1747        103    13        12        0
cvc5-1.1.2                  1861    19    31.09   0.02   0.00   0.56     20     1841          2    17         0        0
z3-4.13.0                   1821    59   214.88   0.12   0.01   2.86     25     1796          0    59         0        0
ostrich-e386836db           1709   171  7478.26   4.38   1.63   6.54     19     1690          0   171         0        0
--------------------------------------------------

Benchmark pyex
# of formulae: 23845
--------------------------------------------------
tool                           ✅     ❌       time    avg    med    std    sat    unsat    unknown     TO    MO+ERR    other
--------------------------  -----  -----  ---------  -----  -----  -----  -----  -------  ---------  -----  --------  -------
z3-noodler-f552b5f-d95fe13  23747     98    4970.47   0.21   0.11   1.56  20377     3370          0     96         2        0
z3-noodler-6802421-d95fe13  23754     91    4734.25   0.20   0.10   1.84  20383     3371          0     89         2        0
cvc5-1.1.2                  23826     19    6384.00   0.27   0.07   2.48  20406     3420          0     19         0        0
z3-4.13.0                   22858    987   54817.37   2.40   0.12  10.23  19587     3271          0    987         0        0
ostrich-e386836db           11017  12828  353972.36  32.13  20.71  35.14   7801     3216          0  12828         0        0
--------------------------------------------------

Benchmark woorpje
# of formulae: 809
--------------------------------------------------
tool                          ✅    ❌     time    avg    med    std    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  -------  -----  -----  -----  -----  -------  ---------  ----  --------  -------
z3-noodler-f552b5f-d95fe13   771    38   174.18   0.23   0.02   2.30    609      162          0    21        17        0
z3-noodler-6802421-d95fe13   774    35   252.34   0.33   0.01   4.49    612      162          0    17        18        0
cvc5-1.1.2                   755    54    45.21   0.06   0.01   0.23    591      164          0    54         0        0
z3-4.13.0                    782    27   300.36   0.38   0.02   4.80    618      164          0    27         0        0
ostrich-e386836db            755    54  6876.45   9.11   4.60  15.58    589      166          0    51         0        3
--------------------------------------------------

This three benchmarks have the most different number of solved instances, I would say it's just Z3 randomness. Here is also scatter plot:

vhavlena commented 4 months ago

You are comparing with a wrong version o Z3-Noodler (you are using the version with length decision procedure)

jurajsic commented 4 months ago

You are comparing with a wrong version o Z3-Noodler (you are using the version with length decision procedure)

Are you sure? According to list_of_tools.txt I am comparing with the version from #140.

jurajsic commented 4 months ago

Hmm, it seems that 6802421 is older than 82db97c in #140, here is comparison with the newer one:

# of formulae: 103318
--------------------------------------------------
tool                            ✅     ❌       time    avg    med    std    sat    unsat    unknown     TO    MO+ERR    other
--------------------------  ------  -----  ---------  -----  -----  -----  -----  -------  ---------  -----  --------  -------
z3-noodler-f552b5f-d95fe13  102191   1127   14077.39   0.14   0.02   1.52  62776    39415        628    453        46        0
z3-noodler-82db97c-d95fe13  102200   1118   12913.53   0.13   0.01   1.64  62783    39417        632    434        52        0
cvc5-1.1.2                   99699   3619   76478.24   0.77   0.00   6.77  60798    38901          2   3608         9        0
z3-4.13.0                    97687   5631  132036.99   1.35   0.01   8.11  58874    38813        210   4960       461        0
ostrich-e386836db            84819  18499  674023.96   7.95   1.62  17.80  46667    38152          0  15197         0     3302
--------------------------------------------------

jurajsic commented 4 months ago

I turned off other (unneeded) theories again, like before, the results:

# of formulae: 103318
--------------------------------------------------
tool                            ✅    ❌       time    avg    med    std    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ------  ----  ---------  -----  -----  -----  -----  -------  ---------  ----  --------  -------
z3-noodler-b7b8ad5-d95fe13  102204  1114   12937.44   0.13   0.01   1.63  62783    39421        628   435        51        0
z3-noodler-82db97c-d95fe13  102200  1118   12913.53   0.13   0.01   1.64  62783    39417        632   434        52        0
cvc5-1.1.2                   99699  3619   76478.24   0.77   0.00   6.77  60798    38901          2  3608         9        0
z3-4.13.0                    97687  5631  132036.99   1.35   0.01   8.11  58874    38813        210  4960       461        0
--------------------------------------------------

QF_SNIA:

# of formulae: 70
--------------------------------------------------
tool                          ✅    ❌    time     avg     med     std    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  ------  ------  ------  ------  -----  -------  ---------  ----  --------  -------
z3-noodler-b7b8ad5-d95fe13    70     0    0.82    0.01    0.01    0.01     70        0          0     0         0        0
cvc5-1.1.2                    70     0    0.00    0.00    0.00    0.00     70        0          0     0         0        0
z3-4.13.0                      0    70    0.00  nan     nan     nan         0        0          0     0         0       70
--------------------------------------------------

VeriFIT / z3-noodler

Add support for QF_SNIA #144