VeriFIT / z3-noodler

The Z3-Noodler String Solver
Other
9 stars 5 forks source link

Update to newest mata #98

Closed jurajsic closed 1 year ago

vhavlena commented 1 year ago

@jurajsic Can you run the experiments in order to see the impact of the newest mata? Also we should check if there are some errors/bugs as it might happen during the optimizations in mata.

jurajsic commented 1 year ago

Without pyex:

# of formulae: 76186
###################################################################################
####                                   Table 1                                 ####
###################################################################################
| method                     |    max |     mean |   median |   std. dev |   timeouts/errors |   unknowns |
|----------------------------|--------|----------|----------|------------|-------------------|------------|
| z3-noodler-cf573c7-40ca1cd | 109.84 | 0.10667  |     0.02 |    1.82288 |               635 |       7582 |
| cvc5-1.0.8                 | 119.36 | 0.537363 |     0.02 |    5.33357 |              1738 |          2 |
| z3-4.12.2                  | 118.32 | 1.08797  |     0.03 |    7.16023 |              1786 |        213 |
| z3str4                     | 116.06 | 0.225885 |     0.01 |    2.31374 |               936 |       1259 |
| z3-noodler-a0e3745-8ba88ce | 119.36 | 0.104793 |     0.03 |    1.70631 |               623 |       7582 |

image image image

vhavlena commented 1 year ago

On automatark we now have slightly worse results. Do you have any idea why it is so?

jurajsic commented 1 year ago

I think concatenation might be problematic.

jurajsic commented 1 year ago

These ones are solved in devel, but now we give timeout in automatark:

,,name,z3-noodler-cf573c7-40ca1cd-result,cvc5-1.0.8-result,z3-4.12.2-result,z3str4-result,z3-noodler-381ffd1-2f0ef53-result,z3-noodler-nielsen-0746e08-8ba88ce-result,z3-noodler-underapprox-ab28d76-2f0ef53-result,z3-noodler-63aa361-8ba88ce-result,z3-noodler-a0e3745-8ba88ce-result
automatark/to120.csv,6296,../formulae/QF_S/20230329-automatark-lu/instance06296.smt2,TO, unsat, unsat, unsat, unsat, unsat, unsat, unsat, unsat
automatark/to120.csv,7064,../formulae/QF_S/20230329-automatark-lu/instance07064.smt2,TO, sat, sat, sat, sat, sat, sat, sat, sat
automatark/to120.csv,7630,../formulae/QF_S/20230329-automatark-lu/instance07630.smt2,TO, sat, sat, sat, sat, sat, sat, sat, sat
automatark/to120.csv,12516,../formulae/QF_S/20230329-automatark-lu/instance12516.smt2,TO, unsat, unsat, unsat, unsat, unsat, unsat, unsat, unsat
automatark/to120.csv,13315,../formulae/QF_S/20230329-automatark-lu/instance13315.smt2,TO, sat, sat, sat, sat, sat, sat, sat, sat
automatark/to120.csv,13934,../formulae/QF_S/20230329-automatark-lu/instance13934.smt2,TO, sat, sat, sat, sat, sat, sat, sat, sat
automatark/to120.csv,14468,../formulae/QF_S/20230329-automatark-lu/instance14468.smt2,TO, sat, sat, sat, sat, sat, sat, sat, sat

And the scatterplot for automatark: image Seems there is a slight regression.

jurajsic commented 1 year ago

Current version compared with devel on everything without pyex:

# of formulae: 76186
###################################################################################
####                                   Table 1                                 ####
###################################################################################
| method                     |    max |     mean |   median |   std. dev |   timeouts/errors |   unknowns |
|----------------------------|--------|----------|----------|------------|-------------------|------------|
| z3-noodler-0ae38ae-40ca1cd | 119.82 | 0.110545 |     0.02 |    2.08484 |               623 |       7582 |
| cvc5-1.0.8                 | 119.36 | 0.537363 |     0.02 |    5.33357 |              1738 |          2 |
| z3-4.12.2                  | 118.32 | 1.08797  |     0.03 |    7.16023 |              1786 |        213 |
| z3str4                     | 116.06 | 0.225885 |     0.01 |    2.31374 |               936 |       1259 |
| ostrich-70d01e2d2          | 376.3  | 8.15053  |     2.52 |   16.0862  |              2634 |          1 |
| z3strRE                    | 119.59 | 0.175511 |     0.01 |    2.6927  |              1600 |       3693 |
| z3-trau-1.1                | 119.52 | 1.02905  |     0.03 |    6.72386 |              9289 |         53 |
| z3-noodler-a0e3745-8ba88ce | 119.36 | 0.104793 |     0.03 |    1.70631 |               623 |       7582 |

image

vhavlena commented 1 year ago

Ok, there are still problems with some instances in automatark where we have worse results compared to the previous version of mata. But I would skip it for now. We have more urgent things to do I think.

Adda0 commented 1 year ago

What caused the problem with timeouts? Otherwise, I agree that this seems as a satisfiable result for now.

jurajsic commented 1 year ago

pyex:

# of formulae: 23845
###################################################################################
####                                   Table 1                                 ####
###################################################################################
| method                     |    max |      mean |   median |   std. dev |   TO+MO+ERR |   unknowns |
|----------------------------|--------|-----------|----------|------------|-------------|------------|
| z3-noodler-0ae38ae-40ca1cd | 119.83 |  3.53156  |     0.24 |   13.1413  |        3694 |        709 |
| cvc5-1.0.8                 | 112    |  0.321931 |     0.13 |    2.59283 |          34 |          0 |
| z3-4.12.2                  | 119.29 |  2.76938  |     0.24 |   10.5431  |        1071 |          0 |
| z3str4                     | 118.44 |  1.45473  |     0.18 |    6.4344  |         444 |        126 |
| ostrich-70d01e2d2          | 912.88 | 55.79     |    45.48 |   81.1718  |       12247 |          0 |
| z3strRE                    |  94.41 |  0.663989 |     0.04 |    4.21008 |          61 |      17703 |
| z3-trau-1.1                | 106.72 |  0.478995 |     0.08 |    2.47854 |          11 |          0 |
| z3-noodler-a0e3745-8ba88ce | 119.79 |  3.41622  |     0.22 |   12.8164  |        3679 |        714 |

image

jurajsic commented 1 year ago

What caused the problem with timeouts? Otherwise, I agree that this seems as a satisfiable result for now.

The result of concatenation can now contain unreachable states, so we call trim after. But I think there are still some calls in noodler to concatenation that are not followed with trim, we should check it.

jurajsic commented 1 year ago

I think we can merge this, is everyone ok with that?

Adda0 commented 1 year ago

I would merge it.