CozySynthesizer / cozy

The collection synthesizer
https://cozy.uwplse.org
Apache License 2.0
209 stars 18 forks source link

select-flatmap synthesis is slow #108

Open izgzhen opened 4 years ago

izgzhen commented 4 years ago

https://github.com/CozySynthesizer/cozy/blob/master/examples/select-flatmap.ds

izgzhen commented 4 years ago

solver._convert(EUnaryOp('not', EBinOp(EFlatMap(EStateVar(EVar('Rs').with_type(TBag(TRecord((('A', TInt()), ('B', TString())))))).with_type(TBag(TRecord((('A', TInt()), ('B', TString()))))), ELambda(EVar('r').with_type(TRecord((('A', TInt()), ('B', TString())))), EFlatMap(EStateVar(EVar('Ss').with_type(TBag(TRecord((('B', TString()), ('C', TInt())))))).with_type(TBag(TRecord((('B', TString()), ('C', TInt()))))), ELambda(EVar('s').with_type(TRecord((('B', TString()), ('C', TInt())))), EFlatMap(EStateVar(EVar('Qs').with_type(TBag(TRecord((('B', TString()), ('C', TInt())))))).with_type(TBag(TRecord((('B', TString()), ('C', TInt()))))), ELambda(EVar('q').with_type(TRecord((('B', TString()), ('C', TInt())))), EMap(EFilter(EStateVar(EVar('Ws').with_type(TBag(TRecord((('B', TString()), ('C', TInt())))))).with_type(TBag(TRecord((('B', TString()), ('C', TInt()))))), ELambda(EVar('w').with_type(TRecord((('B', TString()), ('C', TInt())))), EBinOp(EBinOp(EGetField(EVar('q').with_type(TRecord((('B', TString()), ('C', TInt())))), 'B').with_type(TString()), '==', EGetField(EVar('w').with_type(TRecord((('B', TString()), ('C', TInt())))), 'B').with_type(TString())).with_type(TBool()), 'and', EBinOp(EGetField(EVar('r').with_type(TRecord((('A', TInt()), ('B', TString())))), 'A').with_type(TInt()), '==', ENum(15).with_type(TInt())).with_type(TBool())).with_type(TBool()))).with_type(TBag(TRecord((('B', TString()), ('C', TInt()))))), ELambda(EVar('w').with_type(TRecord((('B', TString()), ('C', TInt())))), ETuple((EGetField(EVar('r').with_type(TRecord((('A', TInt()), ('B', TString())))), 'A').with_type(TInt()), EGetField(EVar('s').with_type(TRecord((('B', TString()), ('C', TInt())))), 'C').with_type(TInt()), EGetField(EVar('q').with_type(TRecord((('B', TString()), ('C', TInt())))), 'B').with_type(TString()), EGetField(EVar('w').with_type(TRecord((('B', TString()), ('C', TInt())))), 'C').with_type(TInt()))).with_type(TTuple((TInt(), TInt(), TString(), TInt()))))).with_type(TBag(TTuple((TInt(), TInt(), TString(), TInt())))))).with_type(TBag(TTuple((TInt(), TInt(), TString(), TInt())))))).with_type(TBag(TTuple((TInt(), TInt(), TString(), TInt())))))).with_type(TBag(TTuple((TInt(), TInt(), TString(), TInt())))), '==', EFlatMap(EEmptyList().with_type(TBag(TRecord((('A', TInt()), ('B', TString()))))), ELambda(EVar('r').with_type(TRecord((('A', TInt()), ('B', TString())))), EFlatMap(EStateVar(EVar('Ss').with_type(TBag(TRecord((('B', TString()), ('C', TInt())))))).with_type(TBag(TRecord((('B', TString()), ('C', TInt()))))), ELambda(EVar('s').with_type(TRecord((('B', TString()), ('C', TInt())))), EFlatMap(EStateVar(EVar('Qs').with_type(TBag(TRecord((('B', TString()), ('C', TInt())))))).with_type(TBag(TRecord((('B', TString()), ('C', TInt()))))), ELambda(EVar('q').with_type(TRecord((('B', TString()), ('C', TInt())))), EMap(EFilter(EStateVar(EVar('Ws').with_type(TBag(TRecord((('B', TString()), ('C', TInt())))))).with_type(TBag(TRecord((('B', TString()), ('C', TInt()))))), ELambda(EVar('w').with_type(TRecord((('B', TString()), ('C', TInt())))), EBinOp(EBinOp(EGetField(EVar('q').with_type(TRecord((('B', TString()), ('C', TInt())))), 'B').with_type(TString()), '==', EGetField(EVar('w').with_type(TRecord((('B', TString()), ('C', TInt())))), 'B').with_type(TString())).with_type(TBool()), 'and', EBinOp(EGetField(EVar('r').with_type(TRecord((('A', TInt()), ('B', TString())))), 'A').with_type(TInt()), '==', ENum(15).with_type(TInt())).with_type(TBool())).with_type(TBool()))).with_type(TBag(TRecord((('B', TString()), ('C', TInt()))))), ELambda(EVar('w').with_type(TRecord((('B', TString()), ('C', TInt())))), ETuple((EGetField(EVar('r').with_type(TRecord((('A', TInt()), ('B', TString())))), 'A').with_type(TInt()), EGetField(EVar('s').with_type(TRecord((('B', TString()), ('C', TInt())))), 'C').with_type(TInt()), EGetField(EVar('q').with_type(TRecord((('B', TString()), ('C', TInt())))), 'B').with_type(TString()), EGetField(EVar('w').with_type(TRecord((('B', TString()), ('C', TInt())))), 'C').with_type(TInt()))).with_type(TTuple((TInt(), TInt(), TString(), TInt()))))).with_type(TBag(TTuple((TInt(), TInt(), TString(), TInt())))))).with_type(TBag(TTuple((TInt(), TInt(), TString(), TInt())))))).with_type(TBag(TTuple((TInt(), TInt(), TString(), TInt())))))).with_type(TBag(TTuple((TInt(), TInt(), TString(), TInt()))))).with_type(TBool())).with_type(TBool()))

izgzhen commented 4 years ago

It seems that comparing the equality between to Bags returned from flatMap is expensive, since the equality test is based on the element count test, and each count is computed by summing all equal elements in the Bag, which in turns recursively invokes the solver.

izgzhen commented 4 years ago

I am working on "evaluation based refutation speedup" in https://github.com/CozySynthesizer/cozy/tree/eval-refutation. Still WIP.

Calvin-L commented 4 years ago

I thought of a possible solution to this while I was working on #110 and #119.

Z3 offers a "constant array" primitive that lets you make a symbolic array where all the values are the same constant. We can use that primitive to encode sets and bags as quantifier-free symbolic terms for Z3, reducing encoding times dramatically. To sketch out an example:

encode(empty bag of objects) = constant-array(domain=object, value=0)
encode(union(bag, {elem})) = store(encode(bag), elem, select(bag, elem) + 1)

With bags encoded that way, bag equality is a simple equality formula for Z3.

For the FlatMap/equality example I am working on, we can get encoding time down from 6 minutes to under 1 second.

(Curious aside: the constant array primitive is not part of the SMT-LIB standard, and there are deep model-theoretic reasons why its inclusion is actually a violation of the standard. However, it is very useful and will probably not be dropped from the Z3 API.)

Unfortunately, even with lightning-fast encoding, Z3 still takes a while to solve my example formula (about 5 minutes on my laptop). Also, while the new encoding strategy is asymptotically faster, it introduces a constant 50% overhead on the solver tests.

I am looking into whether I can mitigate the long Z3 solving time somehow.

Calvin-L commented 4 years ago

I am inclined to open a PR even if I can't fix the Z3 solving time. Solving time without the new encoding is >20 minutes. (I wasn't willing to wait longer and killed the job.)