CozySynthesizer / cozy

The collection synthesizer
https://cozy.uwplse.org
Apache License 2.0
209 stars 18 forks source link

add simple cache to cost model #81

Closed izgzhen closed 5 years ago

izgzhen commented 6 years ago

PoC

izgzhen commented 6 years ago

this very simple implementation can have quite good hit rate:

0.0
0.0
0.0
0.0
0.2
0.16666666666666666
0.2857142857142857
0.375
0.4444444444444444
0.5
0.5454545454545454
0.5833333333333334
0.6153846153846154
0.6428571428571429
0.6666666666666666
0.6875
0.7058823529411765
0.7222222222222222
0.7368421052631579
0.75
0.7619047619047619
0.7727272727272727
0.782608695652174
0.7916666666666666
0.76
0.7307692307692307
0.7037037037037037
0.7142857142857143
0.6896551724137931
0.7
0.6774193548387096
0.6875
0.696969696969697
0.7058823529411765
0.7142857142857143
0.7222222222222222
0.7027027027027027
0.7105263157894737
0.717948717948718
0.725
0.7317073170731707
0.7380952380952381
0.7441860465116279

show by running at the same time watch -n 1 cat /tmp/cozy_cache*

izgzhen commented 6 years ago

but the total numbers for a query is actually low, caused by insufficient sharing between each cost_model instance I guess. I am not sure it is safe to share, but sharing should be a simple hack

izgzhen commented 6 years ago

also, I felt like cozy becomes faster with this. However, there is not strict validation yet

izgzhen commented 6 years ago

lru_cache from functools also works

izgzhen commented 6 years ago

I added a memcached version -- yeah I am kinda crazy

izgzhen commented 6 years ago

after a few warmup for memcached, cache hit rate for all cost_model instances mostly are 80% and up

Calvin-L commented 6 years ago

Sharing a cache between cost model instances is dangerous since there are member variables on each cost model that affect its behavior. Where you inserted the cache, I think that includes only self.assumptions and self.funcs. If you want a global cache you will need to include that information in the cache key as well.

izgzhen commented 6 years ago

Sharing a cache between cost model instances is dangerous since there are member variables on each cost model that affect its behavior. Where you inserted the cache, I think that includes only self.assumptions and self.funcs. If you want a global cache you will need to include that information in the cache key as well.

Good point, we need to figure out what to share and what not to share later. If the acceleration is significant, it might be worthy to structure cozy's code to make the correctness property obvious.

Calvin-L commented 6 years ago

CostModel._compare is a pure function of four inputs---just think of the self argument as a real input of equal importance to the others.

The cost model could be a plain old function of two expressions and many other inputs, but it helps efficiency a lot to do some up-front setup using the inputs that don't change very much. In particular, sending assumptions to the solver once and re-using the same solver over and over provides massive speedup.

This design is a tradeoff in code clarity: the cost model code itself is more difficult to follow, but client code is easier to read since all the extra parameters that the cost model needs have been abstracted away.