CozySynthesizer / cozy

The collection synthesizer
https://cozy.uwplse.org
Apache License 2.0
209 stars 18 forks source link

add heap_elems and heap_length argument to EHeapElems #61

Closed izgzhen closed 5 years ago

izgzhen commented 5 years ago

WIP

izgzhen commented 5 years ago

Currently buggy for cozy examples/maxbag.ds --java examples/MaxBag.java:

Traceback (most recent call last):
  File "/home/zgzhen/repos/cozy/cozy/jobs.py", line 29, in _run
    self.run()
  File "/home/zgzhen/repos/cozy/cozy/synthesis/high_level_interface.py", line 85, in run
    ops=self.ops)):
  File "/home/zgzhen/repos/cozy/cozy/synthesis/core.py", line 194, in improve
    blacklist=blacklist):
  File "/home/zgzhen/repos/cozy/cozy/synthesis/core.py", line 349, in search_for_improvements
    for info in enum.enumerate_with_info(size=size, context=ctx, pool=pool):
  File "/home/zgzhen/repos/cozy/cozy/synthesis/enumeration.py", line 580, in enumerate_with_info
    yield from self._enumerate_with_info(context, size, pool)
  File "/home/zgzhen/repos/cozy/cozy/synthesis/enumeration.py", line 603, in _enumerate_with_info
    e = next(queue)
  File "/home/zgzhen/repos/cozy/cozy/synthesis/enumeration.py", line 506, in _enumerate_core
    for f in build_lambdas(e1_singleton, pool, sz2):
  File "/home/zgzhen/repos/cozy/cozy/synthesis/enumeration.py", line 393, in build_lambdas
    for lam_body in self.enumerate(inner_context, body_size, pool):
  File "/home/zgzhen/repos/cozy/cozy/synthesis/enumeration.py", line 537, in enumerate
    for info in self.enumerate_with_info(context, size, pool):
  File "/home/zgzhen/repos/cozy/cozy/synthesis/enumeration.py", line 580, in enumerate_with_info
    yield from self._enumerate_with_info(context, size, pool)
  File "/home/zgzhen/repos/cozy/cozy/synthesis/enumeration.py", line 634, in _enumerate_with_info
    to_keep = eviction_policy(e, context, prev_exp, context, pool, cost_model)
  File "/home/zgzhen/repos/cozy/cozy/synthesis/enumeration.py", line 237, in eviction_policy
    ordering = cost_model.compare(new_exp, old_exp, context, pool)
  File "/home/zgzhen/repos/cozy/cozy/cost_model.py", line 196, in compare
    lambda: order_objects(e1.size(), e2.size()))
  File "/home/zgzhen/repos/cozy/cozy/cost_model.py", line 66, in prioritized_order
    o = f()
  File "/home/zgzhen/repos/cozy/cozy/cost_model.py", line 195, in <lambda>
    lambda: self._compare(storage_size(e1, self.freebies), storage_size(e2, self.freebies), context),
  File "/home/zgzhen/repos/cozy/cozy/cost_model.py", line 229, in storage_size
    return h.storage_size(e, storage_size=storage_size)
  File "/home/zgzhen/repos/cozy/cozy/structures/heaps.py", line 195, in storage_size
    return storage_size(heap_elems(e).with_type(TBag(e.type.elem_type)))
  File "/home/zgzhen/repos/cozy/cozy/structures/heaps.py", line 90, in heap_elems
    raise NotImplementedError(repr(e))
NotImplementedError: EVar('_var77974').with_type(TMaxHeap(TInt(), TInt()))
izgzhen commented 5 years ago

some related comments: https://github.com/izgzhen/cozy/commit/4afa4c9bc86ab2d847c1d4e9c36bb1665dce6ed3

izgzhen commented 5 years ago

for the exception at hand, it is related to storage_size. I think I am stuck now.

izgzhen commented 5 years ago

squash and updated unit tests

izgzhen commented 5 years ago

it seems like unit test failed this time because we can't generate code for TMinHeap (and probably for other similar constructs as well)

izgzhen commented 5 years ago

@Calvin-L I think it is not trivial to fix the unit test -- how did you generate the expressions in test?

Calvin-L commented 5 years ago

The big expressions in the test code were copy-pasted from failing cases I encountered "in the wild". That habit led to a lot of useful regression tests, but makes the tests difficult to maintain. My advice is:

izgzhen commented 5 years ago

I added a ZERO-valued argument to EHeapElems in tests, and it works.

The unit test snippet is too large for me to comprehend, and I can't add anything else except for ZERO.

izgzhen commented 5 years ago

I think we need to confirm the following things before merging this:

izgzhen commented 5 years ago

Also, I am not confident about the fact that filling the second parameter with ZERO is the right thing to do

izgzhen commented 5 years ago

encode doesn't use the second argument of new EHeapElems

Calvin-L commented 5 years ago

I added a ZERO-valued argument to EHeapElems in tests, and it works. Also, I am not confident about the fact that filling the second parameter with ZERO is the right thing to do

I have not confirmed, but I think the test only verifies that it can generate code that compiles, not that the generated code actually works. So, using zero produces compile-able code, but it would not function.

heap_func and my heap_elems are partial -- there is NotImplementedError inside it.

This is fine for now.

Is the object returned from EMakeMaxHeap/EMakeMinHeap a "heap" or "Heap array"?

It is both. It is a heap, and at code generation time, heaps are represented with arrays.

For the unit tests, we might need docstring for how to maintain them and what does check_encode etc. implies.

I agree!

encode doesn't use the second argument of new EHeapElems

That is fine. encode is for the semantics of heaps, which don't really care that they happen to be represented as arrays.

izgzhen commented 5 years ago

I have not confirmed, but I think the test only verifies that it can generate code that compiles, not that the generated code actually works. So, using zero produces compile-able code, but it would not function.

I suspect so as well. However, I figured that even if I crashed the generated code, I don't know what might be the correct value to fill in either (not apparent at all!).

izgzhen commented 5 years ago

let me try to construct another test with EHeapElems for codegen.py

izgzhen commented 5 years ago

@Calvin-L I've replaced the original regression01 with synthesized result from maxbag

izgzhen commented 5 years ago

@Calvin-L I believe this PR is ready for merging now (after build), though I have a few more follow-up work in mind.

izgzhen commented 5 years ago

we might need two versions of heap_elems -- one for code-gen purpose, one for solver/eval purpose

Calvin-L commented 5 years ago

We already have two versions. heap_elems computes the set of elements for internal purposes. EHeapElems produces an expression for code generation.

izgzhen commented 5 years ago

I reverted the heap_elems hack and revised check_wf

izgzhen commented 5 years ago

it should be easy to just check out this PR and reproduce the storage_size error by synthesizing maxbag example

Calvin-L commented 5 years ago

Cozy is nondeterministic. Whether it happens to discover a particular query or solution depends on how fast your computer runs and how many threads it can support efficiently. I was not able to discover the error after several tries. If you see it consistently, could you send me the log file from /tmp/ that throws the error? Those files include deterministic calls to improve that can be reproduced on other machines.

izgzhen commented 5 years ago

_query6116.log

izgzhen commented 5 years ago

@Calvin-L this is the log for the first failing query:

Traceback (most recent call last):
  File "/home/zgzhen/projects/cozy/cozy/jobs.py", line 29, in _run
    self.run()
  File "/home/zgzhen/projects/cozy/cozy/synthesis/high_level_interface.py", line 85, in run
    ops=self.ops)):
  File "/home/zgzhen/projects/cozy/cozy/synthesis/core.py", line 196, in improve
    blacklist=blacklist):
  File "/home/zgzhen/projects/cozy/cozy/synthesis/core.py", line 351, in search_for_improvements
    for info in enum.enumerate_with_info(size=size, context=ctx, pool=pool):
  File "/home/zgzhen/projects/cozy/cozy/synthesis/enumeration.py", line 580, in enumerate_with_info
    yield from self._enumerate_with_info(context, size, pool)
  File "/home/zgzhen/projects/cozy/cozy/synthesis/enumeration.py", line 603, in _enumerate_with_info
    e = next(queue)
  File "/home/zgzhen/projects/cozy/cozy/synthesis/enumeration.py", line 506, in _enumerate_core
    for f in build_lambdas(e1_singleton, pool, sz2):
  File "/home/zgzhen/projects/cozy/cozy/synthesis/enumeration.py", line 393, in build_lambdas
    for lam_body in self.enumerate(inner_context, body_size, pool):
  File "/home/zgzhen/projects/cozy/cozy/synthesis/enumeration.py", line 537, in enumerate
    for info in self.enumerate_with_info(context, size, pool):
  File "/home/zgzhen/projects/cozy/cozy/synthesis/enumeration.py", line 580, in enumerate_with_info
    yield from self._enumerate_with_info(context, size, pool)
  File "/home/zgzhen/projects/cozy/cozy/synthesis/enumeration.py", line 634, in _enumerate_with_info
    to_keep = eviction_policy(e, context, prev_exp, context, pool, cost_model)
  File "/home/zgzhen/projects/cozy/cozy/synthesis/enumeration.py", line 237, in eviction_policy
    ordering = cost_model.compare(new_exp, old_exp, context, pool)
  File "/home/zgzhen/projects/cozy/cozy/cost_model.py", line 196, in compare
    lambda: order_objects(e1.size(), e2.size()))
  File "/home/zgzhen/projects/cozy/cozy/cost_model.py", line 66, in prioritized_order
    o = f()
  File "/home/zgzhen/projects/cozy/cozy/cost_model.py", line 195, in <lambda>
    lambda: self._compare(storage_size(e1, self.freebies), storage_size(e2, self.freebies), context),
  File "/home/zgzhen/projects/cozy/cozy/cost_model.py", line 229, in storage_size
    return h.storage_size(e, storage_size=storage_size)
  File "/home/zgzhen/projects/cozy/cozy/structures/heaps.py", line 195, in storage_size
    return storage_size(heap_elems(e).with_type(TBag(e.type.elem_type)))
  File "/home/zgzhen/projects/cozy/cozy/structures/heaps.py", line 90, in heap_elems
    raise NotImplementedError(repr(e))
NotImplementedError: EVar('_var78661').with_type(TMaxHeap(TInt(), TInt()))
Calvin-L commented 5 years ago

Great, I had no trouble reproducing the issue! The situation is a little complicated, but I'll try to explain.

During synthesis, Cozy needs to enumerate let-expressions. It does so by recursively enumerating all expressions in a world where there is an extra variable; for instance, if we are building a let-expression like let h = MakeHeap(...) in ___, Cozy enumerates expressions with a free heap-type variable h.

While doing the recursive enumeration, Cozy also needs to estimate costs for expressions. That is how we get down the path to calling storage_size when there is a free heap-type variable with no information about it.

For let-expressions it is clear what we can do: pass information down so that storage_size can see all let-bound variables. Unfortunately that won't work here, because Cozy also uses this same mechanism to build lambda-expressions for functional maps and filters. In that case there there is no one value for the free variable; it is any member of a given collection, as in Map(bag_of_heaps, \h -> ___).

It is clear to me that heap_elems is probably the wrong design. Here are some possible alternatives I am considering:

izgzhen commented 5 years ago

Enumerating elements of a heap will not be an operation we support, in which case EHeapElems should be a "specification-only" primitive that does not take a length parameter.

Heap is internal to synthesizer now -- do you mean EHeapElems should be solver/eval-only, and never leaked out to code-generation stage?

izgzhen commented 5 years ago

let's close this PR for now and move to the issue for discussing the new proposals :)

Calvin-L commented 5 years ago

Yes, I do mean that EHeapElems would be solver/eval only, and should never appear in final code.

"Specification-only" includes user-written specifications and Cozy's own internal specifications for new subgoals it creates. It might also include concretization functions.