ricochet1k opened this issue 4 years ago
Thanks for reaching out! You are right, basic.ds is a terrible first example. Cozy's behavior on that simple input is quite surprising. (However, it isn't totally unreasonable; see below.) I will think about what to do, and respond to some of your other points in the meantime.
"Cozy seems to make it much much worse"

Cozy always prefers to optimize query performance over update performance. Cozy has found the fastest possible implementation of elems(), and it is struggling to improve remove().
In that light, what Cozy is doing is actually sort of reasonable. The synthesizer is trying to avoid the O(n) cost of _var23.remove(...) whenever it can. Element removal from bags has linear complexity, since bags are represented with arrays at code-generation time. Cozy found that it doesn't have to remove elements that are not in the bag:
    for _var89 in (EHasKey(_var444, n) ? [n] : []):
        _var23.remove(_var89);
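Here is the same guard written out by hand in Java, purely as an illustration (this is not Cozy's generated code; the field names just mirror the roles of _var23 and _var444):

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    // Illustrative sketch, not Cozy's generated code.
    class GuardedBag {
        private final List<Integer> elems = new ArrayList<>(); // the bag itself (_var23), array-backed
        private final Set<Integer> distinct = new HashSet<>(); // distinct values in the bag (_var444)

        void add(int n) {
            elems.add(n);
            distinct.add(n);
        }

        void remove(int n) {
            // The guard: a cheap hash lookup avoids the O(n) array removal
            // whenever n isn't in the bag at all.
            if (distinct.contains(n)) {
                elems.remove(Integer.valueOf(n)); // linear scan + shift
                // ...but now `distinct` has to be kept accurate,
                // which is exactly the problem discussed next.
            }
        }
    }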
To do this, Cozy needs _var444, which tracks the set of unique values in the bag. But this creates a new problem: what happens when you remove a copy of a value from the bag? You should remove it from _var444 if it was the last copy, and keep it if there are others. Cozy introduces _var10957 to track which values have exactly one copy in the bag, so that it can determine what to do to _var444 when you remove an element. Then it realizes it needs _var351979 to track how many copies there are of each element. Finally, _var372222 does actually look spurious to me; hopefully, with a few more hours of synthesis time, Cozy would realize it doesn't need that one.
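Written out by hand in Java, the whole cascade looks roughly like this (illustrative only, not Cozy's actual output; the fields mirror the roles of the synthesized state described above):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // Illustrative sketch of the auxiliary-state maintenance, not Cozy's output.
    class CountedBag {
        private final List<Integer> elems = new ArrayList<>();        // the bag itself (_var23)
        private final Set<Integer> distinct = new HashSet<>();        // distinct values (_var444)
        private final Set<Integer> singletons = new HashSet<>();      // values with exactly one copy (_var10957)
        private final Map<Integer, Integer> counts = new HashMap<>(); // copies of each value (_var351979)

        void add(int n) {
            elems.add(n);
            distinct.add(n);
            int c = counts.merge(n, 1, Integer::sum);
            if (c == 1) singletons.add(n); else singletons.remove(n);
        }

        void remove(int n) {
            if (!distinct.contains(n)) return;  // skip the O(n) removal entirely
            elems.remove(Integer.valueOf(n));   // linear: the bag is array-backed
            if (singletons.contains(n)) {       // that was the last copy, so distinct must lose n
                distinct.remove(n);
                singletons.remove(n);
                counts.remove(n);
            } else {                            // copies remain: distinct keeps n, but the
                int c = counts.merge(n, -1, Integer::sum); // counts keep singletons accurate
                if (c == 1) singletons.add(n);
            }
        }
    }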
Many of the insanely long let _varABC = ... lines in the output you included actually have very simple implementations, but it will take Cozy quite a while to discover them.
If you add a precondition to remove that says you promise never to try removing elements that aren't in the bag (assume n in l;), then Cozy realizes it can't avoid the removal and will terminate almost immediately with an implementation identical to the input specification.
"Dead code elimination seems to be totally broken (_var23 is the only state used by the query, all the other states and updates to them should be deleted)"

See above: Cozy introduces the extra members to help implement remove. For instance, _var444 is used directly to determine whether to skip the call to _var23.remove(_var89).
"This project doesn't seem very active anymore. Is there any more development effort happening?"
Day-to-day development has mostly stopped since I no longer work on Cozy full-time. For the past two years or so @izgzhen has been keeping the lights on and prototyping some new ideas.
I have some ambitious ideas for improvements, but right now (as you discovered) Cozy is very unpredictable and remains more "research prototype" than "useful tool". :(
"Is there a better channel for communication?"
I watch GitHub like a hawk, so you've found the best channel for communication.
Wow, okay, thanks for your explanation. I didn't realize Bag was so inefficient.
If I were trying to optimize this in the same way, I would immediately reach for something like Map<T, Int>, where the value is the number of copies, along with a special iterator that replicates each key as many times as its count. I'm aware that would take more space if all the elements are unique.
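Very roughly, something like this untested sketch (the names and code are mine, just to illustrate the idea):

    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;
    import java.util.NoSuchElementException;

    // Rough, untested sketch of a count-based bag with a replicating iterator.
    class CountingBag<T> implements Iterable<T> {
        private final Map<T, Integer> counts = new HashMap<>();

        void add(T x) {
            counts.merge(x, 1, Integer::sum);
        }

        void remove(T x) {
            // Decrement, dropping the entry when the last copy goes away.
            counts.computeIfPresent(x, (k, c) -> c > 1 ? c - 1 : null);
        }

        // Iterate each key as many times as its count.
        public Iterator<T> iterator() {
            Iterator<Map.Entry<T, Integer>> entries = counts.entrySet().iterator();
            return new Iterator<T>() {
                private T current;
                private int remaining = 0;

                public boolean hasNext() {
                    while (remaining == 0 && entries.hasNext()) {
                        Map.Entry<T, Integer> e = entries.next();
                        current = e.getKey();
                        remaining = e.getValue();
                    }
                    return remaining > 0;
                }

                public T next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    remaining--;
                    return current;
                }
            };
        }
    }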
So, it seems like a good solution would be to just give it a better implementation of Bag?
Agreed, that is likely the best path forward.
(A bit more context: as the tool's implementor, I often had to choose between spending my time improving Cozy's core synthesis algorithm and giving it more built-in knowledge. I have always fallen on the side of improving the synthesis algorithm rather than giving Cozy a better data structure library, and that may have been the wrong choice. It would be nice to spend a lot of time thinking deeply about Cozy's data structure library if I ever get a chance.)
I'm really excited about the concept of collection optimization and synthesis, but I'm a little confused about a few things.
First, examples/basic.ds is mentioned in the readme as a good starting point, but why? It seems pretty clear to me (I don't have a proof) that basic.ds cannot get any better. If Cozy (even working correctly) cannot make any improvements to this example, why is it the first example you point people to?
Second, Cozy seems to make it much much worse. I thought it might be a regression, but I've gone very far back into the git history, and it produces garbage like this every time:
What is going on here? Dead code elimination seems to be totally broken (_var23 is the only state used by the query, all the other states and updates to them should be deleted), but also, why are any of these things ever chosen in the first place? Not a single one of them could possibly improve on the efficiency of any of the queries.
This project doesn't seem very active anymore. Is there any more development effort happening? Is there a better channel for communication?