Hi Ciaran,
I spent quite a bit of time looking at this today. I did not find a bug in the
unflattening algorithm, although I am still a bit lost as to why the formulas
grow so much. By best guess is that when a formula F has a few disjunctions
inside it, it gets duplicated for each disjunction into something like F1 or
F2, which later generates F1 and F2. This last conjunction is a binary
conjunction over whatever other conjunctions we had already, so unless there is
flattening, things get duplicated but not simplified.
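To illustrate the point (a minimal sketch, not the actual code; `make_and` is a hypothetical helper, and expressions are modeled as plain tuples): flattening nested conjunctions at construction time exposes all conjuncts at the top level, whereas blindly building binary conjunctions hides them from any top-level simplifier.

```python
def make_and(args):
    """Build an n-ary conjunction, flattening any arguments that are
    themselves conjunctions so all conjuncts sit at the top level."""
    flat = []
    for a in args:
        if isinstance(a, tuple) and a and a[0] == "and":
            flat.extend(a[1:])   # hoist sub-conjuncts to the top level
        else:
            flat.append(a)
    return ("and",) + tuple(flat)

# Without flattening, combining two conjunctions gives nested structure:
#   ('and', ('and', 'p', 'q'), ('and', 'r', 's'))
# and a top-level simplifier sees only two opaque arguments.
# With flattening, all four conjuncts are visible for simplification:
flat = make_and([("and", "p", "q"), ("and", "r", "s")])
assert flat == ("and", "p", "q", "r", "s")
```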
The example you give has two disjunctions, one over 3 disjuncts and another
over 2. The DNF of that has 6 disjuncts, but then the disjunction calculation
creates 2^6 sub-problems to be solved. That means 64 problems, which doesn't
sound like enough to blow the stack. This is something I still don't understand.
Maybe it has to do with each of those 64 problems duplicating the expression.
Isn't there a neat story about someone placing a grain of wheat on the first
square of a chessboard and doubling the grains for each subsequent square? I
guess that's the story here. But I still don't see clearly why these long
expressions would blow the stack; it seems like recursion would still go only
about 64 levels deep.
It would make sense for pick_cheapest (in R_card1_conjunction) to pick
conjuncts that are conjunctions themselves, since this does not duplicate
anything and creates opportunities for simplification. However, our current
version of pick_cheapest does not take that kind of thing into account.
It often brings disjunctions up and distributes the expression, thus
duplicating. What we need is a smarter way of predicting how expensive things
might be and solving sub-problems accordingly. This should also cover tricks
such as counting the solutions of the negation of X1 = a1 or ... or Xn = an,
which creates a much cheaper problem. This is what I will be working on from
now on.
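For the record, here is the arithmetic behind that negation trick as a sketch (hypothetical names; this assumes n independent variables, each Xi ranging over its own finite domain Di): the negation of the disjunction is the conjunction X1 != a1 and ... and Xn != an, whose solution count is a simple product, so the disjunction's count follows by subtraction from the total.

```python
from math import prod

def count_disjunction_via_negation(domain_sizes):
    """Count assignments satisfying X1 = a1 or ... or Xn = an by
    counting the negation instead: not(...) is the conjunction
    X1 != a1 and ... and Xn != an, with prod(|Di| - 1) solutions.
    This costs one pass over n factors, versus inclusion-exclusion
    over up to 2^n terms for the disjunction taken directly."""
    total = prod(domain_sizes)                    # all assignments
    negation = prod(d - 1 for d in domain_sizes)  # all-disequal assignments
    return total - negation

# Three variables, each over a domain of size 10: 10**3 - 9**3 = 271
assert count_disjunction_via_negation([10, 10, 10]) == 271
```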
Just some rambling. Thanks.
Rodrigo
On 6/30/2012 12:46 AM, Ciaran O'Reilly wrote:
> Hi Rodrigo,
>
> I've managed to zone in on a specific example and location in the code that
> highlights the problem that occurs in the algorithm if you don't flatten on
> creation of expressions. Specifically the problem is with not flattening
> conjuncts in this instance. The example is based around a call to
> R_sum_over_one_variable with the following:
>
> sum({ ( on X2 ) 100 |
> and(and(not (X2 = a2 and X1 = a1 and Y1 = b1), not (X2 = a2)),
> not (X1 = a1 and Y1 = b1)) })
>
> and the problem occurs at this point in the R_card1_disjunction logic:
>
> N3 <- R_card1( | R_top_simplify_conjunction(F1 and F2) |_x, \"none\" )
>
> when we perform 'F1 and F2' and don't take the opportunity to flatten the new
> conjunction when F1 or F2 are themselves conjunctions, the call to
> R_top_simplify_conjunction fails to simplify the new conjunction, leaving a
> more complex expression to be dealt with by R_card1. While legal, the
> processing of these more complex expressions ends up causing a Stack Overflow
> exception under the algorithm's current logic.
>
> The attached file 'FlattenNoFlattenComparisonTraces.txt' shows what
> happens in the case where flattening occurs (very few calls to
> R_card1_disjunction) and when it does not (many calls before a stack overflow
> occurs).
>
> Basically, I think it's necessary to flatten expressions before calling a top
> simplify routine in order to ensure these routines get an opportunity to apply
> their logic. Alternatively, we could update the top simplify routines so that
> they flatten out their top-level arguments if they are of the same type as the
> containing expression.
>
> Let me know what you think (Monday is fine as I plan to chill this weekend
--- have a good weekend too! ).
>
> Best
>
> Ciaran
Original issue reported on code.google.com by ctjoreilly@gmail.com on 11 Apr 2013 at 12:15