Hi Ciaran,
I spent quite a bit of time looking at this today. I did not find a bug in the
unflattening algorithm, although I am still a bit lost as to why the formulas
grow so much. By best guess is that when a formula F has a few disjunctions
inside it, it gets duplicated for each disjunction into something like F1 or
F2, which later generates F1 and F2. This last conjunction is a binary
conjunction over whatever other conjunctions we had already, so unless there is
flattening, things get duplicated but not simplified.
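To illustrate the point (a minimal sketch, not the actual code; `make_and` is a hypothetical helper, and expressions are modeled as plain tuples): flattening nested conjunctions at construction time exposes all conjuncts at the top level, whereas blindly building binary conjunctions hides them from any top-level simplifier.

```python
def make_and(args):
    """Build an n-ary conjunction, flattening any arguments that are
    themselves conjunctions so all conjuncts sit at the top level."""
    flat = []
    for a in args:
        if isinstance(a, tuple) and a and a[0] == "and":
            flat.extend(a[1:])   # hoist sub-conjuncts to the top level
        else:
            flat.append(a)
    return ("and",) + tuple(flat)

# Without flattening, combining two conjunctions gives nested structure:
#   ('and', ('and', 'p', 'q'), ('and', 'r', 's'))
# and a top-level simplifier sees only two opaque arguments.
# With flattening, all four conjuncts are visible for simplification:
flat = make_and([("and", "p", "q"), ("and", "r", "s")])
assert flat == ("and", "p", "q", "r", "s")
```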
The example you give has two disjunctions, one over 3 disjuncts and another
over 2. The DNF of that has 6 disjuncts, but then the disjunction calculation
creates 2^6 sub-problems to be solved. That means 64 problems, which doesn't
sound like enough to blow the stack. This is something I still don't understand.
Maybe it has to do with each of those 64 problems duplicating the expression.
Isn't there a neat story about someone placing a grain of wheat on the first
square of a chessboard and doubling the grains for each subsequent square? I
guess that's the story here. But I still don't see clearly why these long
expressions would blow the stack; it seems like recursion would still go only
about 64 levels deep.
It would make sense for pick_cheapest (in R_card1_conjunction) to pick
conjuncts that are conjunctions themselves, since this does not duplicate
anything and creates opportunities for simplification. However, our current
version of pick_cheapest does not take that kind of thing into account.
It often brings disjunctions up and distributes the expression, thus
duplicating. What we need is a smarter way of predicting how expensive things
might be and solving sub-problems accordingly. This should also cover tricks
such as counting the solutions of the negation of X1 = a1 or ... or Xn = an,
which creates a much cheaper problem. This is what I will be working on from
now on.
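For the record, here is the arithmetic behind that negation trick as a sketch (hypothetical names; this assumes n independent variables, each Xi ranging over its own finite domain Di): the negation of the disjunction is the conjunction X1 != a1 and ... and Xn != an, whose solution count is a simple product, so the disjunction's count follows by subtraction from the total.

```python
from math import prod

def count_disjunction_via_negation(domain_sizes):
    """Count assignments satisfying X1 = a1 or ... or Xn = an by
    counting the negation instead: not(...) is the conjunction
    X1 != a1 and ... and Xn != an, with prod(|Di| - 1) solutions.
    This costs one pass over n factors, versus inclusion-exclusion
    over up to 2^n terms for the disjunction taken directly."""
    total = prod(domain_sizes)                    # all assignments
    negation = prod(d - 1 for d in domain_sizes)  # all-disequal assignments
    return total - negation

# Three variables, each over a domain of size 10: 10**3 - 9**3 = 271
assert count_disjunction_via_negation([10, 10, 10]) == 271
```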
Just some rambling. Thanks.
Rodrigo
On 6/30/2012 12:46 AM, Ciaran O'Reilly wrote:
> Hi Rodrigo,
>
> I've managed to zone in on a specific example and location in the code that
> highlights the problem that occurs in the algorithm if you don't flatten on
> creation of expressions. Specifically the problem is with not flattening
> conjuncts in this instance. The example is based around a call to
> R_sum_over_one_variable with the following:
>
> sum({ ( on X2 ) 100 |
> and(and(not (X2 = a2 and X1 = a1 and Y1 = b1), not (X2 = a2)),
> not (X1 = a1 and Y1 = b1)) })
>
> and the problem occurs at this point in the R_card1_disjunction logic:
>
> N3 <- R_card1( | R_top_simplify_conjunction(F1 and F2) |_x, \"none\" )
>
> when we perform 'F1 and F2' and don't take the opportunity to flatten the new
> conjunction when F1 or F2 are themselves conjunctions, the call to
> R_top_simplify_conjunction fails to simplify the new conjunction, leaving a
> more complex expression to be dealt with by R_card1. While legal, the
> processing of these more complex expressions ends up causing a Stack Overflow
> exception under the algorithm's current logic.
>
> The attached file 'FlattenNoFlattenComparisonTraces.txt' shows what
> happens in the case where flattening occurs (very few calls to
> R_card1_disjunction) and when it does not (many calls before a stack overflow
> occurs).
>
> Basically, I think it's necessary to flatten expressions before calling a top
> simplify routine in order to ensure these routines get an opportunity to apply
> their logic. Alternatively, we could update the top simplify routines so that
> they flatten out their top-level arguments if they are of the same type as the
> containing expression.
>
> Let me know what you think (Monday is fine as I plan to chill this weekend
--- have a good weekend too! ).
>
> Best
>
> Ciaran
Original issue reported on code.google.com by ctjoreilly@gmail.com on 11 Apr 2013 at 12:15