counting solutions within a model is awkward and slow

dan1221 commented 2 years ago

This is split off of issue #607.

Suppose I have a model that counts involutions:

% Version 1: count involutions, a permutation that equals its inverse; OEIS:A000085
include "inverse_fn.mzn"; 

int: n=10;
set of int: N = 1..n;
array[N] of var N: a;
constraint alldifferent(a);   
constraint a = inverse(a);
solve satisfy;

Suppose I write the code differently, to count the number of involutions within the model instead of using the solver statistics:

% Version 2: count involutions, a permutation that equals its inverse; OEIS:A000085
include "inverse_fn.mzn"; 
include "arg_sort.mzn";

int: n;
set of int: N = 1..n;

predicate equals_inverse(array[N] of var N: b) =  % involution 
  alldifferent(b) /\ (b = arg_sort(b));  % had to switch to arg_sort instead of inverse #609

var int: total =   % this is both awkward and inefficient
if n==2 then
   sum(i,j in 1..n)(equals_inverse([i,j]))  
elseif n==3 then
   sum(i,j,k in 1..n)(equals_inverse([i,j,k]))         
elseif n==4 then
   sum(i,j,k,l in 1..n)(equals_inverse([i,j,k,l]))
elseif n==5 then
   sum(i,j,k,l,m in 1..n)(equals_inverse([i,j,k,l,m]))
elseif n==6 then
   sum(i,j,k,l,m,q in 1..n)(equals_inverse([i,j,k,l,m,q]))
elseif n==7 then
   sum(i,j,k,l,m,q,r in 1..n)(equals_inverse([i,j,k,l,m,q,r]))
else 0
endif;

solve satisfy;

output ["number of solutions is ", show(total)];

Version 2 works, but obviously the if/then/else is awkward, and it's much slower. Dekker1's comment in #607 says:

Your initial solution would be the preferred MiniZinc way is to use the solution count as produced in the statistics.

OK, but I'd like to understand why this is so slow and awkward, and if it could be improved somehow. Of course, I'm trying to do exactly the same thing in Version 2 as Version 1, and so I'm hoping there could be some new syntax to express that. I'd like to say something like: sum(forall array[N] of var N: x)(equals_inverse(x))

My use case has to do with finding a permutation of size N that maximizes the number of true predicates. So I'm using "sum" to count the true predicates.

zayenz commented 2 years ago

In general, a MiniZinc model can be viewed as encoding an NP problem: is there an assignment to the specified variables that satisfy the constraints. Since the constraints are (generally) assumed to be checkable in polynomial time and assuming a standard reading of problem size, the problem is in NP.

The type of question that you are trying to answer is a counting problem, and counting problems for NP corresponds to the complexity class #P (complexity zoo). There is no strict separation, but it is reasonable to assume that #P is harder than NP, and that naturally leads to a blow-up in the size of an encoding when shoe-horned into a system geared to NP. For any particular problem it might be doable, but probably not in general.

As for the proposed construction sum(forall array[N] of var N: x)(equals_inverse(x)), this is a universal quantification. Allowing universal quantification is indeed interesting, but it is a harder problem that requires specialized solvers. There have been some attempts at making quantified CSP systems, but none that have become popular as far as I know. For example, Gecode can not solve quantified problems, but there has been tools built on top of Gecode that could: Quacode and QeCode (although I'm not sure if either can be compiled anymore).

As for your particular problem, I have not considered it in depth. I just wanted to leave some notes on the general landscape.

dan1221 commented 2 years ago

Thanks for the notes. I realize that this is a hard problem, but Chuffed handles Version 1 much better than Version 2, so I'd like to find some way to tell MiniZinc that Version 2 is exactly the same problem as Version 1, aside from how it counts solutions.

zayenz commented 2 years ago

Looking a bit more into your second solution, you are not actually using any constraint programming in this solution, you are programming finding the solution using MiniZinc as a programming language. As such, it never actually reaches Gecode or Chuffed, which can be seen in the statistics or by compiling the model.

Consider the part sum(i,j,k in 1..n)(equals_inverse([i,j,k])) when n is 3. This will expand to

sum([
    equals_inverse([1, 1, 1]),
    equals_inverse([1, 1, 2]),
    equals_inverse([1, 1, 3]),
    equals_inverse([1, 2, 1]),
    equals_inverse([1, 2, 2]),
    equals_inverse([1, 2, 3]),
    equals_inverse([1, 3, 1]),
    equals_inverse([1, 3, 2]),
    equals_inverse([1, 3, 3]),
    equals_inverse([2, 1, 1]),
    equals_inverse([2, 1, 2]),
    equals_inverse([2, 1, 3]),
    equals_inverse([2, 2, 1]),
    equals_inverse([2, 2, 2]),
    equals_inverse([2, 2, 3]),
    equals_inverse([2, 3, 1]),
    equals_inverse([2, 3, 2]),
    equals_inverse([2, 3, 3]),
    equals_inverse([3, 1, 1]),
    equals_inverse([3, 1, 2]),
    equals_inverse([3, 1, 3]),
    equals_inverse([3, 2, 1]),
    equals_inverse([3, 2, 2]),
    equals_inverse([3, 2, 3]),
    equals_inverse([3, 3, 1]),
    equals_inverse([3, 3, 2]),
    equals_inverse([3, 3, 3])
])

where the predicate calls are arrays of fixed values, not arrays of variables.

dan1221 commented 2 years ago

Interesting!

MiniZinc / libminizinc

counting solutions within a model is awkward and slow #615