added linear-growing cardinality constraints: AtLeastK, AtMostK, ExactlyK

This pull request is a prototype-implementation of those cardinality-constraints mentioned in issue #127.

State

It won't work in this state! It fails when using ExactlyK with k=1 as OneHot alternative in the sudoku example. The reason is unclear to me, but i think the newly created variables are not globally introduced. The following code demonstrates the problem (and was only observed after putting this code into pyeda; see next paragraph):

from pyeda.boolalg.expr import And, OneHot, ExactlyN, AtMostN, AtLeastN, expr2dimacscnf
from pyeda.boolalg.bfarray import exprvars

x = exprvars('x', 2)
r_0 = OneHot(*[x[i] for i in range(2)])
r_1 = ExactlyN(*[x[i] for i in range(2)]) # UNSAT which is wrong; uses AtMostN&AtLeastN internally!
r_2 = AtMostN(*[x[i] for i in range(2)])
r_3 = AtLeastN(*[x[i] for i in range(2)])
print(r_0)
print(r_1)
print(r_2)
print(r_3)
print(r_0.satisfy_one())
print(r_1.satisfy_one())
print(r_2.satisfy_one())
print(r_3.satisfy_one())

Interesting observation / Working case

While implementing and before adding to the pyeda library, i tested with the following working code (can be tested without the commit; code looks different because of some cleaning-steps, but it is the same; works on lists of variables), which let's me think, that the logic of these algorithms are correct:

from pyeda.inter import *
from pyeda.boolalg import exprnode
from pyeda.boolalg.expr import _expr
import itertools
import random
random.seed(1)

def AtMostN(*xs, k=1, auxvarname='less_or_equal'):
    """
    Return an expression that means
    "at most k input functions are true".
    Returns cnf

    k >= 1
    len(xs) > k
    """
    xs = [Expression.box(x).node for x in xs]
    n = len(xs)
    terms = list()
    aux_vars = list(map(lambda x: x.node, exprvars(auxvarname, (n-1)*k)))

    def s(i,j,k):
        return aux_vars[i*k+j]

    # case: i=1
    terms.append(exprnode.or_(exprnode.not_(xs[0]), s(0,0,k)))
    for i in range(1,k):
        terms.append(exprnode.not_(s(0,i,k)))

    # general case
    for i in range(1,n-1):
        terms.append(exprnode.or_(exprnode.not_(xs[i]),s(i,0,k)))
        terms.append(exprnode.or_(exprnode.not_(s(i-1,0,k)), s(i,0,k)))
        for u in range(1,k):
            terms.append(exprnode.or_(exprnode.not_(xs[i]), exprnode.not_(s(i-1,u-1,k)), s(i,u,k)))
            terms.append(exprnode.or_(exprnode.not_(s(i-1,u,k)), s(i,u,k)))
        terms.append(exprnode.or_(exprnode.not_(xs[i]), exprnode.not_(s(i-1,k-1,k))))

    terms.append(exprnode.or_(exprnode.not_(xs[-1]), exprnode.not_(s(n-2,k-1,k))))

    y = exprnode.and_(*terms)
    return _expr(y)

# def AtLeastN_naive(*xs, k=1):
#     xs = [Expression.box(x).node for x in xs]
#     n = len(xs)
#     return _expr(exprnode.or_(*[exprnode.and_(*i) for i in itertools.combinations(xs,k)]))

def AtLeastN(*xs, k=1, auxvarname='greater_or_equal'):
    """
    Return an expression that means
    "at least k input functions are true".
    Returns cnf
    """
    xs = [_expr(exprnode.not_(Expression.box(x).node)) for x in xs]   # negated
    n = len(xs)
    return AtMostN(*xs, k=n-k, auxvarname=auxvarname)

def ExactlyN(*xs, k=1, auxvarname='equal'):
    """
    """
    xs = [_expr(Expression.box(x).node) for x in xs]
    return And(AtMostN(*xs, k=k), AtLeastN(*xs,k=k))

def test_atmostn(runs=10000):
    for r in range(runs):
        print(r)
        n_vars = random.randint(2,100)
        value = random.randint(1,n_vars-1)

        print(('Instance', n_vars, value))

        x = exprvars('x', n_vars)

        MAX = AtMostN(*[x[i] for i in range(n_vars)], k=value, auxvarname='less_or_equal')
        MIN = AtLeastN(*[x[i] for i in range(n_vars)], k=value, auxvarname='greater_or_equal')

        BOTH = And(MAX,MIN)
        BOTH_NEW = ExactlyN(*[x[i] for i in range(n_vars)], k=value, auxvarname='equal')
        result = MIN.satisfy_one()
        assert result
        result = MAX.satisfy_one()
        assert result
        result = BOTH.satisfy_one()
        assert result
        result = BOTH_NEW.satisfy_one()
        assert result

        count_trues = 0
        for i in x:
            if result[i] == 1:
                count_trues += 1
        print(n_vars, value, count_trues)
        assert value == count_trues

test_atmostn()

Other remarks

To think about

what about the names?
- kind of conflict with OneHot / OneHot0
cnf/dnf behaviour
- cnf is the basic design => is dnf-version needed?
allow case with only one input function?
- forbidden by assertions at the moment
  Things to check
is simplification working/needed?
unnecessary operations?
two assertions in AtMostN() function
- are assertions allowed (design-decision)?
i needed to import exprvars
- move from local to global import?

Maybe someone may take a look at my code and debug the problems observed.

I added your branch to cjdrake/pyeda. Can't write to your branch, so I'm making some edits to mine.

Made some superficial changes just now.

Will you please write a couple unit tests for these functions? Just look in the pyeda/boolalg/test/test_expr.py module for examples.

Reading a few tests will help me understand the functions a bit better.

Good news: i found the reason for the problem described above. (not yet fixed in code!)

Like assumed, the introduction of the auxiliary was the reason! Because i wanted my function-arguments to mimick the ones of OneHot and co. i introduced the auxvarname-argument. After debugging the newly introduced unit-tests & rereading part of the docs, it seems that a variable-object is completely defined by it's name at the time of creation! This is the source of the problems described above!

Easy case:

ExactlyN() uses AtLeastN() & AtMostN() internally, but called both with the same auxvarname! -> BUG!
fixed in code

Not that easy case:

ExactlyN()/AtLeastN()/AtMostN() might be called multiple times during the generation of some complex expression
The auxvarnames introduced musn't collide! -> If they do -> BUG!
NOT fixed in code
Question: How to handle this? Is there already some kind of functionality in the lib?
- I suppose the library should handle this automatically (some auxvarname-prefix/suffix), but the following shows a customized workaround for the sudoku example which works:
- idea: use a "global" generator introducing a suffix for each auxvarname

""" The following is the pyeda sudoku example using ExactlyN() instead of OneHot()
    This is only a proof-of-concept: it will be much slower than using naive OneHot.
    Reasons:
    - Domain-range small enough to use the naive approach (exponential-growth)
    - Instead of using Or(*xs) to formulate >= 1, this code uses <= k-1 of the negated values (AtLeastN())
"""

from pyeda.boolalg.expr import And, OneHot, NHot, ExactlyN, AtMostN, AtLeastN, expr2dimacscnf
from pyeda.boolalg.bfarray import exprvars
import itertools

gen = itertools.count()

DIGITS = "123456789"

X = exprvars('x', (1, 10), (1, 10), (1, 10))

V = And(*[
        And(*[
            ExactlyN(*[X[r,c,v] for v in range(1, 10)], k=1, auxvarname='AUX_'+str(next(gen)))
                     for c in range(1, 10)]
        )
        for r in range(1, 10)]
    )

R = And(*[
        And(*[
            ExactlyN(*[X[r,c,v] for c in range(1, 10)], k=1, auxvarname='AUX_'+str(next(gen)))
                     for v in range(1, 10)]
        )
        for r in range(1, 10)]
    )

C = And(*[
        And(*[
            ExactlyN(*[X[r,c,v] for r in range(1, 10)], k=1, auxvarname='AUX_'+str(next(gen)))
                     for v in range(1, 10)]
        )
        for c in range(1, 10)]
    )

B = And(*[
        And(*[
            ExactlyN(*[
                X[3*br+r,3*bc+c,v]
                    for r in range(1, 4)
                    for c in range(1, 4)], k=1, auxvarname='AUX_'+str(next(gen))
            )
            for v in range(1, 10)]
        )
        for br in range(3) for bc in range(3)]
    )

S = And(V, R, C, B)

# This step does absorption
S = S.to_cnf()

def parse_grid(grid):
    chars = [c for c in grid if c in DIGITS or c in "0."]
    assert len(chars) == 9 ** 2
    return And(*[ X[i // 9 + 1, i % 9 + 1, int(c)]
                  for i, c in enumerate(chars) if c in DIGITS ])

grid = ( ".73|...|8.."
         "..4|13.|.5."
         ".85|..6|31."
         "---+---+---"
         "5..|.9.|.3."
         "..8|.1.|5.."
         ".1.|.6.|..7"
         "---+---+---"
         ".51|6..|28."
         ".4.|.52|9.."
         "..2|...|64." )

def get_val(point, r, c):
    for v in range(1, 10):
        if point[X[r,c,v]]:
            return DIGITS[v-1]
    return "X"

def display(point):
    chars = list()
    for r in range(1, 10):
        for c in range(1, 10):
            if c in (4, 7):
                chars.append("|")
            chars.append(get_val(point, r, c))
        if r != 9:
            chars.append("\n")
            if r in (3, 6):
                chars.append("---+---+---\n")
    print("".join(chars))

def solve(grid):
    with parse_grid(grid):
        return S.satisfy_one()

bla = solve(grid)
display(bla)

A couple things.

Glad you found the problem. Yes, I designed the variable constructors to always return the same instance given the same name and index. I just thought it would end up being confusing otherwise, and didn't want to try and overload the == operator b/c that would cause all kinds of chaos.

That said, you might not actually need to use the exprvars function. The exprvar function can create variables with multiple levels of names, and multiple indices.

For example:

>>> foo = exprvar(('a', 'b', 'c'), (2, 3, 4))
>>> foo
c.b.a[2,3,4]
>>> bar = exprvar(('a', 'b', 'c'), (2, 3, 4))
>>> foo is bar
True

Also, I think the current function naming convention is confusing. Instead of AtMostN, perhaps we should call it AtMostK instead? Given the math formulas you cited, I think it make a lot of sense to stick with the convention that N represents the number of input arguments, and k is some number between 1 and N.

I have to admit, functions that produce new Literal objects has always been a bit of a challenge.

The way the Tseitin transformation works is to just start with index 0 with an empty constraint list. Then just keep incrementing the index as you continue to append new constraints. Fortunately, it's not a nestable operation, so you don't have to worry about the problem you described.

Rather than using a global variable, I think you might want to try the approach of adding new name prefixes. Like at the top level start with auxvarname = 'a', then at nesting level 1 using b.a, then c.b.a, and so forth. When you actually look at the expression, it will be nice to tell the variables apart at different nesting levels.

cjdrake / pyeda