alpha-asp / Alpha

A lazy-grounding Answer-Set Programming system
BSD 2-Clause "Simplified" License
58 stars 10 forks source link

Native support for unit tests in Alpha #358

Closed madmike200590 closed 1 year ago

madmike200590 commented 1 year ago

Add support for unit tests to Alpha's input language

This PR adds unit tests as a feature to the ASP language interpreted by Alpha.

Unit test syntax

Test cases are defined using the directive #test. Every test case needs a name and some condition on the number of expected answer sets for the tested code. Furthermore, a unit test may specify some input data (i.e. facts) as "setup code". The actual verification happens in assertion blocks. The following grammar snippet formally defines the syntax of unit tests:

// Every test case has a name (i.e. IDENTIFIER), a satisfiability vondition, some input and zero to many assertions.
directive_test : '#' 'test' IDENTIFIER '(' test_satisfiability_condition ')' '{' test_input test_assert* '}'; 

// A satsifiability condition establishes some check on the number of answer sets of the tested program.
// We can either expect a program to be unsatisfiable, or specify some check on the number of answer sets, e. g. "<= 5".
test_satisfiability_condition : 'expect' ':' ('unsat' | (COMPARISON_OPERATOR? NUMBER));

// Test input is a block declared using the "given" keyword that consists of an arbitrary number of facts.
test_input : 'given' '{' (basic_atom '.')* '}';

// An assertion may be made over all answer sets (test_assert_all) or only some answer sets (test_assert_some).
test_assert : test_assert_all | test_assert_some;

// A universal assertion must hold in all answer sets and may be specified as an arbitrary ASP program.
test_assert_all : 'assertForAll' '{' statements? '}';

// An existential assertion must hold in at least one answer set and may be specified as an arbitrary ASP program.
test_assert_some : 'assertForSome' '{' statements? '}';

The following example program shows a very simple test case using one universal assertion:

% A very simple rule.
a :- b. 

% Test case "ensure a" expects one answer.
#test ensure_a(expect: 1) { 
    % In the scope of this test, we assume a fact "b.".
    given { 
        b.
    } 
    % When solving the program with the setup in the "given" block,
    % we expect the assertion to hold on all answer sets, i.e.
    % the atom "a" must exist in all answer sets.
    assertForAll { 
        :- not a. 
    }
}

Evaluation of Unit Tests in Alpha

Tests are evaluated by Alpha according to the pseudocode algorithm below:

TP = U union TI // The program resulting from adding the facts from the "given block" to the tested program.
TA = solve(TP) // The answer sets of TP
if(not_satisfies(size(TA), CS)){ // Test fails if number of answer sets doesn't match satisfiability condition
    fail(TC)
}
for(A : A_1 ... A_n) { // Check all assertions of test case TC
    if(is_universal_assertion(A)) {
        if(all_match(TA, A)) {
            pass(TC)
        } else {
            fail(TC)
        }
    }
    if(is_existential_assertion(A)) {
        if(one_matches(TA, A)) {
            pass(TC)
        } else {
            fail(TC)
        }
    }
}

Program components, intermediate results, and functions:

Intuitively, the test input TI of a test TC is added to the program under test U and solved. If the number of answer sets does not conform to the expectation of the test, the test fails. Assertions are evaluated by transforming every answer set of the program into facts and combine it with the verification program associated with the assertion. The assertion succeeds if the resulting program is satisfiable. This needs to hold for all answer sets in case of a universal assertion (assertForAll) or at least one answer set for an existential assertion (assertForSome).

Evaluation Example

In order to illustrate how unit tests are evaluated, consider the following typical encoding of the graph 3-coloring problem:

%%% Basic Encoding of the Graph 3-Coloring Problem %%%
% Graphs are interpreted as having undirected edges.

% Edges are undirected
edge(Y, X) :- edge(X, Y).

% Guess color for each vertex
red(V) :- vertex(V), not green(V), not blue(V).
green(V) :- vertex(V), not red(V), not blue(V).
blue(V) :- vertex(V), not red(V), not green(V).

% Check for invalid colorings
:- vertex(V1), vertex(V2), edge(V1, V2), red(V1), red(V2).
:- vertex(V1), vertex(V2), edge(V1, V2), green(V1), green(V2).
:- vertex(V1), vertex(V2), edge(V1, V2), blue(V1), blue(V2).

Assume we want to verify that our "directed edge conversion rule" edge(Y, X) :- edge(X, Y). works as expected, i.e. for any edge from vertex a to b, we also get the reverse edge from b to a. This can be accomplished using following test case:

%% Verify that directed edges are converted to undirected ones.
#test no_asymmetric_edge(expect: >0) {
    given {
        vertex(a).
        vertex(b).
        vertex(c).
        edge(a, b).
        edge(b, c).
        edge(c, a).
    }
    assertForAll {
        :- edge(A, B), not edge(B, A).
    }   
}

To understand how Alpha evaluates this test, let's first look at the given block: We specify a simple triangular graph with vertices a, b and c and directed edges (a, b), (b, c) and (c, a). The facts from this block are combined with the program itself, i.e. the actual test program with respect to which the test case is evaluated is:

%%% Basic Encoding of the Graph 3-Coloring Problem %%%
% Graphs are interpreted as having undirected edges.

% Edges are undirected
edge(Y, X) :- edge(X, Y).

% Guess color for each vertex
red(V) :- vertex(V), not green(V), not blue(V).
green(V) :- vertex(V), not red(V), not blue(V).
blue(V) :- vertex(V), not red(V), not green(V).

% Check for invalid colorings
:- vertex(V1), vertex(V2), edge(V1, V2), red(V1), red(V2).
:- vertex(V1), vertex(V2), edge(V1, V2), green(V1), green(V2).
:- vertex(V1), vertex(V2), edge(V1, V2), blue(V1), blue(V2).

% Input graph from test case
vertex(a).
vertex(b).
vertex(c).
edge(a, b).
edge(b, c).
edge(c, a).

Solving this program in Alpha yields 6 answer sets:

Answer set 1:
{ blue(b), edge(a, b), edge(a, c), edge(b, a), edge(b, c), edge(c, a), edge(c, b), green(a), red(c), vertex(a), vertex(b), vertex(c) }
Answer set 2:
{ blue(a), edge(a, b), edge(a, c), edge(b, a), edge(b, c), edge(c, a), edge(c, b), green(b), red(c), vertex(a), vertex(b), vertex(c) }
Answer set 3:
{ blue(a), edge(a, b), edge(a, c), edge(b, a), edge(b, c), edge(c, a), edge(c, b), green(c), red(b), vertex(a), vertex(b), vertex(c) }
Answer set 4:
{ blue(b), edge(a, b), edge(a, c), edge(b, a), edge(b, c), edge(c, a), edge(c, b), green(c), red(a), vertex(a), vertex(b), vertex(c) }
Answer set 5:
{ blue(c), edge(a, b), edge(a, c), edge(b, a), edge(b, c), edge(c, a), edge(c, b), green(a), red(b), vertex(a), vertex(b), vertex(c) }
Answer set 6:
{ blue(c), edge(a, b), edge(a, c), edge(b, a), edge(b, c), edge(c, a), edge(c, b), green(b), red(a), vertex(a), vertex(b), vertex(c) }

The satisfiability condition of test case no_asymmetric_edge is expect: >0 - since 6 is greater than zero, the condition holds.

Next, the universal asssertion assertForAll { :- edge(A, B), not edge(B, A).}is verified. It uses one constraint to express that in no answer set there must be an edge without a reverse edge. Assertions are verified by converting each answer set to facts, combining it with the assertion code, and solving the resulting program. For example, in case of answer set 1 from above, this would yield:

% Facts from answer set.
blue(b).
edge(a, b).
edge(a, c).
edge(b, a).
edge(b, c).
edge(c, a). 
edge(c, b). 
green(a). 
red(c). 
vertex(a).
vertex(b).
vertex(c).  

% Assertion code
:- edge(A, B), not edge(B, A).

Solving this program yields one answer set:

Answer set 1:
{ blue(b), edge(a, b), edge(a, c), edge(b, a), edge(b, c), edge(c, a), edge(c, b), green(a), red(c), vertex(a), vertex(b), vertex(c) }

It's easy to see that similarly, the resulting programs for all six answer sets will be satisfiable, thereby satisfying the universal assertion that in no answer set there must be an edge without a reverse edge.

Existential assertions work similarly, but instead of requiring the assertion to hold in all answer sets, it just needs to hold in at least one answer set in order for the assertion to be satisfied.

Notes on design and intended use

Assertion Encoding

Since the verifier associated with an assertion can be any kind of ASP program, one could put choice rules, directives, or virtually anything else there. However, the intended (and reasonable) usage is to stick to one or a few constraints, potentially with a few simple helper rules

Scope of a test

Implementation-wise the program under test is all the input Alpha is called with (potentially multiple files along with command-line-supplied strings), however, I propose the following guidelines for "good style when writing unit tests":

Full test suite example

Building on the graph-coloring example from above, the following version of the graph coloring program is enriched with a comprehensive suite of unit tests covering most of the implementation using both universal and existential assertions:

%%% Basic Encoding of the Graph 3-Coloring Problem %%%
% Graphs are interpreted as having undirected edges.

% Edges are undirected
edge(Y, X) :- edge(X, Y).

% Guess color for each vertex
red(V) :- vertex(V), not green(V), not blue(V).
green(V) :- vertex(V), not red(V), not blue(V).
blue(V) :- vertex(V), not red(V), not green(V).

% Check for invalid colorings
:- vertex(V1), vertex(V2), edge(V1, V2), red(V1), red(V2).
:- vertex(V1), vertex(V2), edge(V1, V2), green(V1), green(V2).
:- vertex(V1), vertex(V2), edge(V1, V2), blue(V1), blue(V2).

%% Verify that directed edges are converted to undirected ones.
#test no_asymmetric_edge(expect: >0) {
    given {
        vertex(a).
        vertex(b).
        vertex(c).
        edge(a, b).
        edge(b, c).
        edge(c, a).
    }
    assertForAll {
        :- edge(A, B), not edge(B, A).
    }   
}

#test triangle_colorings(expect: 6) {
    given {
        vertex(a).
        vertex(b).
        vertex(c).
        edge(a, b).
        edge(b, c).
        edge(c, a).
    }
    % Make sure all vertices are colored in all answer sets
    assertForAll {
        colored(V) :- vertex(V), red(V).
        colored(V) :- vertex(V), green(V).
        colored(V) :- vertex(V), blue(V).
        :- vertex(V), not colored(V).
    }
    % Make sure we do not have neighboring vertices of same color in any answer set
    assertForAll {
        :- edge(V1, V2), red(V1), red(V2).
        :- edge(V1, V2), green(V1), green(V2).
        :- edge(V1, V2), blue(V1), blue(V2).
    }
    % In at least one answer set, vertex a should be red
    assertForSome {
        :- not red(a).
    }
    % In at least one answer set, vertex a should be green
    assertForSome {
        :- not green(a).
    }
    % In at least one answer set, vertex a should be blue
    assertForSome {
        :- not blue(a).
    }
}
codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 71.81% and project coverage change: -0.18 :warning:

Comparison is base (b16b9d6) 79.04% compared to head (5dd8f80) 78.86%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #358 +/- ## ============================================ - Coverage 79.04% 78.86% -0.18% - Complexity 778 813 +35 ============================================ Files 201 208 +7 Lines 8671 8863 +192 Branches 1471 1495 +24 ============================================ + Hits 6854 6990 +136 - Misses 1365 1411 +46 - Partials 452 462 +10 ``` | [Impacted Files](https://codecov.io/gh/alpha-asp/Alpha/pull/358?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp) | Coverage Δ | | |---|---|---| | [...-app/src/main/java/at/ac/tuwien/kr/alpha/Main.java](https://codecov.io/gh/alpha-asp/Alpha/pull/358?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtY2xpLWFwcC9zcmMvbWFpbi9qYXZhL2F0L2FjL3R1d2llbi9rci9hbHBoYS9NYWluLmphdmE=) | `26.82% <0.00%> (-5.53%)` | :arrow_down: | | [.../at/ac/tuwien/kr/alpha/api/config/InputConfig.java](https://codecov.io/gh/alpha-asp/Alpha/pull/358?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtYXBpL3NyYy9tYWluL2phdmEvYXQvYWMvdHV3aWVuL2tyL2FscGhhL2FwaS9jb25maWcvSW5wdXRDb25maWcuamF2YQ==) | `59.15% <50.00%> (-0.55%)` | :arrow_down: | | [.../tuwien/kr/alpha/app/config/CommandLineParser.java](https://codecov.io/gh/alpha-asp/Alpha/pull/358?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtY2xpLWFwcC9zcmMvbWFpbi9qYXZhL2F0L2FjL3R1d2llbi9rci9hbHBoYS9hcHAvY29uZmlnL0NvbW1hbmRMaW5lUGFyc2VyLmphdmE=) | `82.05% <60.00%> (-0.42%)` | :arrow_down: | | [.../tuwien/kr/alpha/commons/programs/tests/Tests.java](https://codecov.io/gh/alpha-asp/Alpha/pull/358?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtY29tbW9ucy9zcmMvbWFpbi9qYXZhL2F0L2FjL3R1d2llbi9rci9hbHBoYS9jb21tb25zL3Byb2dyYW1zL3Rlc3RzL1Rlc3RzLmphdmE=) | `68.75% <68.75%> (ø)` | | | [.../kr/alpha/commons/programs/tests/TestCaseImpl.java](https://codecov.io/gh/alpha-asp/Alpha/pull/358?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtY29tbW9ucy9zcmMvbWFpbi9qYXZhL2F0L2FjL3R1d2llbi9rci9hbHBoYS9jb21tb25zL3Byb2dyYW1zL3Rlc3RzL1Rlc3RDYXNlSW1wbC5qYXZh) | `71.42% <71.42%> (ø)` | | | [.../tuwien/kr/alpha/core/parser/ParseTreeVisitor.java](https://codecov.io/gh/alpha-asp/Alpha/pull/358?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtY29yZS9zcmMvbWFpbi9qYXZhL2F0L2FjL3R1d2llbi9rci9hbHBoYS9jb3JlL3BhcnNlci9QYXJzZVRyZWVWaXNpdG9yLmphdmE=) | `83.65% <77.41%> (-2.00%)` | :arrow_down: | | [...ava/at/ac/tuwien/kr/alpha/api/impl/TestRunner.java](https://codecov.io/gh/alpha-asp/Alpha/pull/358?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtc29sdmVyL3NyYy9tYWluL2phdmEvYXQvYWMvdHV3aWVuL2tyL2FscGhhL2FwaS9pbXBsL1Rlc3RSdW5uZXIuamF2YQ==) | `79.62% <79.62%> (ø)` | | | [.../tuwien/kr/alpha/api/programs/tests/Assertion.java](https://codecov.io/gh/alpha-asp/Alpha/pull/358?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtYXBpL3NyYy9tYWluL2phdmEvYXQvYWMvdHV3aWVuL2tyL2FscGhhL2FwaS9wcm9ncmFtcy90ZXN0cy9Bc3NlcnRpb24uamF2YQ==) | `85.71% <85.71%> (ø)` | | | [...kr/alpha/commons/programs/ASPCore2ProgramImpl.java](https://codecov.io/gh/alpha-asp/Alpha/pull/358?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtY29tbW9ucy9zcmMvbWFpbi9qYXZhL2F0L2FjL3R1d2llbi9rci9hbHBoYS9jb21tb25zL3Byb2dyYW1zL0FTUENvcmUyUHJvZ3JhbUltcGwuamF2YQ==) | `87.50% <85.71%> (+27.50%)` | :arrow_up: | | [...kr/alpha/commons/programs/tests/AssertionImpl.java](https://codecov.io/gh/alpha-asp/Alpha/pull/358?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtY29tbW9ucy9zcmMvbWFpbi9qYXZhL2F0L2FjL3R1d2llbi9rci9hbHBoYS9jb21tb25zL3Byb2dyYW1zL3Rlc3RzL0Fzc2VydGlvbkltcGwuamF2YQ==) | `85.71% <85.71%> (ø)` | | | ... and [9 more](https://codecov.io/gh/alpha-asp/Alpha/pull/358?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp) | | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

madmike200590 commented 1 year ago

@AntoniusW this is now - finally - ready to be reviewed.

madmike200590 commented 1 year ago

As this has been open for two months now, and all changes to the initial design discussed on #243 have been realized, I am now merging this as-is. @AntoniusW if you'd like any further changes/improvements to this feature, let me know and I'll do a follow-up PR.