Add support for unit tests to Alpha's input language

This PR adds unit tests as a feature to the ASP language interpreted by Alpha.

Unit test syntax

Test cases are defined using the directive #test. Every test case needs a name and some condition on the number of expected answer sets for the tested code. Furthermore, a unit test may specify some input data (i.e. facts) as "setup code". The actual verification happens in assertion blocks. The following grammar snippet formally defines the syntax of unit tests:

// Every test case has a name (i.e. IDENTIFIER), a satisfiability vondition, some input and zero to many assertions.
directive_test : '#' 'test' IDENTIFIER '(' test_satisfiability_condition ')' '{' test_input test_assert* '}'; 

// A satsifiability condition establishes some check on the number of answer sets of the tested program.
// We can either expect a program to be unsatisfiable, or specify some check on the number of answer sets, e. g. "<= 5".
test_satisfiability_condition : 'expect' ':' ('unsat' | (COMPARISON_OPERATOR? NUMBER));

// Test input is a block declared using the "given" keyword that consists of an arbitrary number of facts.
test_input : 'given' '{' (basic_atom '.')* '}';

// An assertion may be made over all answer sets (test_assert_all) or only some answer sets (test_assert_some).
test_assert : test_assert_all | test_assert_some;

// A universal assertion must hold in all answer sets and may be specified as an arbitrary ASP program.
test_assert_all : 'assertForAll' '{' statements? '}';

// An existential assertion must hold in at least one answer set and may be specified as an arbitrary ASP program.
test_assert_some : 'assertForSome' '{' statements? '}';

The following example program shows a very simple test case using one universal assertion:

% A very simple rule.
a :- b. 

% Test case "ensure a" expects one answer.
#test ensure_a(expect: 1) { 
    % In the scope of this test, we assume a fact "b.".
    given { 
        b.
    } 
    % When solving the program with the setup in the "given" block,
    % we expect the assertion to hold on all answer sets, i.e.
    % the atom "a" must exist in all answer sets.
    assertForAll { 
        :- not a. 
    }
}

Evaluation of Unit Tests in Alpha

Tests are evaluated by Alpha according to the pseudocode algorithm below:

TP = U union TI // The program resulting from adding the facts from the "given block" to the tested program.
TA = solve(TP) // The answer sets of TP
if(not_satisfies(size(TA), CS)){ // Test fails if number of answer sets doesn't match satisfiability condition
    fail(TC)
}
for(A : A_1 ... A_n) { // Check all assertions of test case TC
    if(is_universal_assertion(A)) {
        if(all_match(TA, A)) {
            pass(TC)
        } else {
            fail(TC)
        }
    }
    if(is_existential_assertion(A)) {
        if(one_matches(TA, A)) {
            pass(TC)
        } else {
            fail(TC)
        }
    }
}

Program components, intermediate results, and functions:

Program U: The tested unit, i.e. the "regular" ASP code Alpha is called with
Test TC: One Unit-Test, consisting of Test-Input TI and assertions A_1 through A_n
Program TP: The program resulting from adding test input facts (TI) to program under test U
Set of Test Answer Sets TA: The answer sets of TP
size(ASS): returns the number of answer sets in the set ASS
Satisfiability condition CS: The condition on the number of answer sets of TP (unsat or comparison operator over psoitive integer)
not_satisfies(size, expected): true iff the integer size does not conform to the given satisfiability condition 'CS'
Program P(AS): The program resulting from transforming the atoms in an answer set to a program consisting of facts only
Test-Result Answer Set TR: The answer set of the program resulting from solving P(AS) union verifier(A), i.e. the verifier of assertion A with added facts P(AS)
assertionError(AS): checks if an answer set AS contains assertion errors, i.e. assertionError(AS) is true iff AS contains atoms of any predicate with name assertion_error of any arity
fail(T) lets the test T fail
pass(T) lets the test T pass

Intuitively, the test input TI of a test TC is added to the program under test U and solved. If the number of answer sets does not conform to the expectation of the test, the test fails. Assertions are evaluated by transforming every answer set of the program into facts and combine it with the verification program associated with the assertion. The assertion succeeds if the resulting program is satisfiable. This needs to hold for all answer sets in case of a universal assertion (assertForAll) or at least one answer set for an existential assertion (assertForSome).

Evaluation Example

In order to illustrate how unit tests are evaluated, consider the following typical encoding of the graph 3-coloring problem:

%%% Basic Encoding of the Graph 3-Coloring Problem %%%
% Graphs are interpreted as having undirected edges.

% Edges are undirected
edge(Y, X) :- edge(X, Y).

% Guess color for each vertex
red(V) :- vertex(V), not green(V), not blue(V).
green(V) :- vertex(V), not red(V), not blue(V).
blue(V) :- vertex(V), not red(V), not green(V).

% Check for invalid colorings
:- vertex(V1), vertex(V2), edge(V1, V2), red(V1), red(V2).
:- vertex(V1), vertex(V2), edge(V1, V2), green(V1), green(V2).
:- vertex(V1), vertex(V2), edge(V1, V2), blue(V1), blue(V2).

Assume we want to verify that our "directed edge conversion rule" edge(Y, X) :- edge(X, Y). works as expected, i.e. for any edge from vertex a to b, we also get the reverse edge from b to a. This can be accomplished using following test case:

%% Verify that directed edges are converted to undirected ones.
#test no_asymmetric_edge(expect: >0) {
    given {
        vertex(a).
        vertex(b).
        vertex(c).
        edge(a, b).
        edge(b, c).
        edge(c, a).
    }
    assertForAll {
        :- edge(A, B), not edge(B, A).
    }   
}

To understand how Alpha evaluates this test, let's first look at the given block: We specify a simple triangular graph with vertices a, b and c and directed edges (a, b), (b, c) and (c, a). The facts from this block are combined with the program itself, i.e. the actual test program with respect to which the test case is evaluated is:

%%% Basic Encoding of the Graph 3-Coloring Problem %%%
% Graphs are interpreted as having undirected edges.

% Edges are undirected
edge(Y, X) :- edge(X, Y).

% Guess color for each vertex
red(V) :- vertex(V), not green(V), not blue(V).
green(V) :- vertex(V), not red(V), not blue(V).
blue(V) :- vertex(V), not red(V), not green(V).

% Check for invalid colorings
:- vertex(V1), vertex(V2), edge(V1, V2), red(V1), red(V2).
:- vertex(V1), vertex(V2), edge(V1, V2), green(V1), green(V2).
:- vertex(V1), vertex(V2), edge(V1, V2), blue(V1), blue(V2).

% Input graph from test case
vertex(a).
vertex(b).
vertex(c).
edge(a, b).
edge(b, c).
edge(c, a).

Solving this program in Alpha yields 6 answer sets:

Answer set 1:
{ blue(b), edge(a, b), edge(a, c), edge(b, a), edge(b, c), edge(c, a), edge(c, b), green(a), red(c), vertex(a), vertex(b), vertex(c) }
Answer set 2:
{ blue(a), edge(a, b), edge(a, c), edge(b, a), edge(b, c), edge(c, a), edge(c, b), green(b), red(c), vertex(a), vertex(b), vertex(c) }
Answer set 3:
{ blue(a), edge(a, b), edge(a, c), edge(b, a), edge(b, c), edge(c, a), edge(c, b), green(c), red(b), vertex(a), vertex(b), vertex(c) }
Answer set 4:
{ blue(b), edge(a, b), edge(a, c), edge(b, a), edge(b, c), edge(c, a), edge(c, b), green(c), red(a), vertex(a), vertex(b), vertex(c) }
Answer set 5:
{ blue(c), edge(a, b), edge(a, c), edge(b, a), edge(b, c), edge(c, a), edge(c, b), green(a), red(b), vertex(a), vertex(b), vertex(c) }
Answer set 6:
{ blue(c), edge(a, b), edge(a, c), edge(b, a), edge(b, c), edge(c, a), edge(c, b), green(b), red(a), vertex(a), vertex(b), vertex(c) }

The satisfiability condition of test case no_asymmetric_edge is expect: >0 - since 6 is greater than zero, the condition holds.

Next, the universal asssertion assertForAll { :- edge(A, B), not edge(B, A).}is verified. It uses one constraint to express that in no answer set there must be an edge without a reverse edge. Assertions are verified by converting each answer set to facts, combining it with the assertion code, and solving the resulting program. For example, in case of answer set 1 from above, this would yield:

% Facts from answer set.
blue(b).
edge(a, b).
edge(a, c).
edge(b, a).
edge(b, c).
edge(c, a). 
edge(c, b). 
green(a). 
red(c). 
vertex(a).
vertex(b).
vertex(c).  

% Assertion code
:- edge(A, B), not edge(B, A).

Solving this program yields one answer set:

Answer set 1:
{ blue(b), edge(a, b), edge(a, c), edge(b, a), edge(b, c), edge(c, a), edge(c, b), green(a), red(c), vertex(a), vertex(b), vertex(c) }

It's easy to see that similarly, the resulting programs for all six answer sets will be satisfiable, thereby satisfying the universal assertion that in no answer set there must be an edge without a reverse edge.

Existential assertions work similarly, but instead of requiring the assertion to hold in all answer sets, it just needs to hold in at least one answer set in order for the assertion to be satisfied.

Notes on design and intended use

Assertion Encoding

Since the verifier associated with an assertion can be any kind of ASP program, one could put choice rules, directives, or virtually anything else there. However, the intended (and reasonable) usage is to stick to one or a few constraints, potentially with a few simple helper rules

Scope of a test

Implementation-wise the program under test is all the input Alpha is called with (potentially multiple files along with command-line-supplied strings), however, I propose the following guidelines for "good style when writing unit tests":

Test code contained in the same file as implementation code refers to that code only and is not meant to be run together with additional code
Test code targeted at implementation code that is spread out across multiple files should be written in a dedicated test-suite file that contains only unit tests (functionally, I'd call that an integration test then).
Commandline-interface for unit tests

By default, a commandline call like java -jar Alpha.jar -i someFile.asp will not run any unit tests contained in someFile.asp. The idea is that in most real-world uses there would be more than one input file which would make it necessary to disable tests using some flag most of the time. To run unit tests on someFile.asp, the command is java -jar Alpha.jar --run-tests -i someFile.asp.

Full test suite example

Building on the graph-coloring example from above, the following version of the graph coloring program is enriched with a comprehensive suite of unit tests covering most of the implementation using both universal and existential assertions:

%%% Basic Encoding of the Graph 3-Coloring Problem %%%
% Graphs are interpreted as having undirected edges.

% Edges are undirected
edge(Y, X) :- edge(X, Y).

% Guess color for each vertex
red(V) :- vertex(V), not green(V), not blue(V).
green(V) :- vertex(V), not red(V), not blue(V).
blue(V) :- vertex(V), not red(V), not green(V).

% Check for invalid colorings
:- vertex(V1), vertex(V2), edge(V1, V2), red(V1), red(V2).
:- vertex(V1), vertex(V2), edge(V1, V2), green(V1), green(V2).
:- vertex(V1), vertex(V2), edge(V1, V2), blue(V1), blue(V2).

%% Verify that directed edges are converted to undirected ones.
#test no_asymmetric_edge(expect: >0) {
    given {
        vertex(a).
        vertex(b).
        vertex(c).
        edge(a, b).
        edge(b, c).
        edge(c, a).
    }
    assertForAll {
        :- edge(A, B), not edge(B, A).
    }   
}

#test triangle_colorings(expect: 6) {
    given {
        vertex(a).
        vertex(b).
        vertex(c).
        edge(a, b).
        edge(b, c).
        edge(c, a).
    }
    % Make sure all vertices are colored in all answer sets
    assertForAll {
        colored(V) :- vertex(V), red(V).
        colored(V) :- vertex(V), green(V).
        colored(V) :- vertex(V), blue(V).
        :- vertex(V), not colored(V).
    }
    % Make sure we do not have neighboring vertices of same color in any answer set
    assertForAll {
        :- edge(V1, V2), red(V1), red(V2).
        :- edge(V1, V2), green(V1), green(V2).
        :- edge(V1, V2), blue(V1), blue(V2).
    }
    % In at least one answer set, vertex a should be red
    assertForSome {
        :- not red(a).
    }
    % In at least one answer set, vertex a should be green
    assertForSome {
        :- not green(a).
    }
    % In at least one answer set, vertex a should be blue
    assertForSome {
        :- not blue(a).
    }
}

Codecov Report

Patch coverage: 71.81% and project coverage change: -0.18 :warning:

Comparison is base (b16b9d6) 79.04% compared to head (5dd8f80) 78.86%.

Additional details and impacted files

```diff @@ Coverage Diff @@ ## master #358 +/- ## ============================================ - Coverage 79.04% 78.86% -0.18% - Complexity 778 813 +35 ============================================ Files 201 208 +7 Lines 8671 8863 +192 Branches 1471 1495 +24 ============================================ + Hits 6854 6990 +136 - Misses 1365 1411 +46 - Partials 452 462 +10 ``` | [Impacted Files](https://codecov.io/gh/alpha-asp/Alpha/pull/358?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp) | Coverage Δ | | |---|---|---| | [...-app/src/main/java/at/ac/tuwien/kr/alpha/Main.java](https://codecov.io/gh/alpha-asp/Alpha/pull/358?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtY2xpLWFwcC9zcmMvbWFpbi9qYXZhL2F0L2FjL3R1d2llbi9rci9hbHBoYS9NYWluLmphdmE=) | `26.82% <0.00%> (-5.53%)` | :arrow_down: | | [.../at/ac/tuwien/kr/alpha/api/config/InputConfig.java](https://codecov.io/gh/alpha-asp/Alpha/pull/358?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtYXBpL3NyYy9tYWluL2phdmEvYXQvYWMvdHV3aWVuL2tyL2FscGhhL2FwaS9jb25maWcvSW5wdXRDb25maWcuamF2YQ==) | `59.15% <50.00%> (-0.55%)` | :arrow_down: | | [.../tuwien/kr/alpha/app/config/CommandLineParser.java](https://codecov.io/gh/alpha-asp/Alpha/pull/358?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtY2xpLWFwcC9zcmMvbWFpbi9qYXZhL2F0L2FjL3R1d2llbi9rci9hbHBoYS9hcHAvY29uZmlnL0NvbW1hbmRMaW5lUGFyc2VyLmphdmE=) | `82.05% <60.00%> (-0.42%)` | :arrow_down: | | [.../tuwien/kr/alpha/commons/programs/tests/Tests.java](https://codecov.io/gh/alpha-asp/Alpha/pull/358?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtY29tbW9ucy9zcmMvbWFpbi9qYXZhL2F0L2FjL3R1d2llbi9rci9hbHBoYS9jb21tb25zL3Byb2dyYW1zL3Rlc3RzL1Rlc3RzLmphdmE=) | `68.75% <68.75%> (ø)` | | | [.../kr/alpha/commons/programs/tests/TestCaseImpl.java](https://codecov.io/gh/alpha-asp/Alpha/pull/358?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtY29tbW9ucy9zcmMvbWFpbi9qYXZhL2F0L2FjL3R1d2llbi9rci9hbHBoYS9jb21tb25zL3Byb2dyYW1zL3Rlc3RzL1Rlc3RDYXNlSW1wbC5qYXZh) | `71.42% <71.42%> (ø)` | | | [.../tuwien/kr/alpha/core/parser/ParseTreeVisitor.java](https://codecov.io/gh/alpha-asp/Alpha/pull/358?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtY29yZS9zcmMvbWFpbi9qYXZhL2F0L2FjL3R1d2llbi9rci9hbHBoYS9jb3JlL3BhcnNlci9QYXJzZVRyZWVWaXNpdG9yLmphdmE=) | `83.65% <77.41%> (-2.00%)` | :arrow_down: | | [...ava/at/ac/tuwien/kr/alpha/api/impl/TestRunner.java](https://codecov.io/gh/alpha-asp/Alpha/pull/358?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtc29sdmVyL3NyYy9tYWluL2phdmEvYXQvYWMvdHV3aWVuL2tyL2FscGhhL2FwaS9pbXBsL1Rlc3RSdW5uZXIuamF2YQ==) | `79.62% <79.62%> (ø)` | | | [.../tuwien/kr/alpha/api/programs/tests/Assertion.java](https://codecov.io/gh/alpha-asp/Alpha/pull/358?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtYXBpL3NyYy9tYWluL2phdmEvYXQvYWMvdHV3aWVuL2tyL2FscGhhL2FwaS9wcm9ncmFtcy90ZXN0cy9Bc3NlcnRpb24uamF2YQ==) | `85.71% <85.71%> (ø)` | | | [...kr/alpha/commons/programs/ASPCore2ProgramImpl.java](https://codecov.io/gh/alpha-asp/Alpha/pull/358?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtY29tbW9ucy9zcmMvbWFpbi9qYXZhL2F0L2FjL3R1d2llbi9rci9hbHBoYS9jb21tb25zL3Byb2dyYW1zL0FTUENvcmUyUHJvZ3JhbUltcGwuamF2YQ==) | `87.50% <85.71%> (+27.50%)` | :arrow_up: | | [...kr/alpha/commons/programs/tests/AssertionImpl.java](https://codecov.io/gh/alpha-asp/Alpha/pull/358?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtY29tbW9ucy9zcmMvbWFpbi9qYXZhL2F0L2FjL3R1d2llbi9rci9hbHBoYS9jb21tb25zL3Byb2dyYW1zL3Rlc3RzL0Fzc2VydGlvbkltcGwuamF2YQ==) | `85.71% <85.71%> (ø)` | | | ... and [9 more](https://codecov.io/gh/alpha-asp/Alpha/pull/358?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp) | | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

alpha-asp / Alpha

Native support for unit tests in Alpha #358