madmike200590 commented 2 years ago

The intention of this PR is to provide functionality to reify (i.e. encode as ASP facts) input programs. These reified programs can be written to standard out for further use, or used by Alpha itself for static analysis of input programs.

Program reification

Reified programs are written to stdout using CLI option -reify.

Reification of ASP language constructs

Programs

A program consisting of facts F_1, ... F_n and rules R_1, ... , R_k will be represented using atoms of form fact(ID) and rule(ID), respectively. ID is a unique identifier for the (fact-)atom or rule in question.

Rule Heads

Rule heads are represented using the predicate rule_head/2, i.e. for each rule head, an atom of form rule_head(R_ID, H_ID) exists in the reified program, where R_ID is a unique id for the rule and H_ID a unique id for the head. An additional atom representing the type of head (normal or choice) is of form head_type(H_ID, TYPE) where type is either normal or choice.

Normal Heads wrap an atom, which is referenced using a fact of form normalHead_atom(H_ID, A_ID), with A_ID being a unique identifier for the head atom.

Choice Heads consist of lower and upper bound terms represented as choiceHead_lowerBound(H_ID, LBT) and choiceHead_upperBound(H_ID, UBT), respectively. A choice head consists of one or more choice elements which are referenced using choiceHead_element(H_ID, E_ID) and choiceHead_numElements(H_ID, NUM).

Each choice element consists of one atom and an arbitrary number of condition literals and is represented using choiceElement_atom(E_ID, A_ID), choiceElement_conditionLiteral(E_ID, L_ID) and choiceElement_numConditionLiterals(E_ID, NUM).

Constraints

For constraints, instead of a rule/1 instance, an atom of form constraint(ID) is generated. In every other respect, a constraint is just a rule without a head, i.e. there is no difference in reification of body literals.

Rule Bodies

Every body literal of a rule referenced by id R_ID is represented as rule_bodyLiteral(R_ID, L_ID). The number of body literals is encoded using rule_numBodyLiterals(R_ID, NUM).

Literals

Every literal is uniquely identified using an identifier L_ID and encoded using facts literal_polarity(L_ID, P), where P is either pos or neg, and literal_atom(L_ID, A_ID) where A_ID references the atom wrapped by the literal.

Atoms

For every atom, there is a fact atom_type(A_ID, TYPE), where TYPE is the type of the atom in question: basic, comparison, external or aggregate. Further facts depend on the type of atom that is reified.

Basic Atoms are encoded using basicAtom_predicate(A_ID, P_ID) to represent the predicate of the atom and basicAtom_numTerms(A_ID, NUM) as well as basicAtom_term(A_ID, IDX, T_ID) for the terms of the atom. IDX refers to the position in the argument list of the term uniquely identified by T_ID.

Comparison Atoms are reified to

comparisonAtom_leftTerm(A_ID, LT_ID).
comparisonAtom_rightTerm(A_ID, RT_ID).
comparisonAtom_operator(A_ID, OP)

where a comparison operator OP is one of eq, ne, le, lt, ge, gt.

The reified representation of an external atom is

externalAtom_name(A_ID, NAME).
externalAtom_numInputTerms(A_ID, NUM).
externalAtom_numOutputTerms(A_ID, NUM).
externalAtom_inputTerm(A_ID, T_ID). % Once for each term
externalAtom_outputTerm(A_ID, T_ID). % Once for each term

Aggregate Atoms are represented using facts aggregateAtom_leftHandTerm(A_ID, T_ID), aggregateAtom_rightHandTerm(A_ID, T_ID), aggregateAtom_leftHandOperator(A_ID, OP) and aggregateAtom_rightHandOperator(A_ID, OP) for left- and right-hand terms and operators. The aggregate function for a specific literal is given by aggregateAtom(A_ID, FUNC), with func being count, sum, min or max. Each aggregate element is encoded as aggregateAtom_aggregateElement(A_ID, E_ID) and has a unique identifier E_ID. For each aggregate element, the numbers of terms and literals are given as aggregateElement_numTerms(E_ID, NUM) and aggregateElement_numLiterals(E_ID, NUM). Each element term and its position are referenced as aggregateElement_term(E_ID, IDX, T_ID) while literals are encoded as aggregateElement_literal(E_ID, L_ID).

Terms

Every reified term has a fact term_type(T_ID, TYPE) specifying the term type (constant, variable, arithmetic, function).

Constants have a type and value: constantTerm_type(T_ID, C_TYPE) and constantTerm_value(T_ID, VAL) where C_TYPE is either symbol for a string represnting a symbolic constant, integer for an integer constant, string for a string constant, or object(JAVA_TYPE) for a Java object, with JAVA_TYPE represeting the value of term.getObject().getClass().getName(). The value VAL of a term is always the string representation of the respective constant.

Variable Terms are encoded using their variable symbol as a string: variableTerm_symbol(T_ID, SYM).

Arithmetic Terms are reified as:

arithmeticTerm_leftTerm(T_ID, LT_ID).
arithmeticTerm_rightTerm(T_ID, RT_ID).
arithmeticTerm_operator(T_ID, OP). % OP = string representation of arithmetic operator: "+", "-", etc.

Function Terms are encoded using atoms functionTerm_symbol(T_ID, SYM), functionTerm_numArguments(T_ID, NUM) and one atom functionTerm_argumentTerm(T_ID, IDX, ARG_ID) per argument term.

Facts

Facts are encoded as regular atoms (i.e. using atom_type, etc predicates), but in addition result in an atom fact(<atomId>), denoting that the atom with id <atomId> is a fact.

Example

Consider the following program

p(1).
p(2).
p(3).

q(Z) :- Z = X + Y, p(X), p(Y), Y != X.
qs(C) :- C = #count{ X : q(X) }.

foobar(FB) :- &stdlib_string_concat["foo", "bar"](FB).

The reified version of this program is: (comments inserted manually, not generated by Alpha)

% p(1).
fact(0).
atom_type(0, basic).
basicAtom_predicate(0, 1).
basicAtom_numTerms(0, 1).
basicAtom_term(0, 0, 2).
term_type(2, constant).
constantTerm_type(2, "integer").
constantTerm_value(2, "1").

% p(2).
fact(3).
atom_type(3, basic).
basicAtom_predicate(3, 1).
basicAtom_numTerms(3, 1).
basicAtom_term(3, 0, 4).
term_type(4, constant).
constantTerm_type(4, "integer").
constantTerm_value(4, "2").

% p(3).
fact(5).
atom_type(5, basic).
basicAtom_predicate(5, 1).
basicAtom_numTerms(5, 1).
basicAtom_term(5, 0, 6).
term_type(6, constant).
constantTerm_type(6, "integer").
constantTerm_value(6, "3").

% q(Z) :- Z = X + Y, p(X), p(Y), Y != X.
rule(7).
rule_head(7, 8).

%% Normal head "q(Z)"
head_type(8, normal).
normalHead_atom(8, 9).
atom_type(9, basic).
basicAtom_predicate(9, 10).
basicAtom_numTerms(9, 1).
basicAtom_term(9, 0, 11).
term_type(11, variable).
variableTerm_symbol(11, "Z").

%% Rule body
rule_numBodyLiterals(7, 4).

%% Body literal "Z = X + Y"
rule_bodyLiteral(7, 12).
literal_polarity(12, pos).
literal_atom(12, 13).
atom_type(13, comparison).
comparisonAtom_leftTerm(13, 14).
comparisonAtom_rightTerm(13, 15).
comparisonAtom_operator(13, eq).
term_type(14, variable).
variableTerm_symbol(14, "Z").
term_type(15, arithmetic).
arithmeticTerm_leftTerm(15, 16).
term_type(16, variable).
variableTerm_symbol(16, "X").
arithmeticTerm_rightTerm(15, 17).
term_type(17, variable).
variableTerm_symbol(17, "Y").
arithmeticTerm_operator(15, "+").

%% Body literal "p(X)"
rule_bodyLiteral(7, 18).
literal_polarity(18, pos).
literal_atom(18, 19).
atom_type(19, basic).
basicAtom_predicate(19, 1).
basicAtom_numTerms(19, 1).
basicAtom_term(19, 0, 20).
term_type(20, variable).
variableTerm_symbol(20, "X").

%% Body literal "p(Y)"
rule_bodyLiteral(7, 21).
literal_polarity(21, pos).
literal_atom(21, 22).
atom_type(22, basic).
basicAtom_predicate(22, 1).
basicAtom_numTerms(22, 1).
basicAtom_term(22, 0, 23).
term_type(23, variable).
variableTerm_symbol(23, "Y").

%% Body literal "Y != X"
rule_bodyLiteral(7, 24).
literal_polarity(24, pos).
literal_atom(24, 25).
atom_type(25, comparison).
comparisonAtom_leftTerm(25, 26).
comparisonAtom_rightTerm(25, 27).
comparisonAtom_operator(25, ne).
term_type(26, variable).
variableTerm_symbol(26, "Y").
term_type(27, variable).
variableTerm_symbol(27, "X").

% qs(C) :- C = #count{ X : q(X) }.
rule(28).
rule_head(28, 29).

%% Rule head "qs(C)"
head_type(29, normal).
normalHead_atom(29, 30).
atom_type(30, basic).
basicAtom_predicate(30, 31).
basicAtom_numTerms(30, 1).
basicAtom_term(30, 0, 32).
term_type(32, variable).
variableTerm_symbol(32, "C").

%% Rule body
rule_numBodyLiterals(28, 1).

%% Body literal "C = #count{ X : q(X) }"
rule_bodyLiteral(28, 33).
literal_polarity(33, pos).
literal_atom(33, 34).
atom_type(34, aggregate).
aggregateAtom_aggregateFunction(34, count).
aggregateAtom_leftHandOperator(34, eq).
aggregateAtom_leftHandTerm(34, 35).
aggregateAtom_numAggregateElements(34, 1).
aggregateAtom_aggregateElement(34, 36).
aggregateElement_numTerms(36, 1).
aggregateElement_term(36, 0, 37).
term_type(37, variable).
variableTerm_symbol(37, "X").
aggregateElement_numLiterals(36, 1).
aggregateElement_literal(36, 38).
literal_polarity(38, pos).
literal_atom(38, 39).
atom_type(39, basic).
basicAtom_predicate(39, 10).
basicAtom_numTerms(39, 1).
basicAtom_term(39, 0, 40).
term_type(40, variable).
variableTerm_symbol(40, "X").

% foobar(FB) :- &stdlib_string_concat["foo", "bar"](FB).
rule(41).
rule_head(41, 42).

%% Rule head "foobar(FB)"
head_type(42, normal).
normalHead_atom(42, 43).
atom_type(43, basic).
basicAtom_predicate(43, 44).
basicAtom_numTerms(43, 1).
basicAtom_term(43, 0, 45).
term_type(45, variable).
variableTerm_symbol(45, "FB").

%% Rule body
rule_numBodyLiterals(41, 1).

%% Body literal "&stdlib_string_concat["foo", "bar"](FB)"
rule_bodyLiteral(41, 46).
literal_polarity(46, pos).
literal_atom(46, 47).
atom_type(47, external).
externalAtom_name(47, "stdlib_string_concat").
externalAtom_numInputTerms(47, 2).
externalAtom_inputTerm(47, 0, 48).
term_type(48, constant).
constantTerm_type(48, "string").
constantTerm_value(48, "foo").
externalAtom_inputTerm(47, 1, 49).
term_type(49, constant).
constantTerm_type(49, "string").
constantTerm_value(49, "bar").
externalAtom_numOutputTerms(47, 1).
externalAtom_outputTerm(47, 0, 50).
term_type(50, variable).
variableTerm_symbol(50, "FB").

%% Predicate id map
predicate(10, "q", 1).
predicate(44, "foobar", 1).
predicate(31, "qs", 1).
predicate(1, "p", 1).

madmike200590 commented 2 years ago

Current status: Programs can be reified and written to stdout using CLI flag -reify, next steps:

unit tests
refine (and document) reified program format
add safety check

codecov[bot] commented 2 years ago

Codecov Report

Base: 70.58% // Head: 66.83% // Decreases project coverage by -3.74% :warning:

Coverage data is based on head (28ab2bb) compared to base (b8af1d7). Patch coverage: 13.93% of modified lines in pull request are covered.

Additional details and impacted files

```diff @@ Coverage Diff @@ ## master #332 +/- ## ============================================ - Coverage 70.58% 66.83% -3.75% + Complexity 2147 2108 -39 ============================================ Files 181 184 +3 Lines 8049 8379 +330 Branches 1432 1464 +32 ============================================ - Hits 5681 5600 -81 - Misses 1994 2411 +417 + Partials 374 368 -6 ``` | [Impacted Files](https://codecov.io/gh/alpha-asp/Alpha/pull/332?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp) | Coverage Δ | | |---|---|---| | [...r/alpha/app/mappers/AnswerSetToWorkbookMapper.java](https://codecov.io/gh/alpha-asp/Alpha/pull/332?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtY2xpLWFwcC9zcmMvbWFpbi9qYXZhL2F0L2FjL3R1d2llbi9rci9hbHBoYS9hcHAvbWFwcGVycy9BbnN3ZXJTZXRUb1dvcmtib29rTWFwcGVyLmphdmE=) | `95.83% <ø> (ø)` | | | [...t/ac/tuwien/kr/alpha/commons/AnswerSetBuilder.java](https://codecov.io/gh/alpha-asp/Alpha/pull/332?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtY29tbW9ucy9zcmMvbWFpbi9qYXZhL2F0L2FjL3R1d2llbi9rci9hbHBoYS9jb21tb25zL0Fuc3dlclNldEJ1aWxkZXIuamF2YQ==) | `68.62% <ø> (ø)` | | | [.../at/ac/tuwien/kr/alpha/commons/BasicAnswerSet.java](https://codecov.io/gh/alpha-asp/Alpha/pull/332?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtY29tbW9ucy9zcmMvbWFpbi9qYXZhL2F0L2FjL3R1d2llbi9rci9hbHBoYS9jb21tb25zL0Jhc2ljQW5zd2VyU2V0LmphdmE=) | `26.08% <ø> (ø)` | | | [...ommons/comparisons/AbstractComparisonOperator.java](https://codecov.io/gh/alpha-asp/Alpha/pull/332?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtY29tbW9ucy9zcmMvbWFpbi9qYXZhL2F0L2FjL3R1d2llbi9rci9hbHBoYS9jb21tb25zL2NvbXBhcmlzb25zL0Fic3RyYWN0Q29tcGFyaXNvbk9wZXJhdG9yLmphdmE=) | `0.00% <ø> (ø)` | | | [...alpha/commons/comparisons/ComparisonOperators.java](https://codecov.io/gh/alpha-asp/Alpha/pull/332?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtY29tbW9ucy9zcmMvbWFpbi9qYXZhL2F0L2FjL3R1d2llbi9rci9hbHBoYS9jb21tb25zL2NvbXBhcmlzb25zL0NvbXBhcmlzb25PcGVyYXRvcnMuamF2YQ==) | `0.00% <ø> (ø)` | | | [...kr/alpha/commons/programs/ASPCore2ProgramImpl.java](https://codecov.io/gh/alpha-asp/Alpha/pull/332?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtY29tbW9ucy9zcmMvbWFpbi9qYXZhL2F0L2FjL3R1d2llbi9rci9hbHBoYS9jb21tb25zL3Byb2dyYW1zL0FTUENvcmUyUHJvZ3JhbUltcGwuamF2YQ==) | `0.00% <0.00%> (ø)` | | | [...ien/kr/alpha/commons/programs/AbstractProgram.java](https://codecov.io/gh/alpha-asp/Alpha/pull/332?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtY29tbW9ucy9zcmMvbWFpbi9qYXZhL2F0L2FjL3R1d2llbi9rci9hbHBoYS9jb21tb25zL3Byb2dyYW1zL0Fic3RyYWN0UHJvZ3JhbS5qYXZh) | `0.00% <ø> (ø)` | | | [...r/alpha/commons/programs/InlineDirectivesImpl.java](https://codecov.io/gh/alpha-asp/Alpha/pull/332?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtY29tbW9ucy9zcmMvbWFpbi9qYXZhL2F0L2FjL3R1d2llbi9rci9hbHBoYS9jb21tb25zL3Byb2dyYW1zL0lubGluZURpcmVjdGl2ZXNJbXBsLmphdmE=) | `0.00% <0.00%> (ø)` | | | [...n/kr/alpha/commons/programs/NormalProgramImpl.java](https://codecov.io/gh/alpha-asp/Alpha/pull/332?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtY29tbW9ucy9zcmMvbWFpbi9qYXZhL2F0L2FjL3R1d2llbi9rci9hbHBoYS9jb21tb25zL3Byb2dyYW1zL05vcm1hbFByb2dyYW1JbXBsLmphdmE=) | `0.00% <0.00%> (ø)` | | | [.../ac/tuwien/kr/alpha/commons/programs/Programs.java](https://codecov.io/gh/alpha-asp/Alpha/pull/332?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp#diff-YWxwaGEtY29tbW9ucy9zcmMvbWFpbi9qYXZhL2F0L2FjL3R1d2llbi9rci9hbHBoYS9jb21tb25zL3Byb2dyYW1zL1Byb2dyYW1zLmphdmE=) | `0.00% <0.00%> (ø)` | | | ... and [124 more](https://codecov.io/gh/alpha-asp/Alpha/pull/332?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp) | | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=alpha-asp)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

madmike200590 commented 2 years ago

TODO: We should map constant term types to "ASP-types" integer, string, symbol, object rather than just writing java types (see instances of constantTerm_class/2 in reified code)

AntoniusW commented 2 years ago

@madmike200590 Working with the reification a bit, I noticed that some additional information on terms (and probably atoms) is always needed. Deriving this on the reified facts is possible, but I think it needlessly slows down any evaluation as the information is already stored in the internal representation and computing it based on the reified facts involves a lot of join-evaluations.

What I think should be part of every reification:

Whether a term is ground. For example with term_is_ground(T).
How many subterms a function term has. For example with functionTerm_numArguments(FT,Num).

The following also may be useful:

Whether an atom is ground. For example with atom_is_ground(A).

madmike200590 commented 2 years ago

@AntoniusW is there anything you think needs improving and/or changing on this? Otherwise I'd like to go ahead and merge.

AntoniusW commented 2 years ago

@AntoniusW is there anything you think needs improving and/or changing on this? Otherwise I'd like to go ahead and merge.

I haven't had the time to review the code. Will do so in the next days and I assume that most of it is fine already.

AntoniusW commented 2 years ago

Since the PR description is a very good documentation of the reification itself already, I suggest to add a clear explanation of what happens with facts in the input program. The example covers the case of a rule and from the description it is not clear whether facts are represented just like rules (i.e., with a rule_head and a normalHead_atom fact), or by some shorthand, which seems to be the case (i.e., for a fact we get fact(ID) and then atom_type(ID,basic) and then basicAtom_... but no rule_head etc).

AntoniusW commented 1 year ago

Working a bit more with the reification I noticed that every occurrence of each and every term gets its own id. This makes writing rules extremely cumbersome as, for example, all occurrences of the same variable X in a rule receive different term ids. The behavior I expected, is already implemented for predicates (i.e., a lookup-table of ids for known predicates). Is there a particular reason why for terms this approach was not chosen?

madmike200590 commented 1 year ago

Working a bit more with the reification I noticed that every occurrence of each and every term gets its own id. This makes writing rules extremely cumbersome as, for example, all occurrences of the same variable X in a rule receive different term ids. The behavior I expected, is already implemented for predicates (i.e., a lookup-table of ids for known predicates). Is there a particular reason why for terms this approach was not chosen?

As per our discussion on that topic, this leads to a not-so-straightforward question of whether a term id refers to a term symbol or the abstract object the term stands for. I went with the former because it seemed more intuitive to me. I'd like to keep it as-is in order to finally get this feature finished.

madmike200590 commented 1 year ago

@AntoniusW I've tried to add some more test cases for at least most basic program constructs. These are far from exhaustive, but still took a long time to write (since programmatically verifying reification results is rather involved). In order to move forward with this feature, I'd like to get it merged as it is now. Even if there should still be some bugs lurking, we won't be breaking anything, given that reification is for now an experimental feature and nothing depends on it.

alpha-asp / Alpha

Reification of input programs #332

Program reification

Reification of ASP language constructs

Programs

Rule Heads

Constraints

Rule Bodies

Literals

Atoms

Terms

Facts

Example

Codecov Report