Implements partial refactor to allow for Boolean & Numeric interop (via `FormulaTerm` wrapper)

mejrpete commented 3 years ago

This PR essentially duplicates https://github.com/aig-upf/tarski/pull/95. The text below is quoted from the original PR (which was submitted as a draft to devel, but is now correctly submitted to the new feature branch rddl-refactor as discussed).

The intention for this partial refactor is to serve as a starting point for work towards a full SMT-like representation which would eliminate the distinction between Predicate and Function (along with Formula and Term) in tarski.

As discussed in #91. This implementation is not the full SMT-like representation that we eventually want. Rather, I implemented a wrapper class FormulaTerm which allows Formula objects to be wrapped as Terms in such a way that Boolean-valued Formula objects can be automatically cast as 0/1 numeric values for arithmetic, etc.

The primary motivation for this refactor is to allow for the common patterns used in RDDL (which commonly includes numeric arithmetic over Boolean-valued state and action fluents).

This implementation

Changes the Boolean sort to be 0/1 valued and a child of the Naturals sort when the Arithmetic theory is included in a language (and a standalone child of Object otherwise)

Adds the FormulaTerm wrapper class, which is used as a container for Formula objects that must be treated as Term objects for arithmetic, etc.

All of Predicate, Formula, Term, Function objects are now associated with a FirstOrderLanguage (previously only Function and Term objects had a language property). Languages are inherited up from "subterms", when, for example, a CompoundFormula is constructed. This is essential for being able to construct FormulaTerm wrappers when needed, since Term objects always need an associated language.

Implements a set of tests (primarily focused on RDDL use cases) that adds to the existing test suite.

While the FormulaTerm wrapper strategy for interop between Term and Formula objects falls short of the full SMT-like representation that we want, I think it is progress in the right direction. Additionally, this strategy, along with a few notable hacks that need to be addressed allowed me to almost entirely preserve the public tarski API (which also allowed for use of the existing test suite during the refactor without a ton of changes).

The following are the key "hacks" that absolutely need to be addressed

Contradiction and Tautology objects now require a FirstOrderLanguage as a parameter when they are constructed. This is due to the fact that Formula objects with Contradiction and Tautology objects as subterms must be able to extract their own FirstOrderLanguage from their subterms. This breaks the common direct use of Tautology and Contradiction where they are not associated with a language. From a mathematical perspective, it is a bit weird to force Tautology and Contradiction to be associated with a FOL, since there is no need formally. This is something that bears more discussion.

As a result of the changes to Tautology, it was no longer possible to construct non-conditional FSTRIPS *Effect objects with Tautology as their condition, as *Effect objects are built outside the scope of a FOL. As a quick hack to make things work, I introduced a Pass class, which behaves exactly as Tautology is supposed to in this case, but without a language. This is a pretty terrible hack, but allowed me to work around the issue without making major API changes to FSTRIPS *Effect construction before this PR was filed.

In order to avoid breaking anything in testing, etc I retained the Boolean theory. It no longer does anything, since the Boolean sort is always attached to a FOL, either as a child of Naturals or Object depending on whether or not there are numeric sorts attached to the language. This is more minor than the above two issues, but is included to motivate a discussion on which API components are flexible in the full SMT-like representation moving forward.

Formula (and related) are no longer hashable, and must be treated the same way as Term objects where you use symref for use in associative containers.

These hacks were included specifically because I wanted to avoid arbitrarily changing the public API for tarski without discussion. The following are the API components that caused difficulty, and I believe (although I may be wrong) will require modification to avoid these kinds of hacks moving forward.

FSTRIPS *Effect construction outside the context of a FOL (while using something like Tautology as a condition if they are not conditional effects)

Tautology and Contradiction objects needing a FOL passed in at construction. (This is a change that was made, but is a little gross).

Finally, I have also included tests that demonstrate appropriate interoperability between Predicate/Formula objects and Function/Term objects through the FormulaTerm wrapper class --- meaning that (as currently tested) numeric interpretation of Boolean-valued objects works as expected for all basic arithmetic, if then else, and sumterm operators. I've also added a large integration test in tests/io/test_rddl_writer.py that uses a significant chunk of the new functionality to generate one of the academic-advising RDDL domain instances used in IPPC-2018 (instance 1).

This refactor is passing all tests that I was able to get working in my local configuration (which includes the base tests and all the benchmark tests). I have not been able to run tests associated with the sdd extra or which require the gringo binary for ASP reachability testing. Finally, I disabled one test related to shorthands for Contradiction and Tautology that no longer makes sense in context of the changes to those classes.

I fully admit that portions of this are really ugly! My hope is that we can discuss what is allowed to change in the API (or find better workarounds than I found) that will allow me to cut out the hacks that were used to make all of this work. Additionally, I still think that moving to a full SMT-like representation is strictly better than the FormulaTerm wrapper strategy used here (although I hope that this version could be used as the starting point for that refactor, since some of the changes --- including those made to the Boolean sort, would need to be made regardless!)

mejrpete commented 3 years ago

@gfrances do you mind if I just merge this into the feature branch? Do you have preferences on a clean merge vs. squash vs. rebase for this one? My gut feeling is that the intermediate commits here have the potential to confuse more than they clarify. That said -- given that it's a separate feature branch perhaps we want to preserve more history and then clean it up when the eventual clean refactor is (eventually) ready to be merged into something like devel down then line?

Just let me know if you have a strong preference either way!

gfrances commented 3 years ago

Thanks @mejrpete! Looking forward to work on this. As for the merge, feel free to proceed with the approach you're happier with, we don't really have any policy about this in the project, but if I had to choose I'd leave the full commit history, it is never more confusing than one single commit :-) Your call!

mejrpete commented 3 years ago

Sounds good to me! Lets at least retain the messages in a squash then. An uninformative merge commit definitely isn't the way to go, but since I pulled in updates from devel (via the new feature branch) that were from after I originally forked, there's no clean way to just rebase the history directly on.

Looking forward to working on this more as well!

aig-upf / tarski

Implements partial refactor to allow for Boolean & Numeric interop (via `FormulaTerm` wrapper) #96