Support for non-AP alphabets

Heizmann commented 8 years ago

In most cases I use automata whose alphabet is not the powerset of some set. Hence in HOA my transitions look like [0 & !1 & !2 & !3 & ...]

Jan Strejcek already explained me that I can use aliases for large formulas. This already reduces the file size drastically but basically only shifts the problem from the body to the header.

My main application is an automata-based approach to software verification. Without optimizations we have there one letter in the alphabet for each statement in the program that we want to verify. Even with optimizations, we often have hundreds of letters in our alphabet.

kleinj commented 8 years ago

Thanks for your interest in the HOA format!

We have encountered situations as well where the AP-based alphabet is akward to use and are looking into adding support for explicitly specifying the alphabet to the HOA format as an alternative.

In the meantime, you might try using a binary encoding of your alphabet using atomic propositions. Something like this:

Let's say you have the alphabet a,b,c, so you need to 2 bits/APs to encode it.

AP: 2 "0" "1"
Alias: @a !0&!1
Alias: @b  0&!1
Alias: @c !0& 1
...
State: 0
[@a]  successor_state
[@a | @b] successor_state

The 'letter' 0&1 would be unused.

Depending on your needs, that might be enough to get some working tool support. But you have to take care that the "unused" letters in the induced alphabet powerset(AP) don't cause semantic problems, e.g. the automaton might be considered as incomplete when it would be complete in the original alphabet and some automata analyses / constructions might give incorrect results.

adl commented 8 years ago

only shifts the problem from the body to the header

I think the problem is still in the body as well. For instance the complement of the label [0 & !1 & !2] is the label [!0 | 1 | 2], which actually allows letters such as !0&!1&!2 or 0&1&2 which make no sense in your setup. A tool such as ltl2dstar, which can compute the complement of a Büchi automaton in the HOA format, would produce a complement automaton with a different language than the one for the complement your tool would generate with the above encoding.

Unless your alphabet has size 2^n and letters are binary-encoded using n arbitrary atomic propositions, I see no way to have perfect correspondence between the two semantics.

One idea I remember discussing with someone (Jan S tells me it was not him, so I suspect it was Joachim Klein) was to introduce support Alphabet: as an alternative to AP: in the format. Fixing the format to support both semantics seems quite easy: the semantic of automata using AP: can be presented as a particular cases of Alphabet:-based automata.

However updating the existing AP:-based tools to also support Alphabet: may be tricky and probably inefficient, so it's not clear if it would happen. For Spot it would for at least require an automatic re-encoding on input, some sanitizing at the end of several algorithms (combinations of APs that do not correspond to any input letter), some obvious decoding on output, and deciding what to do in several weird cases (like: does it make sense to build the product of an AP:-based automaton and an Alphabet:-based automaton?).

That said, I still think it would be better for the format to support Alphabet: even if some tools can only work with AP:.

kleinj commented 8 years ago

I opened issue #54 for a technical discussion of adding such a Alphabet header.

strejcek commented 8 years ago

This issue in now replaced by #54 .

adl / hoaf

Support for non-AP alphabets #53