adl / hoaf

Hanoi Omega-Automata Format
13 stars 2 forks source link

better wording in case States: is not given #39

Closed adl closed 9 years ago

adl commented 9 years ago

Consider this (IMHO incorrect) input:

HOA: v1
Start: 6
Acceptance: 0 t
--BODY--
--END--

The specifications currently say:

States should be numbered from 0 to n−1 (where n is the value given by the States: header item if present)

and then

States should be numbered from 0 to n−1, may be listed in any order, but should all be listed (i.e., if the header has States: 10 then the body should have ten State: INT statements, with all numbers from 0 to 9).

but that does not really specifies what is expected if the optional States: header is missing, since n is then undefined. My suggestion would be to fix those sentences so that n denotes the value of the States: header if present, or the highest state number used as an initial state or as a destination of any transition.

strejcek commented 9 years ago

The last part of the last sentence, i.e.

...or the highest state number used as an initial state or as a destination of any transition.

says that we do not want to count a state that has the highest number if it is neither initial nor target of any transition? These states are useless, but legal and they should not be simply ignored IMHO.

Hence, if States: header is missing, than n is the highest state number listed in the automaton body or used as an initial state or in description of some transition target.

I also suggest to relax from the condition saying that every state between 0 and n-1 should be explicitly listed. I would replace it by assumption that states that are not listed have no outgoing transitions and empty label (test for automaton completeness has to be adapted for that).

adl commented 9 years ago

These states are useless, but legal and they should not be simply ignored IMHO.

I missed those. Good point.

Hence, if States: header is missing, then n is the highest state number listed in the automaton body or used as an initial state or in description of some transition target.

Agreed.

I also suggest to relax from the condition saying that every state between 0 and n-1 should be explicitly listed. I would replace it by assumption that states that are not listed have no outgoing transitions and empty label

I'm ambivalent about this. What you suggest is actually how I'm handling unlisted states in Spot. jhoafparser is stricter and will diagnose unlisted states. But I somehow feel that the strictness is better: I can't imagine a scenario where it would be helpful for a tool to output a transition to a state, and then not declare that state, so if this happens I'd prefer to be warned.

One case I can imagine where it would be useful to not declare a state, is if you have an operation that remove states from an automaton, and yet you do not want to change the numbers of your states. Then you would declare an automaton with more states than you actually use, and then (without the relaxed condition that you suggested) you would have to list all the removed states with no successor. As it turns out, this is exactly how autfilt --keep-states=0,2,5,... was working a few weeks ago (it would only keep the desired states, unconnect all the other states, from the automaton, and list the latter without successor in the HOA output). Since then I fixed it to really remove the unused states from the output automaton, compacting the structure by renumbering all states.

Also if we don't require the states to be listed explicitly, I'm afraid tools authors will be more likely to declare more states than needed. For instance if I'm building a product A*B, I could declare the maximum number of states |A|*|B|, and then list only the states that are actually reachable. Doing so would completely change the way you process automata on input: suddenly it feels like we should not store states in a (dense) vector, but in some sparse data structure. I'd prefer if we can assume that states are consecutive elements in a vector, it's easier for tools that consume the format.

strejcek commented 9 years ago

The scenario leading me to suggestion of "missed states" is like this: one manually edits an automaton in such a way that the it has less states: do we want to push them to rename all the states or do we allow skipping state numbers?

My feeling is that skipping states make the format more robust. But I'm not going to fight for that, it is not that important.

kleinj commented 9 years ago

I'm sympathetic to both points of view, on the one hand making the life of the consumer easier, on the other hand giving the producer more flexibility, especially when doing manual editing. Having such structural sanity checks however is very useful to identify broken or suspect output by tools.

One option would be to use the n = highest state index that occurs anywhere approach, with the states that are not explicitely listed having no successors etc, and provide a property, e.g. "compact", that signifies that the stricter interpretation (all states have to be defined) should apply. It would then be relatively easy to provide transformations in autfilt and jhoafparser to compactify a given automaton, renaming the states to satisfy the property. Tools that expect to generate compact output on the other hand would benefit from the warnings/errors that are raised when they specify the property while having broken output. We could then add language to the format spec to "encourage tools to produce compact output".

adl commented 9 years ago

I fear that adding one more "mode" (dense / sparse) to the format is just making it more complex to implement and specify. This point is not very complex in itself, but the accumulation of all these little mode tweaks (state-based versus transitions-based acceptance, explicit labels vs. implicit labels, implicit number of states versus explicit number of states, etc.) builds the complexity.

How about just applying Postel's law: we have a strict format that tools should strive to follow in their output, but this should not prevent us from writing parsers that are more lax in what they input. How lax we are is not something that the format should specify (otherwise that is not "lax" anymore, plus we do not want to encourage producer tools to be lax too). This is already the case in Spot as I have done a lot of work on error recovery (which is especially important when handling a stream of automata, as you don't want to just bail out at the first error): you can input automata with some syntax errors, autfilt will diagnose them, and if it can recover from the error in some way it will continue using these (maybe partial) automata, and you can have them output in (syntactically correct) HOA if you ask.

So in my view of the "human edition scenario" is already supported:

If the human editor adds new states to the automaton and forget to update the "States:" line, it will be diagnosed and fixed too; this already works too. I think the human editor can live with a couple of warnings, and should be happy to see them fixed automatically; while the author of a producer tool should just fix his tool so the warnings disappear.

strejcek commented 9 years ago

I like this solution. Good work.

adl commented 9 years ago

OK, actual proposal in #41.

adl commented 9 years ago

FYI, here is the actual status with Spot regarding the "human-editor scenario".

Assume the following automaton is the result of the (quick and dirty) human edition:

% nl foo.hoa
     1  HOA: v1
     2  States: 8
     3  Start: 0
     4  AP: 2 "p0" "p1"
     5  acc-name: all
     6  Acceptance: 0 t
     7  properties: trans-labels explicit-labels state-acc
     8  --BODY--
     9  State: 0
    10  [!0&!1] 8
    11  [!0&!1] 4
    12  State: 1
    13  [0&!1] 2
    14  [!0&!1] 7
    15  [!0&1] 4
    16  State: 3
    17  [!0&1] 6
    18  State: 4
    19  [0&!1] 1
    20  [!0&1] 2
    21  [0&1] 3
    22  [!0&1] 7
    23  State: 8
    24  [!0&1] 0
    25  --END--

You can either pass it to autfilt to add the undefined states and fix the state count:

% autfilt -H foo.hoa
foo.hoa:10.9: state number is larger than state count...
foo.hoa:2.1-9: ... declared here.
foo.hoa:23.8: state number is larger than state count...
foo.hoa:2.1-9: ... declared here.
foo.hoa:13.8: state 2 has no definition
foo.hoa:17.8: state 6 has no definition
foo.hoa:14.9: state 7 has no definition
HOA: v1
States: 9
Start: 0
AP: 2 "p0" "p1"
acc-name: all
Acceptance: 0 t
properties: trans-labels explicit-labels state-acc
--BODY--
State: 0
[!0&!1] 8
[!0&!1] 4
State: 1
[0&!1] 2
[!0&!1] 7
[!0&1] 4
State: 2
State: 3
[!0&1] 6
State: 4
[0&!1] 1
[!0&1] 2
[0&1] 3
[!0&1] 7
State: 5
State: 6
State: 7
State: 8
[!0&1] 0
--END--

Or if the missing states are the result of some removal, you probably want to remove all dead states (i.e., those that cannot be part of an infinite run):

% autfilt -H --remove-dead foo.hoa
foo.hoa:10.9: state number is larger than state count...
foo.hoa:2.1-9: ... declared here.
foo.hoa:23.8: state number is larger than state count...
foo.hoa:2.1-9: ... declared here.
foo.hoa:13.8: state 2 has no definition
foo.hoa:17.8: state 6 has no definition
foo.hoa:14.9: state 7 has no definition
HOA: v1
States: 4
Start: 0
AP: 2 "p0" "p1"
acc-name: all
Acceptance: 0 t
properties: trans-labels explicit-labels state-acc
--BODY--
State: 0
[!0&!1] 3
[!0&!1] 2
State: 1
[!0&1] 2
State: 2
[0&!1] 1
State: 3
[!0&1] 0
--END--