adl / hoaf

Hanoi Omega-Automata Format
13 stars 2 forks source link

Formalize the semantics of the version number #56

Closed adl closed 7 years ago

adl commented 8 years ago

If we are going to release another version of the format, we should decide and specify how the version number is used.

I would suggest that the syntax for version X.Z should be a superset of the syntax for version X.Y whenever ZY. This means a parser for v1.1 should be able to parse anything written for v1 (which I interpret as v1.0). But the opposite direction would not have to hold. The introduction of negated properties discussed in #55 can be done with the ! syntax if we bump the version number, and document that no-univ-branch is an obsolete name for !univ-branch.

I would increment X for changes that invalidate the actual syntax.

In parser for version X.Y, I would try to parse even files written with version X.Z for any Z, in case it is in the subset of the syntax I understand. But when the parser finds a syntax error in a file with Z>Y, I would additionally complain about the unsupported version number.

strejcek commented 8 years ago

I fully agree with Alexandre's suggestions.

I would be even a bit strict about no-univ-branch: I suggest to add a note that no-univ-branch is not only obsolete, but will not be supported in version 2.0 (if all we agree on this). Let's keep one universal one solution for negations.

adl commented 8 years ago

By the way, the notation HOA: v1.1 is actually incompatible with the grammar we defined.

We have format-version ::= "HOA:" IDENTIFIER where identifiers should match [a-zA-Z_][0-9a-zA-Z_-]*.

So either we use another numbering scheme (like HOA: v1_1 or HOA: v1a) or we can change the definition of IDENTIFIER to [a-zA-Z_][0-9a-zA-Z_.-]*. The dot is not used anywhere in the format, so allowing it for any identifier should not be a problem. If we allow dot inside identifiers, it might make sense to allow it in alias names and header names as well.

xblahoud commented 8 years ago

I agree with Alexandre's solution. If there is no conflict of the dot anywhere, I support the extension of the identifier's range as well as those of header names and aliases.

kleinj commented 8 years ago

Alexandre's scheme looks good. I'm a bit ambivalent about generally allowing . in IDENTIFIER, perhaps we need that later on for something else...

HOA: v1_1 looks good to me. Another alternative would be HOA: "v1.1", but that is not backward compatible either.

strejcek commented 8 years ago

Backward compatibility is not crucial: as Alexandre suggested at the beginning of this discussion, it is fine if the version 1.1 extends the syntax of the version 1. The important direction is the other, i.e. v1.1 should subsume the whole v1.

adl commented 8 years ago

My suggestion is to allow . inside IDENTIFIER, ANAME, and HEADERNAME at all the places where we already allow - (so never as a first character). The motivation is mostly cosmetic (v1.1 is less cryptic than v1_1), but also, I look forward to using some custom headers in Spot in the future, and being able to prefix custom headers with spot. looks quite natural to me. Consider:

spot.highlight-states: 1 3 5

I not do think it will prevent us to use . in other contexts (numbers, I assume) in the future.

Yes, using v1.1 will break parsers for the v1 format that do not support . in identifiers, but I don't think it matters at this point:

kleinj commented 8 years ago

Ok, having the possibility of doing prefixes for custom headers make sense. I'm fine with the . changes and will prepare jhoafparser and cpphoafparser accordingly.

adl commented 8 years ago

Pushed some wording for that.

kleinj commented 8 years ago

Thanks. I think I'd prefer a stricter syntax for the version numbering, i.e., something in the sense of

The definition via strverscmp() seems a bit opaque to me. Do we want to have the additional flexibility/complexity? If I understood it correctly, that would allow adding arbitrary suffixes to the version that don't have semantic significance? I guess for program version that makes sense (v0.99-beta, etc), but for the format?

adl commented 8 years ago

I don't know if we really need more flexibility. If we judge by the fact that we did not foresee we would want to release v1.1 when we designed v1, maybe we should not assume that we will only need versions of the form vX.Y. Maybe at some point we will want vX.Y.Z, or maybe some people will want to introduce some forks as v1.1-myfork-1. Or maybe not. No clue.

So my reasoning was that we need to be strict on the major number, because that is what we use to invalidate the format, but I could not find any reason to be strict on a minor number: as long as we can order two versions that share the same major number, that sounds good to me.

But I really do not have a strong opinion about it.

I should confess that my main motivation is not flexibility, but simplicity & laziness: using strverscmp() to order two identifiers is just easier than having to make sure that the version number has one of two specified forms, and having to add diagnostics (and test cases) for the case where it is invalid. So, from my point of view, imposing vX.Y is more work. But I'll agree it's not insurmountable, so if you really want to push for vX.Y please just make the change. Also I suspect that strverscmp() has no equivalent in Java (?) so what is less work for me might be more work for you...

kleinj commented 8 years ago

Indeed, there is no Java implementation of strverscmp()... Perhaps the following would make sense to have some added flexibility, and would match our mechanism for the custom properties:

The semantics would be as proposed, with the additional restriction that Z starting with an uppercase letter would also indicate an incompatible semantics change. I.e., if the parser knows how to deal with v1.1 it could also safely read v1.1-fork but not v1.1-Fork.

adl commented 8 years ago

with the additional restriction that Z starting with an uppercase letter would also indicate an incompatible semantics change. I.e., if the parser knows how to deal with v1.1 it could also safely read v1.1-fork but not v1.1-Fork.

So if v1.1-Fork cannot be parsed by anybody who does not know about the semantic of v1.1-Fork, what is the point of calling it v1.1-Fork over just vFork? That is already rejected by all tools that do not support it.

In fact we already have many options for announcing incompatibie changes, so I'm not sure why we need another one.

For most semantic changes, I think the preferred protocol should be to introduce (and use) a header name with an uppercase letter (as we did for Alphabet:). In case of radical change (e.g. new syntax), one can bump the major version number.
And if people want their own namespace for version number, they can also use a version number that is incompatible with the specified format for version number. For instance HOA: myown-v1 will already be rejected by our tools.

With your vX, vX.Y proposal, even if we include the vX.Y-Z case (but without the suggested uppercase semantic), one already has another way to declare an incompatible semantic by using an invalid version number. E.g. v1.experimental.1,

I'd rather keep things simple and not specify/implement such kind of additional check.

strejcek commented 8 years ago

I agree with Alexandre: the suggested solution with strverscmp() seems to be reasonably simple (and I would write "very simple" if there is a Java implementation of strverscmp()) and very flexible.

xblahoud commented 8 years ago

I was thinking now about the following statement from the current proposal.

Given two versions $X\le Y$ that share the same major version number, then the format specified by version $Y$ should be a superset of the format specified for version $X$. This means that a parser for version $Y$ should be able to read any file written using version $X$.

It can happen that somebody introduces a fork v1.01 which would not be a subset of v1.1 and with that the cited statement would not hold any more. If I understand it correctly, it also imposes the ordering on different named forks as v1.1-myFork and v1.1-anotherFork. Is this what we want?

adl commented 8 years ago

Indeed, it's quite unclear. So for the sake of getting 1.1 out, should we use Joachim's suggestion? That's fine with me.

kleinj commented 8 years ago

That would be fine with me. I'll prepare some text for the specification.

xblahoud commented 8 years ago

I am fain with that suggestion, but would like to have opportunity to have some other suffixes. If someone would like to adjust our format to his specific needs, it makes sense to have suffixes (for weights, let's say).

adl commented 8 years ago

I am fine with that suggestion, but would like to have opportunity to have some other suffixes. If someone would like to adjust our format to his specific needs, it makes sense to have suffixes (for weights, let's say).

So maybe just state that the format number can be any identifier. Identifiers of the form X or X.Y have the semantic we discussed. Other identifiers have unspecified semantics.

xblahoud commented 8 years ago

I agree.