JavaModelingLanguage / RefMan

4 stars 0 forks source link

JMLDataGroup #49

Open davidcok opened 2 years ago

davidcok commented 2 years ago

With the decision to use names beginning with backslashes for built-in collection types (e.g. \set, \seq, \locset), the names of all built-in Java types begin with a backslash except JMLDataGroup. Do we want to keep that one as is or move to a backslash name, like \datagroup ? (JMDataGroup would still exist as the RAC representation of the type).

It is conceivable that one could use \locset here also, though that means melding the expression-style definition of \locset values with the 'in' and 'maps' static style that is currently in JML, courtesy of Leavens and Mueller's theoretical work on sound use of data groups.

mattulbrich commented 2 years ago

What is the meaning of \datagroup? I understand that is the same as \locset.

I am in favour of investigating how "in" etc. can be used to specify using \locset. That could read nice!

leavens commented 2 years ago

Having a location (e.g., a field) in a datagroup means that whenever that datagroup is mentioned in an assignable clause, then all the locations in that datagroup can be assigned during a method's execution. The idea is based on a paper by Leino:

K. Rustan M. Leino. Data groups: Specifying the modification of extended state. OOPSLA '98 Conference Proceedings. (ACM SIGPLAN Notices, 33(10):144-153, October 1998)

The details for JML were described in https://www.cs.ucf.edu/~leavens/JML/jmlrefman/jmlrefman_10.html#SEC121 .

The key points in the semantics are as follows. Locations must be added to a datagroup when they are declared (using in clauses), so that if the location is visible then so is its datagroup membership. "Each location (field or array element) in a program defines a data group, whose name is the same as that of the field or array element." Furthermore, "JML requires users to put fields that are used to compute the value of a model field (see section 8.4 Represents Clauses) into the data group of that model field."

However, the built in type org.jmlspecs.lang.JMLDataGroup is not a location set, it is supposed to be a type that has exactly one non-null object (named JMLDataGroup.IT), and so it can be used as an extreme abstraction ("to talk about "the state" of an object without committing to any modeling or representation details"). Thus, datagroups are like location sets but the type JMLDataGroup is not at all like a location set (it is just a type with exactly one element).

mattulbrich commented 2 years ago

I see. Thanks for the explanation.

I think I like the "in" syntax and I like it better than our current syntax, so:

//@ ghost \locset footprint;
private int f; //@ in footprint;
private Node next; //@ in footprint;
//@ maps next != null ==> next.footprint into footprint;
private int localField; //@ in someOtherLocset;

reads better than

//@ ghost \locset footprint;
private int f;
private Node next;
private int localField;
//@ invariant footprint = \locset(this.f, this.next) + (next != null ? next.footprint : \empty);
// ...
}

The latter could be the desugared version of the former. I agree that "in" is nice syntax.

[That "also" bit was an ad hoc syntax extension ...] [Edited to use "map into" of which I wasn't really aware before.]

davidcok commented 2 years ago

So Gary -- do you want to keep the JMLDataGroup name, or move toward a -style name?

mattulbrich commented 2 years ago

JML requires users to put fields that are used to compute the value of a model field (see section 8.4 Represents Clauses) into the data group of that model field."

I don't think that coupling is particularly sensible, since most of the time you have a number of queries, model fields, model methods that all depend on the same set and mutators that write on the same set.

So, I'd suggest to deprecate that implicit coupling in favour of flexible names, like

//@ accessible footprint;
//@    model int modelMethod() { ... }

One can of course still have one datagroup per model field; but that feels unnatural. You'd have to list 5 "in" things if you have 5 model fields ...

mattulbrich commented 2 years ago

From the reference manual:

It is sometimes convenient to declare a data group without any other information about the model of data. This can be done using the type org.jmlspecs.lang.JMLDataGroup. This type has exactly one non-null object, named JMLDataGroup.IT.

I think that also underlines that the implcit coupling between model field and datagroup is a bit too narrow. I'd suggest to deprecate JMLDatagroup since it is only a dummy unit type.

leavens commented 2 years ago

I think we should keep JMLDataGroup as a type that we define, it's pretty simple after all.

The question of how JMLDataGroup fits in with the semantics of JML and its checks is more interesting. If we can create an alternative that is better than datagroups for managing the specification of effects, than that would be great and I would be happy to help with that. I agree with Mattias that datagroups are not as flexible as might be desired, and I would think that a similar concept based on dynamic framing would be more flexible, but let's work that out in detail before phasing out the semantics of datagroups.

davidcok commented 2 years ago

In my practical use

mattulbrich commented 2 years ago
So it [=JMLDatagroup] could be deprecated in favor of using Object or just kept or
renamed to have a name that is consistent with other builtin names
(e.g. \datagroup or \locset)

I would suggest we go a step further and go beyond having the group as an opaque entity of a unit type, and actually say they are sets of locations. I'd naturally favour \locset, but datagroup or region fit as well.

As far as I see it a datagroup specifies a \locset; it just does it
statically. So using a \locset as a model field and having locations
added to it via 'in' and 'maps' clauses makes sense to me)

From my practical experience: It is WAY easier to prove things if ghost fields are used for footprints. Conceptually model fields are possible too, however.

Datagroups work nicely (as they were intended) for abstraction of
side effects/frame conditions within a class hierarchy.

Yes. Just like regions/dynamic frames they hide the extension of the memory parts on which a data structure operates.

So far, even when specifying recursive data structures, I have been
able to do so with 'in' and 'maps' and not needed the operations
(e.g. union) on \locsets that we have earlier discussed. But my
experience is very limited so far. However it does make me wonder
whether KeY would have moved in the direction it did if it had
realized the semantics of JML's data groups.

I do remember having discussed the differences between datagroups and dynamic frames. I think that datagroups are more statically determined

If there is a new more expansive design I'd like to see it combine
somehow the clarity and declarativeness (and ease of checking) of
JML's current static data groups with something more dynamic. The
current data groups have the benefit of a well-worked out
theoretical underpinning.

So do dynamic frames. There is plenty of literature on this (also filed under region logic).

Is there a linked list implementation with datagroups? I'd like to compare.

How about model methods and queries? They also have read clauses. How do they come into the picture.