data groups, model fields and *

davidcok commented 2 years ago

In writing l-value expressions (a.k.a. store-ref expressions), JML allows one to write o.f where f is a datagroup (or model field).

For a given object (of dynamic type C) and datagroup f in (usually some super type of) C, f stands for a (fixed) set of fields of C (and its parent types) that have been declared to be 'in' f and to additional l-value expressions that have been 'mapped' into f.

Part of the point is that fields of private visibility can be mapped into data groups of public visibility, so the set of l-values represented by o.f does not depend on visibility.

I think of as in the l-value expression o. as an implicitly declared datagroup in Object to which all fields of an allocated object belong.

As above this would be independent of visibility
For assignable clauses, it does not make sense to include final fields in what is meant by o.*
But for accessible (reads) clauses, it seems that it might -- except that there is no use to knowing that a final field is read, as (after construction) it is just a constant. So perhaps we stick with o.* excluding final fields.

Does this match everyone's understanding? Do we stick with the type name org.jmlspecs.lang.JMLDataGroup? Or another spelling? Or \datagroup?

mattulbrich commented 2 years ago

I assume that l-value and "location" are synonyms, right?

I'd suggest to either 1) deprecate the concept of datagroups in the face of location sets. 2) or use the concept to formulate the reads clauses of model fields.

Point 2) only if that actually matches the expectations.

davidcok commented 2 years ago

Regarding (1) above. We could unify the concepts by allowing \locset as a synonym for JMLDataGroup. Then we can either have computed locusts as in

\locset footprint = \set_union(...)

and 'static' ones using the JML-traditional

class A { //@ \locset state; // was JMLDataGroup }

class B extends A { int value; //@ in state; }

I like that -- it is easily implemented and moves us in a direction of unifying concepts and deprecating the name JMLDataGroup.

Regarding (2). JML already has reads (a.k.a. accessible) clauses. They use \locsets also. OpenJML can check them like it does assignable \locsets (writes clauses). But I have not sorted what importance they are to reasoning.

leavens commented 2 years ago

I'm not sure why you want to deprecate JMLDataGroup, @davidcok . To my mind, location sets and data groups are different (although related) concepts. I also prefer more Java-like names (like JMLDataGroup) instead of backslashed names (like \locset), although I'm okay with backslashed names for mathematical value types.

I think that using data groups to formalize read sets is a good idea that matches the current use of data groups.

mattulbrich commented 2 years ago

Can you please remind me of the meaning of a datagroup? It was a way to say on which fields a model field depends right?

Like

//@ model int x; represents x = y + 1;
int x; //@ in y;

Right?

But how does one deal with more complex dependencies like

class Node {
   //@ model int length; represents length = next != null ? next.length+1 : 0;
   Node next;  //@ in length;
   //@ in next != null ==> next.* in length;
   // ...
}

Or something like that?

davidcok commented 2 years ago

Model fields use data groups, but you can have a datagroup (type JMLDataGroup) without it being a model field. It is a way to abstract the state of objects. So in a parent class one can say assigns state and in a child class one states which fields belong to 'state'.

A model field is both a value (via a represents clause) and a data group (to say what values contribute to that model field's abstract value).

The working out of this with linked data structures and dynamic frames is what I'm working on right now. But the basic static structure with data groups and 'in' clauses was all that was necessary for a large legacy verification project I did last year.

More later when I have time.

mattulbrich commented 2 years ago

Thanks, @davidcok. From what you write, this is what in dynamic frames is called a "footprint", ie. a model entity of type \locset (or similar) holding all elements on which the model field/method may depend. I'd like to see this worked out in datagroups.

I can show you how this works with KeY's dynamic frames. Here we have an explicit ghost field \locset footprint and all relevant methods either have assignabel footprint or accessible footprint, thus limiting their reach. This is very much like your state. However, there is no direct connection between a model field and a footprint, that must be made manually.

So for a List we have (removing the data part, only caring about the footprint):

public interface List {

    //@ public ghost instance \locset footprint;

    //@ public instance invariant \subset(\singleton(this.footprint), footprint);

    // Invariant only ever reads from the footprint.
    //@ public accessible \inv: footprint;

    /*@ public normal_behaviour
      @   accessible footprint;
      @*/
    public /*@pure@*/ int size(); 

    /*@ public normal_behaviour
      @   assignable footprint;
      @   ensures \new_elems_fresh(footprint);
      @*/    
    public void add(Object o);    
}

The ArrayList would then have a coupling invariant:

public class ArrayList implements List {
    private Object[] array;  
    /*@ invariant footprint == \set_union(array[*], this.*);
}

Perhaps this is a helpful example. I'd like to see your vision of a datagroup version of this fragment.

Here is the KeY-verifiable code.

leavens commented 2 years ago

The semantics of datagroups are carefully designed to solve the "extended state problem", which is how to allow subtypes to modify additional fields in a modular way, as described in K. Rustan M. Leino's paper: "Data groups: Specifying the modification of extended state", in OOPSLA '98 Conference Proceedings, October 1998, pp. 144-153, http://doi.acm.org/10.1145/286936.286953 . I can summarize the basic idea as follows. The locations that a method may modify are specified using datagroups, which stand for sets of locations that are extendible in subtypes. The datagroups that a field belongs to must be declared where the field itself is declared, and thus whenever a field is visible, the datagroups it is a member of are also visible. For each datagroup, a verifier can also see what (visible) fields are present in that datagroup. Thus, when a datagroup can be modified by a method, one must assume that all the fields that one knows are visible in that datagroup can be modified by that method, but one can also conclude that no fields that are visible that are not in the datagroup can be modified.

I agree that the dynamic frames technique is capable of the same kind of verification, and that one can achieve that by keeping a set of locations that includes all the locations in the datagroup up to date, and specifying that that location set is possibly modified for a method.

I suppose it's possible to argue that the dynamic frames technique is more general, in that it can solve other problems, aside from the extended state problem. (Are you arguing that, Mattias?) However, the dynamic frames technique comes at an additional cost: maintaining the location set must be done in ghost code. If we could define a syntactic desugaring that turns the datagroup declarations and uses in to the appropriate dynamic frames specifications and ghost code, then we could perhaps define that as a syntactic sugar in JML. Do you know of such a desugaring?

mattulbrich commented 2 years ago

Thanks, Gary @leavens, for pointing out the original publication.

This looks very much like a first step in the direction of dynamic frames. They are not yet as dynamic as one often needs, but they capture the same ideas. It seems (after quickscanning the paper) that recursive data structures are not possible yet. This would then probably introduce much of the dynamic aspect of dynamic frames.

However, the dynamic frames technique comes at an additional cost: maintaining the location set must be done in ghost code.

Let me say here that this needs not be the case at all! Here is a KeY example of a list like my above example, in which the footprint is a model field, not a ghost field:

//@ public model instance \locset footprint;

Then, instead of updating the ghost field manually, there is a represents clause in every implementing class like the following in ArrayList:

//@ private represents footprint = array, array[*], size;

(Take the syntax for location sets with a grain of salt and the private modifier, too). Having said this, I must admit that we usually dot not follow this approach since it is MUCH harder to prove. It seems that the ghost code that updates footprints helps the verification tools quite a lot in figuring out proofs.

I assume that DF and DG have mostly the same purpose. I wonder if we should have more than one framing mechanism in the specification language. Perhaps DG are a much neater way to specify frames. I assume that original DG specs can be desugared to DFs easily, but I have not spelled that out.

I am not yet 100% happy with the explicit nature of dynamic frame specifications. They should be more "under the hood". Perhaps DGs (or an extension of them) are this hood?

JavaModelingLanguage / RefMan

data groups, model fields and * #16