JavaModelingLanguage / RefMan

4 stars 0 forks source link

visibility for \invariant_for and \static_invariant_for #32

Open davidcok opened 2 years ago

davidcok commented 2 years ago

Splitting a discussion from #20.

Do both \invariant_for and \static_invariantfor collect the conjunction of all the invariants (static, respectively) or all the invariants visible at the point of the (static)invariant_for? I think I'm in favor of ignoring visibility here.

leavens commented 2 years ago

I don't think we should ignore the privacy of the invariants. If a client doesn't know about an invariant, then it can't be checked at the point where \invariant_for(o) or \static_invariant_for(T) is used. So I think it should be just what is visible. In RAC, from a client it might be impossible to access the fields mentioned in an invariant that is not visible.

davidcok commented 2 years ago

Here is a different viewpoint Gary. public data groups can hold private fields; that is part of the point of abstraction. Similarly, I think of \invariant_for as an abstraction of the 'validity' of an object, which may well include sub-invariants that are not directly visible.

I don't think RAC is a problem. The invariants, and \invariant_for, can be implemented as ghost methods that, while perhaps public themselves, can access any private methods and fields within the class.

As for dynamic vs. static. I have found JML's purely static spec rules for method specs to be overly limiting in practice, and I would expect the same to be true of invariants. Mattias promised an example, when he has time.

mattulbrich commented 2 years ago

The KeY take on this is as follows:

  1. Semantically, \inv_for(x) means ALL invariants for the dynamic type of x.
  2. However, when reasoning from outside the static class of x, we can only deduce that the public invariants of x hold. The private ones are hidden.

That is: Semantically, all of them, in reasoning we can only make use of the available ones.

mattulbrich commented 2 years ago

Mattias promised an example, when he has time.

Here we go:

class Data { 
  public int field;    // obvious constructor omitted.
  /*@ public invariant field != 0; */
  /*@ normal_behaviour 
    @   requires_redundantly \invariant_for(this); */
  public String toString() { return "" + (1/field); } }
}

class Client {
  static void m() {
     Data d = new Data(42);
     d.field = 0;       // (originally used wrong field here:  d.value = 0; )
     Object x = d;
     //@ assert \inv_for(x); // (*)
     x.toString(); // (+)
  }
}
  1. If (*) referred to the invariant of static type of x, ie. Object, then it would be true, i.e. satisfied, and the call in (+) would satisfy the precondition.
  2. If (*) referred to the dynamic type of x, ie. Data, then the precondition would be at least x.field !=0 (possibly more in subtypes of Data) and the call would not be allowed.

Since the call starting at Client.m actually crashes with division by zero, 1) is not sound.

leavens commented 2 years ago

Since in reasoning (statically) one can only make use of visible invariants, we should make that the semantics of \invariant_for and \static_invariant_for. RAC should use the same semantics (checking the invariants that are visible where it is used).

I don't see how a static verification tool can know the dynamic type of an object, and JML is designed so that one can always reason using static type information. By behavioral subtyping, the invariant for the static type of an object must be implied by the invariant for all of its subtypes, and hence must be implied by its dynamic type. So, \invariant_for(E) should mean the visible instance invariant for the static type of E. Similarly for \static_invariant_for(T).

The job of a programming methodology is to make sure that all the invariants for an object's dynamic type hold when a method is run on an object. This is what the Boogie methodology and the Universe type system of Peter Muller do. I agree that (without the Universe type system) JML has problems with that, but I doubt that the problem can be solved by the semantics of \invariant_for and \static_invariant_for, as some control of aliasing is needed to prohibit representation exposure and argument exposure.

davidcok commented 2 years ago

This example reminds me of similar situations I had when I was specifying Amazon's code. And it was precisely the same problem -- the validity of toString() depended on the validity of the object (in Mattias's example that field != 0) and I was looking for a way to know that information dynamically. With current strict JML rules x.toString() only reasons with the static type of x, which is Object, even though we know it is a Data. I found that \invariant_for, x.method-call and model field-represents clause combinations all needed to access dynamic type information in various situations. I made do with (dynamic) represents clauses but it was unwieldy.

mattulbrich commented 2 years ago

I do agree, Gary, that static verification can only make use of the static type. That is what we do indeed. We cannot make use of the possible details unknown at the call site. However, the problem is not about assumptions, it is about assertions: As in the above example it would suffice to prove true to show the invariant of the static type of x. But this information is not enough since the called method then assumes the full invariant for the type.

For this to go hand in glove, the same information must be provided at callsite that is used at callee site. They cannot disagree on the definition of what the invariant actually means.

I am sure that we agree, Gary, on the fact that the verification can only ever make use of the visible invariants. That is so indeed. But we cannot claim to establish an invariant by only establishing the static type bit. Thanks to dynamic dispatch on method calls, that won't do for a sound mechanism.

mattulbrich commented 2 years ago

As regards RAC: I agree with David that we can implement invariant as a Java method which is overridden in every type like

public boolean $invariant() {
   return super.$invariant() && localStuff;
}

thus using dynamic dispatch for having the good dynamic type available at all times.

Btw. Dafny implements invariants using a Valid() function usually which makes this quite transparent. (Wojtek and I actually have a paper on a generalisation of that in the Modularity journal)

leavens commented 2 years ago

The example, with classes Data and Client, has a public (non-final) field named "field". Was the field supposed to be named "value" instead? Or was the assignment to the valuefield of d, which seems to be a type error in the present example, supposed to be d.field = 0; ? If so, then the assignment d.field = 0; (let us say) should violate the invariant and should be caught at that point.

I agree it's not valid in the example to assume the invariant in the toStringmethod of Data. However, I would say that we should require that such public invariants about public (non-final) fields should be maintained by every class in which they are visible. This based on the methodology in Leino and Mueller's paper "Modular verification of static class invariants" in FM 2005 (see https://tinyurl.com/4xfc2989). (I can send the paper if you don't have access to it.) In their methodology, global invariants are to be maintained by every class in which they are visible; if we followed that, then Client would be responsible for maintaining the public invariant of Data if it assigned to the field of a data object, thus the program would be stopped at that point.

An alternative for this kind of example would be to explicitly write the instance invariant of Datain the precondition of Data's toString method; that is what Parkinson advocates in his paper "Class Invariants: The end of the road?" in IWACO 2007 (see {https://people.dsv.su.se/~tobias/iwaco/proceedings-complete.pdf). In that case we would have no need for \invariant_for, as we would need to check the precondition of toString. Of course there would still be a problem in this example as written, but the problem would be that the precondition of toString in Object would not imply the precondition of toString in Data, so it would not be a behavioral subtype of Object. So again the methodology would catch the problem.

Since we are not going to abandon invariants, it seems most sensible to adopt the Leino and Mueller solution. Another solution would be to disallow public non-final fields or disallow public invariants that depend on public non-final fields.

So, in short, I don't find the example convincing.

mattulbrich commented 2 years ago

Now that is a discussion that I have been longing for for a while. It won't be an easy one, but I think it will shape future JML a lot.

Gary rightfully speaks about programming methodologies. In the context of the framing problem there are at least three directions that have evolved and that we can turn our attention to for JML:

  1. Ownership. This had been implemented in JML originally. With rep and peer modifiers, one structures the heap into a hierarchical tree structure with only limited side-way connections. By setting restrictions on what can be referred to from, e.g., invariants, knowing about object invariants becomes really conveniently easy. + invariants hold very often and need only be considered when within the objects that actually can change them. - existing code can not really be expected to be retrofittable to this concept. Rustan and Peter's paper make use of the "rep" modifier.
  2. Separation Logic. Following pretty similar hierarchical structuring ideas, sep logic does not have invariants per se. They have abstraction predicates which are concept that is at least as powerful. I think it very much corresponds to model methods. + verification can work very neatly and veri fast (pun intended). - again, existing code has difficulties. - it is a pretty long way to go from where JML is now to sep logic.
  3. Dynamic Frames/Region Logic. The most liberal disciplines (I think differences between them are minor). Dependencies are managed explicitly as location sets (or similar). + no code is forbidden by methodology (like in the other approaches) - the management overhead may be indeed larger with explicit invariant handling.

KeY has been following the 3rd route for some ten years with a sound invariant methodology that also works in practice. A few case studies have shown that it is possible to verify JDK library code.

I am interested in combining 1 and 3, having ownership specifications and desugaring them into special dynamic frames with additional knowledge from ownership guarantees that combine the best of both worlds.

Now, which path shall it be? Choose wisely.

mattulbrich commented 2 years ago

If so, then the assignment d.field = 0; (let us say) should violate the invariant and should be caught at that point.

Thanks for pointing that out, Gary @leavens. KeY does not check invariants at assignments, because it would not know which invariants it has to check. How does OpenJML deal with this here, David @davidcok?

Let us for example have the following code that breaks the invariant of cc. It has no public fields. It is also pretty reasonable, I'd say. Should JML reject the code? How can the assignment in SomewhereElse know about the context? It might not even know class Client, let alone be able to check its invariant.


class Cell {
   private int field;
   publiv int getField() { return field; }
   public void setField(int x) { field=x; }
}

class PositiveContaier {
   pivate Cell c;
   //@ invariant  c.getField() > 0;

   //@ requires c.getField() > 0;
  CellContainer(Cell d) { c = d; }
}

class Client {
   static void main() {
      Cell c = new Cell();
      c.setField(1);
      CellContainer cc = new CellContainer(c);
      SomeWhereElse.workOn(c);
      //@ assert \invariant_for(cc);    // does not hold.
}

class SomeWhereElse { static void workOn(Cell c) { c.setField(-1); } }
leavens commented 2 years ago

This example has representation exposure, since the formal d in the constructor for PositiveContainer makes an alias between the argument and the cell c that is in the representation of PositiveContainer. The Universe type system would disallow this if c was declared to be part of the representation of PositiveContainer. Since representation exposure is a known cause of problems with invariants, we could reasonably disallow the constructor for PositiveContainer, since it is a direct cause of representation exposure. Wouldn't preventing representation and argument exposure be useful with region logic and dynamic frames?

P.S., the method getField in Cell should be declared to be pure, and the modifier for the declaration of the field c in PositiveContainer should be private. And the constructor in PositiveContainer should be named PostiveContainer. The variable cc in Client should have type PositiveContainer (or I suppose all occurrences of PositiveContainer should be renamed CellContainer in the entire example). All these can be easily fixed, however.

mattulbrich commented 2 years ago

Thanks for pointing out the errors in the program. However, you grasped its idea. It is the exposure of the field which should actually be a rep field if we require adherence to ownership.

I haven't ever seen the general universe types (GUT) in action unfortunately. I understand there is not even a type checker implemented currently.

I wonder if Universe Types can really be imposed on existing Java code.

mattulbrich commented 2 years ago

On 17.12.21 18:37, David Cok wrote:

This example reminds me of similar situations I had when I was specifying Amazon's code. [...] I made do with (dynamic) represents clauses but it was unwieldy.

In KeY, the invariant is a special model field such that the invariant clauses implement the represents clauses (with some specialties regarding visibility and covariance). Thus o.<inv> (KeY's JavaDL syntax) refers to the same function java.lang.Object::<inv>(Heap, Object) whose definition depends on the (dynamic) type of the argument. If the dynamic type is unknown, then can still make use of the static type.

wadoon commented 2 years ago

Note, that the syntax of "implicit identifier", like T.<inv> or obj.<created>, will be changed in future when we switch to Java 5+ to avoid ambiguity with type parameters.

mattulbrich commented 2 years ago

Note, that the syntax of "implicit identifier", like T.<inv> or obj.<created>, will be changed in future when we switch to Java 5+ to avoid ambiguity with type parameters.

To clarify: These notations are not JML, but JavaDL, KeY's logic.

leavens commented 2 years ago

The Checker Framework makes it possible to implement a universe type system checker, see https://checkerframework.org/releases/1.4.0-b1/current/checkers-manual.html#gut-checker for how that is done, although I don't know if such a tool comes pre-defined with the checker framework. I recall that it is always possible to use Universe types with pre-existing programs, simply by putting all objects into the same universe. Of course if you develop your program by delimiting a number of different universes, you can get more guarantees from it. On the other hand, if I remember correctly, the universe type system would not point out rep exposure problems unless you declare some fields of an object to be rep fields. So perhaps that doesn't help with such an example as Mattias's.

I agree that it's good to be able to express what it means for a program to be correct, and for all its specifications to hold, so it makes sense to use KeY's approach to be able to express whether all invariants hold at some program point, and I agree about RAC. I see that KeY's static verification takes the visibility of invariants into account.

Overall, I believe that we want different kinds of visibility for specifications, especially for invariants, so that some design decisions can be hidden from some clients, but made visible to other code. For example, if we are implementing sets, we might want to hide the design decision that the array used to represent the elements of a set has no duplicates from clients, so that invariant would not be public (and the field would not be public). It might be sensible for that array field to have protected visibility, in which case having the invariant be protected would make sense. On the other hand, I can see that better defaults might make dealing with such privacy levels for specifications easier to work with. Any suggestions on that, David?

Nevertheless, I would still like an invariant methodology that guarantees that the expected invariants hold when they are needed, and it seems that some problems, like representation exposure, are common enough to check for in tools and warn users about. But rep exposure seems to require some sort of declaration of what the rep of a type is. (Although, maybe we could have some default, say any field used in an invariant?) Furthermore, perhaps the state of the art of programming methodology is not up to the task of having a useful and sound invariant methodology that fits with what we want to use JML for. I could believe that the dynamic frames technique might be able to express and enforce any methodology with certain (desirable) properties; has that been shown in a publication?

mattulbrich commented 2 years ago

I think marrying ownership types with dynamic frames beneficially and seamlessly would be really nice and valuable!

I could believe that the dynamic frames technique might be able to express and enforce any methodology with certain (desirable) properties; has that been shown in a publication?

Isn't there a publication on region logic and separation logic, Gary? That could answer that last question to some degree.

leavens commented 2 years ago

Mattias, yes my former student Yuyan Bao's work allows translating (a supported subset of) separation logic into (fine-grained) region logic. See the paper:

Yuyan Bao and Gary T. Leavens and Gidon Ernst, "Unifying separation logic and region logic to allow interoperability" Formal Aspects of Computing, Vol 30, number 5, pp. 381-441, 2018 https://doi.org/10.1007/s00165-018-0455-5

(I can provide a copy if you don't have access to that.)

mattulbrich commented 2 years ago

Mattias, yes my former student Yuyan Bao's work allows translating (a supported subset of) separation logic into (fine-grained) region logic.

That is the work I was referring to. I am aware of it -- though I do not have all details present.

I think something similar should be done with universe types and then the two can coexist -- with benefits hopefully ...

leavens commented 2 years ago

Yes, I agree that would be nice…