syntax and operations for \seq \set \map

davidcok commented 2 years ago

I've conveyed my implementations of built-in seq set map to use the names we have agreed on, with backslashes.

To complete that work, I'd like agreement on the methods and operations that should be standardized, including

how closely to adhere to java syntax
what operators to include
whether to have methods that duplicate operators
whether to define the runtime version of these types and what it should be (lower vs uppercase (seq ve Seq and which package: org.jmlspecs.lang or org.jmlspecs.runtime)

I've written draft sections in the evolving LaTex document, so it would be easiest to comment in line there -- essentially anything in a \jmlxtodo{} comment is asking for your opinion.

So please read and comment on the opening of Chapter 5 and sections 5.9, 5.10, 5.12

mattulbrich commented 2 years ago

wow! Great! Regarding the classnames, I do not really care since they are not really visible from the specification anyway. I'd also favour .runtime instead of .spec

mattulbrich commented 2 years ago

I have one very basic point of discussion: The JML builtin types are supposed to be primitive types and yet many properties are like for reference types with the possibility to diverse, have side effects, "new" values, etc. being different from null, etc. I think we should not mention that since it actually gives a wrong impression. Unless, that is, JML types are not primitive types. But I'd really prefer them to be primitive. So that would not be set.empty but rather \set() and \set(1,2,3). etc.

wadoon commented 2 years ago

I would love to have a common ground (syntax and semantics) for such built-in types. A common ground which would allow a user to add logical datatypes or ADTs by themselves in JML, and also helps JML-tool implementors to handle data types in a similar to Java types.

For example, for \set<T> I can imagine the following:

public heap_free model class \set<T> {
  ...  axioms   ... 
  ...  invariants  ...

  public static <T> \set<T> empty();
  public static <T> \set<T> of(T... values);

  public \set<T> \opEquality(\set<T> other);

  /*@ ensures \result <==> !(this == other); */
  public \set<T> \opInEquality(\set<T> other);
  public \set<T> \opPlus(\set<T> other);
  ...
  public boolean \opArrayAccess(T elem);

  public boolean has(T elem);

  public static <T> \set<T> disjoint(\set<T> ... args);
}

We can enrich the class further by using method contracts, implementations, axioms and invariants.

New are:

Operator names
heap_free on class

mattulbrich commented 2 years ago

I think that either we have the "class" analogon. Then these things should look like Java classes, or we appeal to the analogon of primitive types ithen it should not ressemble Java classes so much.

I vividly remember a discussion at a Leiden workshop on JML where we discussed whether ADT definitions should make be close to class definitions such that users are on terra firma or whether they should be deliberately different to make their nature as non-reference-typeS more explicit. I also distinctly remember that we actually thought that non-class syntax would be better suited. Peter Schmitt and Gary favoured that a lot at the time.

leavens commented 2 years ago

It might make more sense if the JML type names began with an upper-case character, like \Seq and \Set, so that the translation to Java (for RAC) would be more direct. However, I realize that this would make a conflict with what is already done in KeY. However, it might still be sensible to use Java types Seq and Set in RAC for runtime assertion checking.

@wadoon your suggestion conflicts with @mattulbrich's suggestion of not using Java notation for such value types (at least for empty). I don't see how it makes sense to define an array access notation without defining an order, which doesn't make sense to me for sets (although it makes sense for sequences). If sets and sequences are pure value types, then I do still think it makes sense to distinguish them notationally from Java methods. The RAC compiler can translate them into something more ugly (maybe using lots of underscores?).

I also don't think it's a good idea to have provisions for user-defined ADTs of values that can be added to JML. This is a whole can of worms that was present in the Larch-style languages and JML's design is (explicitly) against that. On the other hand, it's fine to have a convention so that implementations can experiment with new value types. So I advocate a compromise position where we have a convention for value types but not a way for users to define and add their own.

I'll work on reading the draft...

davidcok commented 2 years ago

I want them to appear like primitive types for the most part.

using lower-case names (like int, float, \bigint) is one aspect
using Java like syntax, e.g. s.size(), s.equals(ss) or set.equals(s,ss) means that tools can re-use Java parsing
the places where I'm in favor of non-Java syntax is to allow some operators, especially == + and []
where it is debatable is for value constructors: seq<T>(a,b,c) vs. seq.<T>of(a,b,c) and similarly for empty (we could leave off empty and just use seq.<T>of() but that is not so intuitive).
and then for methods, it just makes sense to me to leverage off Java and use s.size(), etc. rather than defining a whole raft of -keywords like \size(s), \disjoint(s,ss) etc. It certainly simplifies implementation (at least within OpenJML).

I think @wadoon misunderstood the operator[]. For sets s[o] is true or false depending on whether o is in the set or not. o is not an integer index. No ordering is needed. Essentially \seq, \set, \map are all maps: from \bigint->T or T->boolean or T->V respectively, and the [] is just the corresponding lookup.

I'm happy to use conventional Java uppercase Seq and Set for the RAC-translations . My main questions are first, whether the RAC name should be defined in the language (or left up to tools) and second what the RAC name should be (including what package it should be in). I don't have much of an opinion on the actual name.

wadoon commented 2 years ago

@leavens @davidcok I understood s[o] as David wrote, hence the return types are boolean and the parameter of type T (as written in the current draft):

  boolean \opArrayAccess(T elem);
  boolean has(T elem);

I see it from an implementors perspective: I do not want special cases. In this case, a fixed set of built-in datatypes. Instead, I would like to have simple notions, here, for the definition of logical data types, on which I could bootstrap built-in types. And of course, I open this notion for the user as well. So, for the implementation I definitely need to go along with a syntactical notion of logical data types, the only difference is whether we can find a common ground on the syntax/semantics or I implement my own JML extension.

@mattulbrich, I was not there, but currently I see it as follows:

We use Java syntax (in this (static) method invocations) on the user side. Why not use a similar Java-like syntax on the definition side? We should reuse our syntactical and semantic notions if they fit.
My student finished its CASL-KeY integration, and the useful entities are constructors, functions, and axioms from CASL for KeY. We have syntactic rules for this in Java and JML. Moreover, I would say that JML could be more user-friendly than CASL by using contracts.
From the implementors perspective: Implementing JML is already a difficult job. At some point, JML should be self-contained: New definitions (data types, extension functions to existing types) should not require to adapt the reference or tools. Instead they should just be handled by adding new files.

leavens commented 2 years ago

I'm okay with \seq and \set for the reasons @davidcok mentioned (looking more like primitive types, similar to \bigint). I like using Java-like syntax for operations, like s.size() etc. because that is similar to JML's original design (and it sounds like it would be easier in the implementation also, which is good), and I'm okay with using == and +, and other standard notations for infix operators where it makes sense and Java already uses such operators.

However, I'm not so keen on using [] to test membership, especially for sets. I would prefer a syntax like s.has(e) for that, as it is more Java-like and (I think) intuitive. I understand there is some reasoning behind it, so I'm not strongly opposed to it, but I think a different syntax for membership or mapping would be better.

For RAC, the translator (compiler) and the runtime system have to agree on names for classes used to represent sets and sequences etc., but the user doesn't need to know those names, although it might help portability if they were fixed by the reference manual. (That is, one could compile for RAC using one platform, and run elsewhere using a compatible runtime.)

@wadoon I can see some benefit to making it easier for tool developers to add their own new built-in types for specification purposes as this would make research along those lines easier, although the disadvantage of that is that it would encourage incompatible dialects of JML. Following the original design of JML would lead us to not making provisions for such extensions in the Reference Manual, although that would not necessarily stop tool developers from reaching some agreement... CASL-KeY seems like a step in that direction (i.e., towards something that would be more like Larch/Java), although that may not be a bad thing.

I agree that it would be nice if extending JML were simply a matter of adding new files, but I think that achieving that goal would require abstraction mechanisms that are not currently part of JML. It would be an interesting research project for the future. On the other hand, if you take the history of extensible programming languages as a guide to what would be the result, the history is not encouraging for such a project.

JavaModelingLanguage / RefMan

syntax and operations for \seq \set \map #57