feature: EQUALS statement

stefjoosten commented 5 years ago

Issue #850 introduces a new statement: the EQUALS statement. This equates two named objects, so it is in fact a specific form of rule.

To be more precise, let me propose the following as the operational semantics of the EQUALS statement. I am expressing it in the context of FormalAmpersand (i.e. Ampersand's own metamodel).

RELATION equals[Relation*Relation]
RULE "Equate relations" : equals |- I
VIOLATION (TXT "{EX} MrgAtoms;Relation;", SRC I, TXT ";Relation;", TGT I )

Any violation that is triggered by this rule should prevent the further interpretation (such as code generation) of the namespace in which the violation has occured.

As a consequence, two relations that are made equal may have multiple signatures (i.e. name-source-target). A pair that exists in either relation now lives in the resulting relation, yielding the union of all pairs from both sides.

(We will not implement this until issue #850 is well on its way. So we intend to implement the EQUALS statement as a feature branch within feature 'Multi Context'.)

sjcjoosten commented 5 years ago

There are several things to think about:

Some syntax for EQUALS.
A good merge operation for every kind of named object.
Some details on when merges are allowed, and when they are not.
A good data-structure for references to named objects. I have started on a partial proposal below.
Whether or not to demand isolated namespaces to be compilable on their own.

For concepts, EQUALS is already in place:

CLASSIFY X IS Y

what happens here, is that X or Y will be chosen as the canonical concept name, and writing either becomes completely equivalent. As a data structure for concepts, the entire set of equivalent concepts is typically used in the type system itself.

I have not seen a syntax proposal yet; here is one that would be problematic:

EQUAL r IS s

The problem here is that if r and s are both relations, interfaces, rules and concepts, we have an ambiguous statement: which do we mean to be the same? So lets suppose that we have some way of knowing the kind of r and s. I'll add some syntax to the examples from this point on. Here kind is context, relation, concept, rule, service and maybe something else (per number 7 in issue #850 .

Another question is how to merge two things. In the context of Hawaii:

RULE guardPrerequisites1 : required;attends~ |- pass
VIOLATION (TXT "Course ", SRC I, TXT " must be passed by Student ", TGT I)

RULE guardPrerequisites2 : attends~ |- required\pass
VIOLATION (TXT "Student ", TGT I, TXT " cannot go to ", SRC I, TXT " without passing ", SRC required~)

EQUAL Rule guardPrerequisites1 IS Rule guardPrerequisites2

I can imagine that merging these rules would not be allowed because they have different types. The point here is that for every kind we need a good definition of what the merge operation does.

For relations it seems clear:

EQUALS Relation r IS Relation s

I will want to check that the types of r and s are indeed equal. This brings us to the checks that we may want to perform before we merge. For instance, some things of equal type cannot be declared equal:

EQUALS Concept BLOB IS Concept Integer

the problem here is that BLOB and Integer are built-in concepts, and they are to be mapped to different things in SQL. Similarly, some atoms can be declared equal, but the Integer 3 and the integer 1 cannot.

After working out what needs to be checked, and what merge operations we want, the main complication will be giving good feedback. Consider writing:

RELATION r[ONE*A]
RELATION s[ONE*B]
RELATION t[ONE*C]
EQUALS Concept A IS Concept C
EQUALS Concept B IS Concept C
EQUALS Relation r IS Relation s
EQUALS Relation r IS Relation t
EQUALS Rule x IS Rule y
Rule x : r = s~
Rule y : t = t~

A bad error message could read:

Cannot match C to ONE in Rule y : t = t~

Similarly, in the generated SQL, interfaces and documents, we want to use the preferred names over the canonical names where possible. To achieve this, the Haskell data-structure for a reference to a named object should carry:

a non-empty set of possible fully qualified identifiers. These do not have to be showable.
a non-empty set of preferred names. These are used for communicating with the user. Comparing whether two names objects are equivalent (i.e. equal up to the EQUALS relation) happens through a function that compares the set of possible names, ignoring the preferred name. If there is overlap in the set of possible names, but the sets are different, a fatal should occur: after resolving the EQUALS relation, and figuring out which named object a reference points to, these sets should uniquely identify the object. Whenever two references are combined, the set of preferred names should remain as small as possible, to avoid giving `too much' to the user.

Note that it might be possible for the preferred names to be equal, while the references are not:

RELATION r[A*B]
RELATION r[B*A]
RELATION s[B*A]
EQUALS Relation r IS Relation s
RULE x: r[A*B];r~ = r~;r

The preferred names in the rule above are r[A*B], r, r and r, respectively. The fully qualified sets are {r[A*B]}, {r[A*B]}, {r[B*A],s[B*A]} and {r[B*A],s[B*A]}, respectively (we may use a rich data-structures that includes the line of the declaration etc). As we allow for the combination of different scripts, these possibilities grow.

A final thing to think about is whether namespaces should be checkable in an incremental way: do we wish to allow a script that has a rule with y in it, while y is not in scope: that is, we allow for a script that cannot be checked individually, and can only be used inside a bigger context. There are a couple of advantages of doing this:

we can (partially) compile scripts before merging them, solving certain potential scalability issues
a (compilable) model is guaranteed to have sufficient information in the file to be understandable
we can give errors early: both early in the design process as well as early in the compiler. Giving them early in the design process makes it easier for modelers to run their (partial) design. Giving them early in the compiler improves the quality of the error messages. As an example, suppose a file has a superfluous EQUALS somewhere that causes a type error by identifying BLOB with Integer: if we implement incremental checking, we would be able to point to the file in which the EQUALS is clearly superfluous. If we don't, we would need to point to a whole chain of EQUALS statements spread throughout multiple files (we could still point to the entire chain when checking is incremental, but we cannot point to a single file if checking is not incremental).

stefjoosten commented 5 years ago

What we discussed is to use the EQUALS statement only for concepts and relations. We could not come up with a proper use case for equating rules, yet a cartload of problems emerge if you do. I should have documented this restriction earlier.

If we decide to use CLASSIFY for equating concepts, we can restrict the use of EQUALS to relations. That way we don't have to complicate the syntax with the kind RELATION.

stefjoosten commented 5 years ago

Are there any syntactical problems for letting the following two code fragments be equivalent?

CLASSIFY A IS B
CLASSIFY D IS E
CLASSIFY P IS Q

and

CLASSIFY A IS B, D IS E, P IS Q

?

stefjoosten commented 5 years ago

Now suppose we have the following code fragment in a namespace:

RELATION r[A*B]
RELATION s[P*Q]
EQUALS r=s

The rule "Equate relations" invites to allow s[A*B], or r[P*B]. So the type checker will enforce (by means of type errors) that A=P and B=Q.

stefjoosten commented 5 years ago

Ampersand 3 (the current situation) has the following metamodel:

RELATION name[Relation*RelationName] [UNI,TOT,SUR] -- I must look up why this is surjective
RELATION sign[Relation*Signature] [UNI,TOT] 
RELATION src[Signature*Concept] [UNI,TOT]
RELATION tgt[Signature*Concept] [UNI,TOT]

I would expect the following in the syntactical structure:

RELATION signature[Relation*Signature] [TOT]
RELATION name[Signature*Identifier] [UNI,TOT]
RELATION type[Signature*Type] [UNI,TOT]
RELATION src[Signature*Concept] [UNI,TOT]
RELATION tgt[Signature*Concept] [UNI,TOT]

Somehow the latter feels better. However, I cannot find a compelling reason yet why we would have to change the metamodel... (Anyone?)

hanjoosten commented 5 years ago

Are there any syntactical problems for letting the following two code fragments be equivalent?
CLASSIFY A IS B
CLASSIFY D IS E
CLASSIFY P IS Q
and
CLASSIFY A IS B, D IS E, P IS Q
?

Here are my pro's and cons: Pros:

??

Cons:

If a user makes an error, the Origin of the statement where the error is found is given. In the case of a single CLASSIFY statement this is less meaningful than if there are multiple statements.
Parse errors will become less clear.
A modeller has more ways to write the same things. This clutters the learning curve.
There is simply more code, so this has a negative impact on the maintainability of the code base.

RieksJ commented 5 years ago

On 23 Nov 2018, @sjcjoosten wrote

For concepts, EQUALS is already in place:
CLASSIFY X IS Y
what happens here, is that X or Y will be chosen as the canonical concept name, and writing either becomes completely equivalent. As a data structure for concepts, the entire set of equivalent concepts is typically used in the type system itself.

In order to properly integrate the new prototype framework and SIAM, we need real equivalence between the concepts PF_Role (from the new prototype framework) and Role (as defined and used extensively in SIAM and applications that use SIAM). According to what @sjcjoosten has written above, statement CLASSIFY Role IS PF_Role would make PF_Role and Role <QUOTE>completely equivalent</QUOTE>.

However, this is not the case. When inspecting the database we see PF_Roles that are not Roles, with corresponding failing behaviour in prototype functionality.

@sjcjoosten: can you fix this?

RieksJ commented 5 years ago

Related to the above comment is that when the SIAM module contains CLASSIFY Role IS PF_Role and when the SystemContext.adl file uses CLASSIFY PF_Role IS Role, then an error is generated saying there is a cycle. This is obviously not the case because both concepts are EQUALS.

RieksJ commented 4 years ago

Regarding syntax for EQUALS, I suggest to do this in a similar way as PURPOSE. Stating the equivalence of two relations r and s would then be phrased as

EQUALS RELATION r IS s

from which it is obvious that both r and s are relations.

However, a nice (and useful) extension would then be to allow s to be an expression, which would mean that the following syntax would be valid:

EQUALS RELATION r IS expression

Implementation is rather straightforward, as this (in case r has signature A * B) would be syntactical sugar for:

ROLE ExecEngine MAINTAINS "EQUALS - InsPair r[A*B] IS expression"
RULE "EQUALS - InsPair r[A*B] IS expression": expression |- r[A*B]
VIOLATION (TXT "{EX} InsPair;r;A;", SRC I[A], TXT ";B;", TGT I[B])
ROLE ExecEngine MAINTAINS "EQUALS - DelPair r[A*B] IS expression"
RULE "EQUALS - DelPair r[A*B] IS expression": r[A*B] |- expression
VIOLATION (TXT "{EX} DelPair;r;A;", SRC I[A], TXT ";B;", TGT I[B])

and the associated code would then need to be generated.

AmpersandTarski / Ampersand

feature: EQUALS statement #851