stefjoosten commented 2 years ago

This issue was first spotted in Ampersand-v4.6.2. I have reproduced the symptoms in RAP for ease of reproduction. This RAP compiles with Ampersand-v4.3.0, so the behaviour is present in that version too.

What happened?

When making a prototype of a script, I installed the application with the following result:

This is the script:

CONTEXT IssueProtFramework95

   RELATION personalia[Notification*Person] 
    = [( "1" , "person1" ), ( "3" , "person3" )]
   RELATION email[Notification*EmailAdress]
    = [( "1" , "stef.joosten@ou.nl" ), ( "3" , "stef.joosten@ou.nl" )]

   RELATION email[Person*EmailAdress] [INJ]

   ROLE ExecEngine MAINTAINS MakePersonEmail
   RULE MakePersonEmail: personalia~;I[Notification];email |- email
   VIOLATION ( TXT "{EX} InsPair;email;Person;",SRC I, TXT ";EmailAdress;", TGT I )

ENDCONTEXT

What I expected

I expected to get an error message about email[Person*EmailAdress], since that relation is injective and the ExecEngine tries to attach two persons to the same EmailAdress (given this particular population).

If I remove the [INJ] directive from the source code and declare injectivity as a rule of its own (RULE InjEmail : email;email~ |- I[Person]), I get the expected behaviour:

stefjoosten commented 2 years ago

Analysis

This issue requires a log file. The following log contains a summary of the rerun steps nr. 4 and 5, to demonstrate that we have an infinite loop. Rerun step nr. 6 is represented in full for the sake of completeness:

prototype  | [2022-05-10 15:57:04] EXECENGINE.INFO: ExecEngine run #258193237-4 (auto rerun: true) [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] RULEENGINE.DEBUG: Checking rule 'MakePersonEmail' [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] RULEENGINE.DEBUG: Conjunct 'conj_8' broken: 1 violations [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] EXECENGINE.INFO: InsPair(email,Person,person1,EmailAdress,stef.joosten@ou.nl) [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] DATABASE.DEBUG: UPDATE "EmailAdress" SET "email" = 'person1' WHERE "EmailAdress" = 'stef.joosten@ou.nl' [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] EXECENGINE.DEBUG: Exec engine run finished [] {"request_id":"d2dc248f12"}

prototype  | [2022-05-10 15:57:04] EXECENGINE.INFO: ExecEngine run #258193237-5 (auto rerun: true) [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] RULEENGINE.DEBUG: Checking rule 'MakePersonEmail' [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] RULEENGINE.DEBUG: Conjunct 'conj_8' broken: 1 violations [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] EXECENGINE.INFO: InsPair(email,Person,person3,EmailAdress,stef.joosten@ou.nl) [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] DATABASE.DEBUG: UPDATE "EmailAdress" SET "email" = 'person3' WHERE "EmailAdress" = 'stef.joosten@ou.nl' [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] EXECENGINE.DEBUG: Exec engine run finished [] {"request_id":"d2dc248f12"}

prototype  | [2022-05-10 15:57:04] EXECENGINE.INFO: ExecEngine run #258193237-6 (auto rerun: true) [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] RULEENGINE.DEBUG: Checking rule 'MakePersonEmail' [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] RULEENGINE.DEBUG: Evaluating conjunct 'conj_8' [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] DATABASE.DEBUG: select distinct t1.src as src, t1.tgt as tgt from (select fence0.src as src, fence1.tgt as tgt from (/* EFlp (EDcD personalia[Notification*Person]) */ /* Expression: personalia [Notification*Person]~ */ /* Signature : [Person*Notification] */ /* case: EFlp x */ /* Expression: personalia [Notification*Person]~ */ /* Signature : [Person*Notification] */ /* EDcD personalia[Notification*Person] */ /* Expression: personalia [Notification*Person] */ /* Signature : [Notification*Person] */ /* Expression: personalia [Notification*Person]~ */ /* Signature : [Person*Notification] */ select "Person" as src, "Notification" as tgt from "personalia" where ("Notification" is not null) and ("Person" is not null)) as fence0, (/* EDcD email[Notification*EmailAdress] */ /* Expression: email [Notification*EmailAdress] */ /* Signature : [Notification*EmailAdress] */ select "Notification" as src, "EmailAdress" as tgt from "email" where ("Notification" is not null) and ("EmailAdress" is not null)) as fence1 where (fence0.tgt = fence1.src)) as t1 left join "EmailAdress" as t2 on (t1.src = t2."email") and (t1.tgt = t2."EmailAdress") where (t2."email" is null) or (t2."EmailAdress" is null) [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] RULEENGINE.DEBUG: Conjunct 'conj_8' broken: 1 violations [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] EXECENGINE.INFO: ExecEngine fixing 1 violations for rule 'MakePersonEmail' [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] EXECENGINE.INFO: Fixing violation 1/1: ((person1[Person],stef.joosten@ou.nl[EmailAdress])) [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] EXECENGINE.INFO: InsPair(email,Person,person1,EmailAdress,stef.joosten@ou.nl) [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] CORE.DEBUG: Add link (person1[Person],stef.joosten@ou.nl[EmailAdress]) email[Person*EmailAdress] to plug [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] CORE.DEBUG: Atom person1[Person] already exists in concept [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] CORE.DEBUG: Atom stef.joosten@ou.nl[EmailAdress] already exists in concept [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] DATABASE.DEBUG: UPDATE "EmailAdress" SET "email" = 'person1' WHERE "EmailAdress" = 'stef.joosten@ou.nl' [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] CORE.INFO: Link added to relation: (person1[Person],stef.joosten@ou.nl[EmailAdress]) email[Person*EmailAdress] [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] EXECENGINE.DEBUG: Added (person1[Person],stef.joosten@ou.nl[EmailAdress]) email[Person*EmailAdress] [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] EXECENGINE.NOTICE: ExecEngine fixed 1 violations for rule 'MakePersonEmail' [] {"request_id":"d2dc248f12"}
prototype  | [2022-05-10 15:57:04] EXECENGINE.DEBUG: Exec engine run finished [] {"request_id":"d2dc248f12"}

Story

So what is going on? The situation is that a given population in Notification is being copied to a person's record. Since an e-mail address identifies a single person, the RELATION email[Person*EmailAdress] is declared to be injective. The insertion is implemented by an UPDATE because injective relations are implemented as attributes in the database. But the update does an implicit delete before adding the new value, so the injectivity constraint is never violated. In fact, it cannot be violated because it is implemented as an attribute.

stefjoosten commented 2 years ago

Diagnosis

I think we can solve this issue. Every update on an SQL attribute should first check whether the old value is either zero or equal to the new value. If not, it fails with a violation of injectivity or univalence, to produce the desired behavior shown in the first comment. This involves work in the prototype framework.

stefjoosten commented 2 years ago

Issue AmpersandTarski/Ampersand#1283 may be the same issue as this one. I articulated that issue much more superficially than this one.

Michiel-s commented 2 years ago

Hi @stefjoosten, I understand you confusion here. But keep in mind, within a single transaction, rules need to be able to be violated. Because, how else could we do updates to say relations that are defined UNI,TOT.

The exec-engine runs just before checking invariant rules. So this is within the boundaries within a transaction.

I think we can make it work like you state in your diagnosis; checking the existing value first. I do see 2 issues here:

Adds additional query (thus database interaction) for every write. Time penalty might be an issue.
Changes the current behavior in a back-wards incompatible way. To be honest, I cannot oversee the implications for current scripts and prototypes in the wild. Many exec-engine rules just do an Inspair, expecting them to succeed even if there is already a value present; just overwrite.

stefjoosten commented 2 years ago

@Michiel-s, Like yourself, I also expect that some existing scripts might not work anymore. @sjcjoosten is also afraid this might happen. Nevertheless, this may not be a reason to let a mistake persist in the code. It is a bug, so we should fix it and face whatever consequence this might have...

However, there might be a reason why this inconvenience passes our doorway. The behaviour of an infinite loop represents an error just as a violation of an injective restriction does. So if the fix affects erroneous behaviour only, the correct behaviour remains unaffected. I am not sure whether the suggested solution discriminates properly between the two. Maybe @sjcjoosten has something useful to say about this... If so, we might want to refine the solution to affect erroneous behaviour.

By the way, the fact that this bug affects erroneous behaviour only makes resolving this bug no less relevant, because tracing the cause of an infinite loop is extremely laborious in Ampersand.

Michiel-s commented 2 years ago

So far I'm not convinced it's a bug. Call it a feature.

Ampersand defines the rules that govern the transaction. As I explained that is guaranteed.

The question is what we consider allowed within the boundaries of a transaction. I consider all rules the same; whether they are multiplicity constraints or any other other rule. Only when trying to commit a transaction we look if the new state complies with the rules. I don't see why you shouldn't be able to violate a multiplicity constraint within a transaction.

But there is a trade-off; as we know, for performance reasons, it is way better to put all UNI and INJ relations in a concept table. So we technically don't have the option to violate UNI or INJ constraints in the prototypes.

The problem you describe is the combination of a) design choice to let a Inspair on UNI or INJ relation overwrite a existing value with b) a user specific exec-engine rule that doesn't take into account this behavior. Why change (a) if we can fix/signal better situation (b).

sjcjoosten commented 2 years ago

I am convinced by Michiel that this can be a feature: when a relation is UNI, in most cases the UNI is either a modeling error, or it is desired to delete before the insert. I am also convinced that there is something that needs fixing in the Ampersand prototype-generator. I'll start by describing what I believe is wrong in the current situation and why it is important to fix it, and then propose one way to fix it while maintaining the feature.

There is a fundamental problem with translating an insert as an update, namely that the Ampersand-programmer didn't ask for a delete, but a delete still occurs. I'll admit that the &-programmer writes 'UNI' There is a problem with translating an insert as an update here, because it may be hard for the Ampersand-programmer to foresee whether a delete will occur: This depends on whether the &-programmer writes 'UNI', but this is expressed in a different place and possibly with a different purpose. The affected exec-engine-rule does not show any signs of the delete happening, so this would be a case of spooky-action-at-a-distance (saaad), which I would like to avoid. Indeed Ampersand has only very little saaad, one example being that UNI changes the defaults for CRUD specs on the interfaces. However, this I don't mind as much: it can easily be overwritten, and I sometimes see CRUD specs expressed that match those defaults exactly. I take that to be a sign that the &-programmer wishes to avoid saaad, and I believe it to be good practice for that reason. I can imagine a code-hinter that would point out where CRUD specs have not been specified, and offer to fill in the CRUD specs that are default at the moment.

Now how to avoid saaad without sacrificing the use of a very efficient Update in our prototypes? One idea that doesn't work is requiring the &-programmer to write a delete-before-update: that could trigger TOT violations. An idea that would work is to introduce a specific UPDATE rule for the exec engine, separating currently the two different behaviors of INSERT.

This would change the situation as follows:

Current Ampersand scripts would need to change the INSERT into an UPDATE wherever this translation used to be automatic due to an UNI
the INSERT would work as described above which is inefficient on tables that have an UNI constraint: whenever an INSERT occurs, the exec-engine checks that the corresponding update will not carry out a DELETE. If it does, the exec-engine queues a violation of the UNI constraint to be raised.
the UPDATE could replace the current INSERT functionality on UNI or INJ relations. We might want to raise an error in scripts where an UPDATE is used for a relation that is neither. Alternatively, this would work as a DELETE-then-INSERT on tables that are not UNI/INJ.

RieksJ commented 2 years ago

As an old-school &-programmer, I object to the statement "There is a fundamental problem with translating an insert as an update, namely that the Ampersand-programmer didn't ask for a delete, but a delete still occurs."

My first objection is that it is illogical. Not asking for a delete that nevertheless occurs is in my opinion insufficient evidence for having "a fundamental problem with translating an insert as an update." And what to think of the fact that programmers also do not ask for a create, but nevertheless, the (php)code of InsPair has the statements $src = new Atom($a, $srcConcept); (line 61 of file execEngineFunctions.php) and $tgt = new Atom($b, $tgtConcept); (line 63), and they are often executed. And what about this idiotic _NEW atom, that wasn't part of the population when the rule was evaluated? Or the _AND keywords that are allowed, enabling a single instruction to do multiple creates, inserts and deletes? If I were to follow this way of reasoning, this is not only spooky but it seems that the suggested conclusion hardly scratches the surface.

But I don't follow this way of reasoning. I like to build on what we've done up till now. So leg me jog our memories

Long, long ago, in the beginning of this century, @sjcjoosten brilliantly proposed to distinguish between invariants and process rules by adding the ROLE syntax, and we were very excited. As we experimented, we soon got the idea that VIOLATIONS were not to be used as error messages, but as instructions to the actor that played the role. The actor would see what was wrong and how to fix it. I recall a nice demo of this (of the VOG) at Justid.

This idea was expanded to electronic actors. VIOLATION statements became real automatable working instructions for an electronic actor, called the ExecEngine. We coded a set of basic functions to form the language for such working instructions, and extended that as needed. How cool! The ExecEngine could send emails and SMS messages, do complex computations such as Warshall, collect postcodes from a remote API, etc.

From this perspective, I can raise my second objection, which is that I find it silly (rather than spooky) to assume there is a 1-1 correspondence between a database instruction (or a relational algebra operator for that matter) and existing ExecEngine functions. It's simply not what the ExecEngine is all about.

The solution is straightforward: we could add a set of ExecEngine functions that behave as the theory likes them to be. It seems like the practical thing to do. As a side-effect, it provides a means to illustrate the difference between theory and practice, not only in theory, but in practice as well.

sjcjoosten commented 2 years ago

@RieksJ Point taken as to your first suggestion (I edited my wording). As to your second, I did not mean to suggest that there should be a 1-1 correspondence between exec-engine instructions and database instructions, it just so happens to turn out that way for the proposed semantics of insert and update.

As to your solution, we seem to be on the same page, right? That is: use different ExecEngine statements to distinguish different behaviors, so that the implemented behavior which the &-programmer gets is less likely to be a surprise. In the case that would solve this issue: separate Insert and Update.

RieksJ commented 2 years ago

As long as we only add execengine functions, and not change existing ones, I'm ok.

stefjoosten commented 2 years ago

May I try to refocus the discussion?

The current discussion, above, addresses the behavior of the prototype when it works correctly. However, this issue is about the behavior when the prototype produces error messages. The undesirable case yields an infinite loop. That is what Ampersand shows today. The desirable case is a message as though UNI was syntactical sugar for RULE InjEmail : email;email~ |- I[Person]. Although this might seem a small issue from an Ampersand-builder/maintainer perspective, an infinite loop is very hard to diagnose for the Ampersand programmer; it took me about 13 working hours (and huge frustration) to diagnose the bug. I expect this to be an insurmountable obstacle for students.

The language specification says that UNI is syntactical sugar for the rule InjEmail, while practice shows otherwise. This behavior would not have occurred if the prototype would run on a triple store. So, the underlying implementation shows its face between the cracks of the system. For this reason alone, the error message should match that of the rule InjEmail where RELATION email is not declared [UNI].

What can we do to improve the error message? (By the way, I'm comfortable with the implicit deletion as is. That is beyond the scope of this issue.)

RieksJ commented 2 years ago

When an infinite loop ends (yes!), the tool simply dumps some stuff on the screen that works for me, but not for others. Perhaps it is just a matter of polishing it up a bit, saying something like:

After [n] runs, the ExecEngine could still not resolve all violations. 
Here is a list of rules that still have violations:
- [rulename1]: (src, tgt), (src, tgt), ...
- [rulename2]: (src, tgt), (src, tgt), ...

I would say that students should be taught how to deal with infinite loops as they will sooner or later run into them (if they keep using &, that is). And by 'dealing with', I mean interpreting the list of violations and wrecking your brain, focusing on how to rewrite your rules so that it no longer occurs (as opposed to rewriting parts of the tool).

Michiel-s commented 2 years ago

I'm a bit lost here, but I'll read the discussion and suggestions above and try to propose a fix for the issue of needing to spent to much time to debug. That shouldn't be the case; not for us as developers nor for students.

Michiel-s commented 2 years ago

I like the suggestion by @RieksJ

The solution is straightforward: we could add a set of ExecEngine functions that behave as the theory likes them to be. It seems like the practical thing to do. As a side-effect, it provides a means to illustrate the difference between theory and practice, not only in theory, but in practice as well.

Backed by @sjcjoosten:

As to your solution, we seem to be on the same page, right? That is: use different ExecEngine statements to distinguish different behaviors, so that the implemented behavior which the &-programmer gets is less likely to be a surprise. In the case that would solve this issue: separate Insert and Update.

But then @stefjoosten mentions that this issue is not about the implicit deletion

What can we do to improve the error message? (By the way, I'm comfortable with the implicit deletion as is. That is beyond the scope of this issue.)

So what can we do about the error message?

@stefjoosten, would the error message that @RieksJ suggested have helped you to diagnose the problem more quickly?

After [n] runs, the ExecEngine could still not resolve all violations. 
Here is a list of rules that still have violations:
- [rulename1]: (src, tgt), (src, tgt), ...
- [rulename2]: (src, tgt), (src, tgt), ...
Violation that were fixed in the last run:
- [rulename1]: (src, tgt), (src, tgt), ....

AmpersandTarski / prototype

Strange case of "Maximum reruns exceeded for ExecEngine" #95

What happened?

What I expected

Analysis

Story

Diagnosis