AmpersandTarski / Ampersand

Build database applications faster than anyone else, and keep your data pollution free as a bonus.
http://ampersandtarski.github.io/
GNU General Public License v3.0
40 stars 8 forks source link

Term I#"aap" causes the normalizer to make a mistake. #1293

Closed stefjoosten closed 2 years ago

stefjoosten commented 2 years ago

What happened

I have a script that contains the following rule

   ROLE ExecEngine MAINTAINS MaakAanmeldingGGmobiel
   RULE MaakAanmeldingGGmobiel :
      (gezinslid~;gastgezin~;I[AanmeldingGG];telefoon - (I#"-")) - personPhoneNumber
      |- personPhoneNumber
   VIOLATION ( TXT "{EX} InsPair;personPhoneNumber;Person;",SRC I, TXT ";PhoneNumber;", TGT I )

The log-file of my prototype shows that the Engine has called this rule:

prototype  | [2022-06-01 16:45:59] RULEENGINE.DEBUG: Checking rule 'MaakAanmeldingGGmobiel' [] {"request_id":"f71b2fee6d"}
prototype  | [2022-06-01 16:45:59] RULEENGINE.DEBUG: Evaluating conjunct 'conj_176' [] {"request_id":"f71b2fee6d"}
prototype  | [2022-06-01 16:45:59] DATABASE.DEBUG: select distinct nothing.a as src, nothing.a as tgt from (select 1 as a) as nothing where a <> 1 [] {"request_id":"f71b2fee6d"}
prototype  | [2022-06-01 16:45:59] RULEENGINE.DEBUG: Conjunct 'conj_176' holds [] {"request_id":"f71b2fee6d"}
prototype  | [2022-06-01 16:45:59] RULEENGINE.DEBUG: Rule 'MaakAanmeldingGGmobiel' holds [] {"request_id":"f71b2fee6d"}

Note the SQL expression on the third line of this log fragment (let me make it a bit more readable):

SELECT DISTINCT nothing.a AS src,
                nothing.a AS tgt
FROM
  (SELECT 1 AS a) AS NOTHING
WHERE a <> 1

This result is obviously incorrect. However, the rule

   ROLE ExecEngine MAINTAINS MaakAanmeldingGGmobiel
   RULE MaakAanmeldingGGmobiel:
      (gezinslid~;gastgezin~;I[AanmeldingGG];telefoon) - personPhoneNumber
      |- personPhoneNumber
   VIOLATION ( TXT "{EX} InsPair;personPhoneNumber;Person;",SRC I, TXT ";PhoneNumber;", TGT I )

produces

SELECT DISTINCT t1.src AS src,
                t1.tgt AS tgt
FROM
  (SELECT t1.src AS src,
          t1.tgt AS tgt
   FROM
     (SELECT fence0.src AS src,
             fence3.tgt AS tgt
      FROM
        (SELECT "Person" AS src,
                "Gastgezin" AS tgt
         FROM "gezinslid"
         WHERE ("Gastgezin" IS NOT NULL)
           AND ("Person" IS NOT NULL)) AS fence0,

        (SELECT "gastgezin" AS src,
                "AanmeldingGG" AS tgt
         FROM "AanmeldingGG"
         WHERE ("AanmeldingGG" IS NOT NULL)
           AND ("gastgezin" IS NOT NULL)) AS fence1,

        (SELECT "AanmeldingGG" AS src,
                "AanmeldingGG" AS tgt
         FROM "AanmeldingGG"
         WHERE "AanmeldingGG" IS NOT NULL) AS fence2,

        (SELECT "AanmeldingGG" AS src,
                "PhoneNumber" AS tgt
         FROM "telefoon"
         WHERE ("AanmeldingGG" IS NOT NULL)
           AND ("PhoneNumber" IS NOT NULL)) AS fence3
      WHERE (fence0.tgt = fence1.src)
        AND (fence1.tgt = fence2.src)
        AND (fence2.tgt = fence3.src)) AS t1
   LEFT JOIN "personPhoneNumber" AS t2 ON (t1.src = t2."Person")
   AND (t1.tgt = t2."PhoneNumber")
   WHERE (t2."Person" IS NULL)
     OR (t2."PhoneNumber" IS NULL)) AS t1
LEFT JOIN "personPhoneNumber" AS t2 ON (t1.src = t2."Person")
AND (t1.tgt = t2."PhoneNumber")
WHERE (t2."Person" IS NULL)
  OR (t2."PhoneNumber" IS NULL)

The difference between the two scripts is the term (I#"-").

What I expected

I expected the first example to produce correct code. However, Ampersand didn't.

Version of ampersand that was used

Ampersand-v4.6.2

stefjoosten commented 2 years ago

Diagnosis

I have isolated this bug to the following script:

CONTEXT "Issue1293"

   REPRESENT A TYPE INTEGER

   RELATION r[A * B] = [ (1,"aap"), (2,"noot"), (3,"mies") ]
   RELATION s[A * B] = [ (2,"noot"), (3,"mies") ]

--    ROLE ExecEngine MAINTAINS Issue
   RULE  r - (I#"aap") |- s
--    VIOLATION ( TXT "{EX} InsPair;personPhoneNumber;Person;",SRC I, TXT ";PhoneNumber;", TGT I )

ENDCONTEXT

I ran "ampersand proofs" on this script, which gave the following analysis: Rules and their conjuncts for Issue1293

    rule r: rule@/Users/sjo00577/git/Nutwente/model/Issue1293.adl:9:10
    formalExpression r: r [A*B]-(I[A]#"aap"[B]) |- s [A*B]
    conjNF: V [A*B]
    conj: V [A*B]

To demonstrate that conjNF is mistaken, let us first establish the value of I#"aap":

    I#"aap"
  =  {semantics of #}
    I;V;"aap"
  =
    V;"aap"
  =
    { (1,"aap"), (2,"aap"), (3,"aap") }

Now, let us calculate the value of the entire expression:

    r - (I#"aap")
  =
    { (1,"aap"), (2,"noot"), (3,"mies") }  -  { (1,"aap"), (2,"aap"), (3,"aap") }
  =
    { (2,"noot"), (3,"mies") }

Clearly, this differs from V[A*B]. So, conjNF produces the wrong answer. This is consistent with the symptoms signaled above.

stefjoosten commented 2 years ago

I used ampersand proofs to find out that the normalizer derives this:

     r [A*B]-(I[A]#"aap"[B]) |- s [A*B]
<=> { remove |- }
     -r [A*B]-(I[A]#"aap"[B])\/s [A*B]
<=> { V\/x = V }
     V [A*B]

This means that the simplifier performs the last (erroneous) step towards V.

A side catch is an error in showA. It is obvious that -r [A*B]-(I[A]#"aap"[B])\/s [A*B] should have been printed as -(r [A*B]-(I[A]#"aap"[B]))\/s [A*B] (with extra brackets, that is).

It is weird that the normalizer uses absorption. The comment V\/x = V leads us to the rule where the normalizer absorbs an argument on the right-hand side: image I have put in a trace, to see what it does. Here is the trace output:

x =  -r [A*B]-(I[A]#"aap"[B])\/s [A*B]

The fact that trace produces output proves that isTrue (-r [A*B]-(I[A]#"aap"[B])) holds. Since showA is suspect, let us double-check the parser. So, I have also printed the trace in Haskell, using show instead of showA:

x = EUni (ECpl (EDif (EDcD r[A*B],EBrk (EPrd (EDcI A,EMp1 "aap" B)))),EDcD s[A*B])

It is obvious that isTrue contains a mistake.

stefjoosten commented 2 years ago

Now let us chase the bug in isTrue

    isTrue (ECpl (EDif (EDcD r[A*B],EBrk (EPrd (EDcI A,EMp1 "aap" B))))
=
    isFalse (EDif (EDcD r[A*B],EBrk (EPrd (EDcI A,EMp1 "aap" B)))
=
    isFalse (EDcD r[A*B]) || isTrue (EPrd (EDcI A,EMp1 "aap" B))
=
    False || (not (isFalse (EPrd (EDcI A,EMp1 "aap" B)))
                  && isTrue (EDcI A) && isTrue (EMp1 "aap" B) )
=
    (not (isFalse (EPrd (EDcI A,EMp1 "aap" B)))
                  && False && False )
=
    not (isFalse (EPrd (EDcI A,EMp1 "aap" B)))
=
    not (isFalse (EDcI A) || isFalse (EMp1 "aap" B))
=
    not (False || False)
=
    True

There is clearly a mistake here. It is in the isTrue of the cartesian product operator EPrd. So the mistake is made in the second step of the derivation. This is a correct derivation:

    isTrue (ECpl (EDif (EDcD r[A*B],EBrk (EPrd (EDcI A,EMp1 "aap" B))))
=
    isFalse (EDif (EDcD r[A*B],EBrk (EPrd (EDcI A,EMp1 "aap" B)))
=
    isFalse (EDcD r[A*B]) || isTrue (EPrd (EDcI A,EMp1 "aap" B))
=
    False ||  (isTrue (EDcI A) && isTrue (EMp1 "aap" B) )
=
    False ||  (False && False)
=
    False

I have established that the isTrue of our expression yields False. We can close the issue if it passes the tests of the pull request I just made.

hanjoosten commented 2 years ago

Are you going to fix both mistakes by means of this one issue?

stefjoosten commented 2 years ago

Are you going to fix both mistakes by means of this one issue?

No, the other one deserves a separate issue.