Open Michiel-s opened 3 years ago
I like the idea. I know @sjcjoosten has done an experiment long time ago like this. I whould like to have the semantics of the operators involved really clear. My feeling is that this should fit with database functions. This might make us more reliant on specific databases, but I do not see that as a problem. See here for some inspiration. I guess that if we start with comparison functions, that should already help a lot. However, we must do this in a formally correct way, so I really would like to hear the opinion of @sjcjoosten on this.
I love the idea, but I don't believe we should introduce a separate operator, as <
is a relation, much like I
and V
, though only defined on concepts that are represented as DATE
, INT
, and perhaps a few others. The expression r < s
, if I understand what you mean correctly, would be equivalent to r;<;s~
where <
is a relation. Translating your example:
RELATION r[A*Date] [UNI] -- The 'uni' is here because this is a translation of Michiel's example
RELATION s[A*Date] [UNI] -- The 'uni' is here because this is a translation of Michiel's example
RELATION <[Date*Date] -- note: if we build this into Ampersand, I propose that you wouldn't declare it, just like we don't declare I and V.
REPRESENT Date TYPE DATE
RULE I[A] |- r;<;s~
It would be great if the generated SQL were as succinct as you describe, as well as having some reasoning like knowing what - <
is (it is the greater-or-equal relation). All quite feasible and all of this can wait until after we have a <
relation. There are also some details to be worked out regarding the type checker, because I think we could decide to prevent expressions like r;<;<;s
from occurring via the type system, requiring the user to write something like r;<;I[Date];<;s
instead, which I believe has a different meaning (the difference is that I would like to interpret r;<;<;s
as r;<;I[DATE];<;s
, and I[Date]≠I[DATE]
).
I fully agree with @sjcjoosten by adding these kind of calculations as built-in relations.
Indeed, the functionality I meant to describe is then formulated as r;<;s~
, well actually I had I /\ (r;<;s~)
in mind, but the way you define <
gives more flexibility.
What could be the first step? @sjcjoosten you starting with the type checker work?
If I can get some voice over on the SQL generator part I can help with coming up with the right SQL code. @hanjoosten, something we can pick up?
Based on what I can oversee right now, no work is needed on the prototype framework implementation. It's all expressions and generated SQL work.
I agree with @sjcjoosten that r;<;<;s
should be interpreted as r;<;I[DATE];<;s
, where I[DATE]
is the lub of the source and target of <
. Simply because that is how the type rule of ;
is (in the current system). I see no reason to change that nor to forbid r;<;<;s
.
I suggest to use =<
for less than or equal, >=
for greater than or equal, so <=
and =>
remain free for future use.
Here is a plan:
=<
(less than or equal) for numbers.=<
for numbers and merge it to main (release it) before adding any more built-in operators, so we can learn with minimal pain and effort.=<
for dates, to be built after the first one is released.r;I[A];s
and r;-I[A];s
, to prevent such expressions from being compiled with two joins in them. Similar optimizations should ensure the efficient compilation of r;=<;s.=<
must be made impossible. (Btw, I wonder what happens if we insert atoms in I
or V
relations, but I faintly remember that we have been there...)why use =<
for less than or equal while the rest of the world uses <=
? Where <=
is easily seen (and remembered) as <
-less than or =
equal, =<
would then read as 'equal or less than', which is awkward.
I agree with @sjcjoosten that
r;<;<;s
should be interpreted asr;<;I[DATE];<;s
. I see no reason to change that nor to forbidr;<;<;s
.
I see no easy way to generate SQL code for r;<;I[DATE];<;s
(or what that expression would even look like in SQL).
why use
=<
for less than or equal while the rest of the world uses<=
? Where<=
is easily seen (and remembered) as<
-less than or=
equal,=<
would then read as 'equal or less than', which is awkward.
That is settled then....
We have a new branch, feature/greaterThan-issue-1256.
When working on this issue, @hanjoosten and I stumbled on a problem. After a good discussion, we decided to ask for help because this requires some good thinking wrt consequences for the type checker. @sjcjoosten, please help to analyze this.
We are building <=
as a built-in relation. This leads to the following changes in the Haskell code of the Ampersand compiler:
<=
on numbers.EBui :: Relation -> Expression
, which is used in the SQL generator (a.o.) to distinguish built-in relations from user-defined relations.PBuiltInR :: P_BuiltInRel -> TermPrim
, so the parser can allow the Ampersand programmer to write <=
in every place where a relation identifier is allowed.pBuiltIn
to parse the built-in relation <=
.declMap :: Map Text (Map SignOrd Expression)
. The type checker uses that declaration table to disambiguate terms.declMap
with the information of the built-in relations, so we can leave the type checker alone.<=
is to be a built-in relation, we get a table of built-in relations. This table would look something like this:
builtins :: [Relation]
builtins = [Relation
{ decnm = T.pack "<="
, decsgn = sgn
, decprps = Set.empty
, decDefaults = Set.empty
, decprL = ""
, decprM = ""
, decprR = ""
, decMean = []
, decfpos = Origin (“built-in relation ( <= "<>T.pack (show sgn)<>")")
, decusr = False
, decpat = Nothing
, dechash = hash sgn
}]
where sgn = Sign c c
c = PlainConcept ("Num" NE.:| [])
sgn
, of the built-in relation properly. (I wrote “Num”, but that type doesn’t exist.). This phenomenon requires some discussion. Han and I saw many different ways to tackle this problem. Here are a few, but perhaps there are more:
builtins :: [Protorelation]
instead of builtins :: [Relation], where the Protorelation
contains a variable instead of the fully defined signature. The type checker can fill the variable along the way and generate an error message if it fails to do so. We were wondering how much impact this will have on the type checker. @sjcjoosten please comment.NUM
, STRING
, and DATE
, that correspond to our technical types (and to the ISO standard for MySQL and Excel). Then define <=[NUM,NUM]
, <=[STRING,STRING]
, and <=[DATE,DATE]
as predefined relations in the declaration table declMap
and let the type checker sort things out. Of course, we want to allow the user to add a type annotation to <=
to disambiguate where needed. This should then work for all concepts that are a specialization of the three primitive types, NUM
, STRING
, and DATE
. This approach seems to agree with earlier ideas to incorporate technical types into the type system. However, this also affects the type checker, so @sjcjoosten please comment.<=
as an operator in the compiler, instead of a primitive term. In that case, I want to call it ";<=;" because I don't want to explain to students that they should read r<=s
as r;<=;s
. @hanjoosten is now working on this thread in a separate branch, if only to see if we stumble on other difficulties. This seems to be at odds with the original idea of <=
as a relation. An obvious impact on the type checker is that we introduce new code that bears similarity with the type checker code for the composition operator, ;
. Would that be doable without any programming effort of @sjcjoosten?Please feel free to offer other ideas...
We should go with option 2 above, provided that the user cannot write NUM, STRING, etc directly, but only via 'representations'. The built-in primitive concepts are already there, and even already present in the type-checker. See: https://github.com/AmpersandTarski/Ampersand/blob/main/src/Ampersand/ADL1/P2A_Converters.hs#L485
Of course, after type-checking, the built-in primitive types like NUM and STRING are removed: they need to be specialized to whatever the user defined, and that's why A_Concept does not have a representation for them. If they cannot be removed, a fatal is thrown, see: https://github.com/AmpersandTarski/Ampersand/blob/main/src/Ampersand/ADL1/P2A_Converters.hs#L50
Some reworking of the Ampersand datatypes is unavoidable [edit: I have some ideas on how to minimize the need for this, which I lay out below. I removed the thoughts that no longer apply.]
@sjcjoosten , What is the problem with alternative 3? Which obstacles will we encounter if we walk that path?
@sjcjoosten said:
Of course, after type-checking, the built-in primitive types like NUM and STRING are removed
If <=[NUM,NUM]
, <=[STRING,STRING]
, and <=[DATE,DATE]
are built-in relations, then we might argue that NUM
, STRING
, and DATE
are built-in concepts as well. So why remove them? Being built-in means that the user can use them as any other concept. The difference is just that they are always there (as some other things are). In the generated documentation, we might hide built-in relations and concepts if the script does not refer to them. That would keep such a generated document more readable.
In accordance with 2., I suggest that <=
can work very similarly to I
, in that one can alway read/interpret it to be a <=[NUM \/ STRING \/ DATE, NUM \/ STRING \/ DATE]
relation, but the type-checker will ensure that it can fill in a user type (some A_Concept
). Note that this user-type might not be Endo: <=[Start*End]
might occur in a rule that expresses that end-times should follow start-type.
As a consequence, a rule like r;<= = s;<=
would not be allowed by the type-checker: the target of r;<=
is NUM \/ STRING \/ DATE
, which is not a user type. However, writing r;<=;I[Birthday] = s;<=;I[Birthday]
would be fine.
Another consequence of this would be that the type-checker can fill in the sign of the <=
relation as a normal user type.
Idea nr. 3 has a problem. If we let the Ampersand programmer believe that <=
is a relation, then she will sooner or later use expressions such as <= - I[A]
. And we would surely want to allow such things because the semantics are perfectly clear. Under idea nr. 3, in which <=
is an operator, such expressions are syntactically incorrect.
On the other hand, if we let the Ampersand programmer believe that <=
is an operator, it is hard to explain why r<=s
must be read as r;<=;s
. After all, that notation suggests strongly that <=
is a relation, yet it isn't. Confusion lures...
(It is a pity that idea nr. 3 has this problem because it seems quite implementable. But ease of implementation should not sacrifice a correct implementation of the idea. Fortunately, this idea lives in a separate branch.)
Idea's nr. 1 and 2 do not suffer from this problem, because <=
is a (built-in) relation.
@sjcjoosten has convinced me (in an oral discussion) that idea 2 is the way to go.
I don't see the issue for idea 3 yet. <=
should be regarded as the relation denoted by I[cpt] ; <= ; I[cpt]
, where cpt
is a type variable for a concept. That concept is restricted by concepts having an ordering.
This means that when a user would write the expression <= - I[A]
, it would be regarded as the expression (I[cpt] ; <= ; I[cpt]) - I[A]
, where cpt
would become A
. A perfectly sane way to specify a relation with infinate tuples in it. Of course, that may lead to insane behaviour.
I will call @sjcjoosten soon to discuss this topic with him, to avoid too much noise in this thread.
I'm now adding predefined concepts and classifications to the language. My test script is:
CONTEXT LessThanOrEqual
RELATION r[INTEGER*INTEGER]
RULE r;<=;r
ENDCONTEXT
I have added the following representations and specializations to the language (for readability in quasi-Ampersand syntax):
CLASSIFY DATE ISA NUM
CLASSIFY INTEGER ISA NUM
RELATION "<=" [NUM*NUM]
REPRESENT "ALPHANUMERIC" TYPE "ALPHANUMERIC"
REPRESENT "BIGALPHANUMERIC" TYPE "BIGALPHANUMERIC"
REPRESENT "HUGEALPHANUMERIC" TYPE "HUGEALPHANUMERIC"
REPRESENT "PASSWORD" TYPE "PASSWORD"
REPRESENT "BINARY" TYPE "BINARY"
REPRESENT "BIGBINARY" TYPE "BIGBINARY"
REPRESENT "HUGEBINARY" TYPE "HUGEBINARY"
REPRESENT "DATE" TYPE "DATE"
REPRESENT "DATETIME" TYPE "DATETIME"
REPRESENT "BOOLEAN" TYPE "BOOLEAN"
REPRESENT "INTEGER" TYPE "INTEGER"
REPRESENT "FLOAT" TYPE "FLOAT"
REPRESENT "OBJECT" TYPE "OBJECT"
REPRESENT "TYPEOFONE" TYPE "TYPEOFONE"
This yields:
sjo00577@BA92-VYF9TXMD9G Ampersand % ampersand check testing/issue1256.adl
/Users/sjo00577/git/Ampersand/testing/issue1256.adl:6:10 error:
Ambiguous type when matching: Tgt of <= [NUM*NUM]
and Src of r [A*A].
The type can be A or DATE/INTEGER/NUM
None of these concepts is known to be the smallest, you may want to add an order between them.
ExitFailure 10
@sjcjoosten, can you take a look at these code snippets? I don't trust pClassify2aClassify
. Can you remember the intended semantics of PClassify
? I recall that the meaning of PClassify _ s gs
is $atoms(s)\subseteq\bigcup(map atoms gs)$, but am I wrong? It does not seem to correspond with the following code. That code seems to have worked throughout the years, but it hardly sheds any light on the intended semantics...
data PClassify = PClassify
{ pos :: Origin,
-- | Left hand side concept expression
specific :: P_Concept,
-- | Right hand side concept expression
generics :: NE.NonEmpty P_Concept
}
pClassify2aClassify :: ConceptMap -> PClassify -> Maybe AClassify
pClassify2aClassify fun pg =
case NE.tail (generics pg) of
[] -> case filter (/= specCpt) [pCpt2aCpt fun . NE.head $ generics pg] of
[] -> Nothing
h : _ ->
Just
Isa
{ genpos = origin pg,
gengen = h,
genspc = specCpt
}
_ -> case NE.filter (/= specCpt) . fmap (pCpt2aCpt fun) $ generics pg of
[] -> Nothing
h : tl ->
Just
IsE
{ genpos = origin pg,
genrhs = h NE.:| tl,
genspc = specCpt
}
where
specCpt = pCpt2aCpt fun $ specific pg
Specifically, is it true that CLASSIFY S ISA G
translates (roughly, ignoring the NE bit) to PClassify o S {G, S}
and CLASSIFY S IS G/\K/\L
translates (roughly) to PClassify o S {G, K, L}
?
I recall that the meaning of PClassify _ s gs is $atoms(s)\subseteq\bigcup(map atoms gs)$, but am I wrong?
I am confused, I would say that PClassify _ s gs
means $atoms(s) = \bigcap(map atoms gs)$ and since CLASSIFY S ISA G
translates to S = S /\ G
, so that would be PClassify o S [G,S]
. The names generic
and specific
seem to suggest that there is a subset involved, but I believe in the equality.
I am also a bit suspicious about pClassify2aClassify
, as I read this code it seems like Isa
is never produced, only IsE
is, but you claim this code works?
I have slept over it. You are right. We defined the meaning of PClassify _ s gs
to be $atoms(s)=\bigcap(map atoms gs)$. I will redo the code because it is bound to be wrong. I think we just never saw the effects of it being wrong.
So, CLASSIFY Employee ISA Person
translates to PClassify orig (PCpt "Person") {PCpt "Person") (PCpt "Employee"}
(in the P-structure) and that translates to Isa orig (C "Person") (C "Employee")
(in the A-structure).
Similarly, CLASSIFY Eagle IS Bird/\Predator
translates to PClassify orig (PCpt "Eagle") {PCpt "Bird",PCpt "Predator"}
and that translates to IsE orig (C "Eagle") [C "Bird",C "Predator"]
.
In relation to the discussion in #207, I propose to do a small experiment to see how we can add calculations to our expressions and what is needed for that.
Inspired by the SHACL constraint language for RDF (https://www.w3.org/TR/shacl/#core-components), we can define a list of calculations to support, like
lessThan
andlessThanOrEqual
.E.g. for
lessThan
we could add the operator<
. We need to define for which types this is supported and the precedence of the operator. The typesINTEGER
,FLOAT
,DATE
andDATETIME
.The meaning of the operator
<
would be the set of A*A for which the value oftgt of r
is less than the value oftgt of s
, with the restriction thatr
ands
are univalent. Sor < s
is basically a calculated property forA
.The generated SQL can be a join of
r
ands
(like withr;s
) but instead of the WHERE clause statingr.tgt = s.src
we needr.tgt < s.src
.What do you think @hanjoosten @sjcjoosten? Can we add this one to see how it goes?