AmpersandTarski / Ampersand

Build database applications faster than anyone else, and keep your data pollution free as a bonus.
http://ampersandtarski.github.io/
GNU General Public License v3.0
40 stars 8 forks source link

Syntax for union-types #1072

Open sjcjoosten opened 4 years ago

sjcjoosten commented 4 years ago

Problem

Suppose we have a generic concept and two specific concepts, e.g.

contract :: Employee * Contract
CLASSIFY FixedTermContract ISA Contract
CLASSIFY PermanentContract ISA Contract

How do we build an interface for Contract?

The general problem is how to combine interfaces for specific concepts into an interface for a generic concept. Currently we have no operators to combine interfaces, so this calls for a language extension. This issue is to agree on some syntax for that.

Proposal

For example, an employee can be on a fixed-term and on a permanent contract, and the contract interface for such an employee may have two different looks. However, the concept that varies is Contract (which can be Permanent or Fixed-term):

contract :: Employee * Contract
CLASSIFY FixedTermContract ISA Contract
CLASSIFY PermanentContract ISA Contract
startdate :: Contract * Date
enddate :: FixedTermContract * Date

My proposal is to use the operator | for this:

INTERFACE EmployeeContract : I[Employee]
  BOX [ "Employee" : I
      , "Contract" : contract
       BOX [ "Start" : startdate
              , "??" : I[FixedTermContract] BOX [ "End" : enddate ]
                     | I[PermanentContract] BOX []
              ]
      ]

The semantics would be similar to having a , there, with two subtle differences:

For me personally, the need for | came up when working on Issue #1071 as I don't have a way to specify Haskell data-types with unions in them.

sjcjoosten commented 4 years ago

@stefjoosten could you re-title this into something without the word union-type? (I.e. give a non-mathematical description), and could you also confirm that this example demonstrates sort of what you decided you needed when writing Ampersand models?

(Stef did this on Apr 23rd 2020 and also reformulated the problem)

hanjoosten commented 4 years ago

~I like this syntax. However I wonder if the concept expression has any added value. It could probably as well be omitted, I guess.~

sjcjoosten commented 4 years ago

~@hanjoosten what concept expression do you mean?~

hanjoosten commented 4 years ago

Sorry, I realize I have been unclear, and the concern I have is slightly different too.

So let us look at your example (slightly modified the layout)

INTERFACE EmployeeContract : I[Employee]
  BOX [ "Employee" : I
      , "Contract" : contract
          BOX [ "Start" : startdate
              , "??"    : I[FixedTermContract] BOX [ "End" : enddate ]
                        | I[PermanentContract] BOX []
              ]
      ]

The bindedExpression contract could be any expression. In your example, you conveniently took a relation that has Contract as the target concept. However what happens if the user would take another concept as the target. Let's take concept A. Further we define:

RELATION r[Employee * A] 
RELATION s[A * Contract]
RELATION t[A * Contract]

Now a user could write another INTERFACE statement, obviously with different semantics (a.o. no longer editable) too:

INTERFACE EmployeeContract2 : I[Employee]
  BOX [ "Employee" : I
      , "Contract" : r
          BOX [ "Start" : s;startdate
              , "??"    : s;I[FixedTermContract] BOX [ "End" : enddate ]
                        | t;I[PermanentContract] BOX []
              ]
      ]

The point is, that the expressions s;I[FixedTermContract] and t;I[PermanentContract] could now become unrelated in all kinds of ways, making the semantics of the | operator less obvious. I would rather see that the context expressions in the related subinterfaces (related by the | operator) would be limited to expressions of the form I[SmallerConcept]. This allows for pattern matching too. We might even consider an extension like the case statement in Haskell:

INTERFACE EmployeeContract : I[Employee]
  BOX [ "Employee" : I
      , "Contract" : contract
          BOX [ "Start" : startdate
              , "??"    : case I of
                  FixedTermContract -> BOX [ "End" : enddate ]
                | PermanentContract -> BOX []
              ]
      ]
sjcjoosten commented 4 years ago

Ah I love the case keyword to introduce a list of interfaces that are separated in a different way from the ordinary ,. I see your point that this syntax might be somewhat easier to come up with a semantics for. However, it would be at the expense of being harder to use and I don't believe that us having to think harder about its semantics is enough of a reason to restrict the syntax to concepts only. In particular, I think the restriced I-only syntax suffers from almost all the same drawbacks as allowing the expressions s;I[FTC] and t;I[PC], without its flexibility.

The additional properties of I that you win to make the | operator have "obvious semantics" is that the expressions are guaranteed to be partial functions (UNI). You also guarantee that their targets are of a unifiable type, but that's something we can require (if we need to) in either case. What you lose is that users will be forced to introduce ISA whenever an interface requires a case statement somewhere. Even in those cases where we would currently advice students not to use ISA but to use relations instead (I believe ISA is best reserved for permanent properties). Furthermore, the semantics don't become "obvious" for arbitrary concepts either: the two concepts used could be partially overlapping and they might also not cover all cases. Instead, let's come up with a semantics in which the cases where those expressions areUNI (and pairwise disjoint and that have a union that is TOT) would indeed yield the "obvious semantics", but is generalized.

Another issue I want to raise is that if we were to convert your syntax to type-checked Ampersand, we would probably want to translate case exp of A -> .. | B -> .. into something where the proper epsilon-conversion-relation is used. That is: caseof exp;eps1 -> .. | exp;eps2 -> ... Those look a lot like expressions to me and I would love to maintain the property that whatever we do with ISA we can also do without ISA (the generated code will change, but its behaviour should not).

hanjoosten commented 4 years ago

I see your point. I think it would be great if we could achieve something like this. However, I also think that whatever syntax we end up with, we must be able at compiletime to check that we indeed we have disjunct sets, and that any possible target will be in exactly one of them. That way, at least we could warn the &-modeller that there is overlap or missing stuff, as in Haskells case expression.

sjcjoosten commented 4 years ago

Let's talk semantics..

I believe that we need to come up with semantics for the following tuple:

( (exp1,IF1)
, (exp2,IF2) )

Here exp1 and exp2 are possibly restricted by a combination of syntax and our type checker (discussion ongoing). I believe whatever separates the union-types should be associative, so the semantics for the tuple will generalize to triples etc. I would also like the empty relation to be a left unit and I believe any semantics we'll come up with satisfy that, so this discussion should generalize to cases with 0 and 1 elements.

Semantics by example

For an example in which the semantics are obvious, suppose exp1 is populated by (a,a),(b,b) and exp2 is populated by (c,c),(d,d), then IF1 is shown if the root node is a or b, and IF2 is shown if the root node is c or d, and I believe it is safe to generalize and say that neither interface is shown if the root node is anything else.

In what follows, we populate exp1 and exp2 differently and see what choices we'll have to make.

overlap

Next, suppose there is overlap: exp1 is populated by (a,a),(b,b) and exp2 is populated by (b,b),(c,c). The interesting case is when the root node is b. I can see two possible types of behavior: either IF1 is shown because it occurs first, or both IF1 and IF2 are shown. I'll name these two variations FCFS (first come first serve) and WCABF (we can all be friends). For these specific populations, we can get one in terms of the other:

Suppose we build FCFS and a user wants WCABF, then the user can write the following instead:

( (exp1 /\ exp2,IF1 \/ IF2)
, (exp1,IF1)
, (exp2,IF2) )

(we never defined the union of two interfaces but I imagine it to mean repeating the interface fields of both and let's just assume the type checker accept this)

Suppose we build WCABF and a user wants FCFS, then the user can write the following instead:

( (exp1,IF1)
, (exp2 - exp1,IF2) )

I take the expressions exp1 and exp2 to be super simple, and the interfaces to be super involved, so I imagine WCABF semantics will result in less code duplication.

Heterogeneous expressions.

Next, I want to note that all of the above applies to heterogeneous semantics: if exp1 were (a,1),(b,2) and exp2 were (b,2),(c,3) we'd get the same two options and can repeat all of the above. But we are assuming that exp1 and exp2 have the same types in order to form the expressions exp1 /\ exp2 and exp2 - exp1. Can we lift that restriction?

Let exp1 be populated by (a,a),(b,b) and let's use (b,2),(c,3) for exp2. We can still assign the same semantics, but the conversion between the two is not so obvious anymore. I do see how we can get FCFS if we built WCABF. The user can write:

( (exp1,IF1)
, (exp2 - exp1;V, IF2) )

We cannot get WCABF if we build FCFS, because IF1 \/ IF2 is ill-typed. I tend to favor WCABF now.

Non-univalent expressions

If an expression is non-univalent, the question becomes how many times we wish to repeat an interface. Let's populate exp1 and exp2 again. Say exp1 has (a,0),(a,1)and exp2 has (b,2) as populations. The interesting case is what to do if the root node is a. I can see two things happening: show IF1 for 0 only, or show IF2 twice, once for 0 and once for 1. I'm inclined to call the former FCFS-2 and the latter WCABF-2. There is no way to write FCFS-2 in terms of WCABF-2 this time, because there is no way to get the first element of a set (or even any single element). The fact that FCFS-2 makes a choice of first element also makes me really dislike the idea of implementing FCFS-2, it does not feel clean. WCABF-2 would be buildable.

Other options

Like Han says in his previous comment, only going for the obvious case and giving the user trouble if he tries otherwise is appealing too. Currently, however, there is no way to tell Ampersand that two concepts are disjoint, though I don't think it would be difficult either.

stefjoosten commented 4 years ago

I can see a problem with the original proposal:

INTERFACE EmployeeContract : I[Employee]
  BOX [ "Employee" : I
      , "Contract" : contract
       BOX [ "Start" : startdate
              , "??" : I[FixedTermContract] BOX [ "End" : enddate ]
                     | I[PermanentContract] BOX []
              ]
      ]

Expressions I[FixedTermContract] and I[PermanentContract] are generally not identities, not even properties, and not even endo.

Consider:

INTERFACE EmployeeContract : I[Employee]
  BOX [ "Employee" : I
      , "Contract" : contract
          BOX [ "Start" : startdate
              , "??" : bankAccount BOX [ "account number" : accNumber ]
                     | employee BOX ["employee name" : name]
              ]
      ]

If I understand Sebastiaan correctly, the fact that bankAccount and employee are disjoint (provable by the type system) makes this is equivalent to:

INTERFACE EmployeeContract : I[Employee]
  BOX [ "Employee" : I
      , "Contract" : contract
          BOX [ "Start" : startdate
              , "??" : I  BOX [ "account number" : bankAccount;accNumber 
                              , "employee name" : employee;name
                              ]
              ]
      ]

The user cannot solve this with FCFS vs WCABF if bankAccount and employee are type incompatible. So should we then forbid this? (with a type error saying that bankAccount and employee have different target concepts?).

My tuppence worth of thought is this:

on syntax

In the old Ampersand, each box contains a list of triples: (label, expression, box) with an empty box if the user doesn't write one.

I suggest that each box contains a list of triples: (label, expression, mapping(concept->box)). The case-syntax proposed by Han has that. Since a mapping is a set, I hope that the | operator can be made associative and commutative given the right semantics. In the mapping, I would qualify double occurrences of the same concept as a type error. As for type checking, I'd say that every concept $c$ in the mapping must be smaller than or equal to the target concept of the box expression to enable the compiler to write code that determines the selected case uniquely.

on semantics (informally)

For every concept $C$ in the mapping, we can compute a uniqe path to the target of the box expression, which is the root of a typology in which all concepts from the mapping occur. Suppose $A\leq B\leq C$ is this path. Then I would suggest that the resulting interface is simply the union of all three interfaces. The union of interfaces may be something like (informal Haskell):

  -- Note: this function is partial! The type checker needs to ensure its precondition.
  -- Note: instead of lists, I would like sets in order to get associativity and commutativity of the interface union.
  -- however, the order of labels on the screen may require some compromise here.
triples (b1|b2)
 = transpose
   ( [ l1 | l1 <- labels b1, l2 `elem` labels b2, l1==l2]++
     [ l1  | l1 <- labels b1, l1 `notElem` labels b2]++
     [ l2 | l2 <- labels b2, l2 `notElem` labels b1]
   , [ e1 \cup e2 | (l1,e1,_)<-triples b1, (l2,e2,_)<-triples b2, l1==l2]++
     [ e1  | (l1,e1,_)<-triples b1, l1 `notElem` labels b2]++
     [ e2 | (l2,e2,_)<-triples b2, l2 `notElem` labels b1]
   , [ m1 \cup m2 | (l1,e1,_)<-triples b1, (l2,e2,_)<-triples b2, l1==l2]++
     [ m1  | (l1,_,m1)<-triples b1, l1 `notElem` labels b2]++
     [ m2 | (l2,_,m2)<-triples b2, l2 `notElem` labels b1]
   )

opinions

wider application of the interface union

The BOX-syntax is already delimited syntactically, so the union operator | between boxes could be added without pain to unite boxes (without being "guarded" by concepts). The empty box can (if needed) be represented by BOX [].

sjcjoosten commented 4 years ago

Your proposal does not avoid confronting users with the differences in semantics between WCABF and FCFS. In the 'overlap' section I have given example populations that are endo and properties. If your proposal did avoid confronting users with semantical difficulties, I'd be all for it. As it stands, I don't see a benefit to such a severe restriction.

sjcjoosten commented 4 years ago

Here's an example that shows the benefits of allowing for expressions. Suppose we want to see a mailbox a-la-outlook. The goal is to show a nice picture if your mailbox is empty, but to show email otherwise. I would be able to do that using expressions, writing something like this:

mail :: Mailbox * Mail -- describes which mails are in a mailbox

INTERFACE Mailbox : I[Mailbox]
  BOX ["??": I /\ mail;mail~ BOX [ "mail": mail ]
           | V[Mailbox*ONE]-(mail;V) BOX [ "picture" : ONE ]
      ]

Now the question is, can we do this using Han's syntax? The only thing we are allowed to write are concepts, so we have the choice between writing Mailbox and Mail in front of the box. Mail would give a type error, so we can only write something that looks like:

case Mailbox -> ..
     Mailbox -> ..

That's obviously not going to work. This means that I'm forced to change my data-model if I want to support this. Indeed, with a different data-model, I am able to write an equivalent script:

mail :: Mailbox * Mail -- describes which mails are in a mailbox

INTERFACE Mailbox : I[Mailbox]
  BOX [case I of EmptyBox -> BOX [ "picture" : I[ONE] ]
               | FullBox  -> BOX [ "mail" : mail ]
      ]

CLASSIFY FullBox ISA Mailbox
CLASSIFY EmptyBox ISA Mailbox

RULE FullBox : I[Mailbox];mail :- V[Fullbox*Mail]
MEANING "A mailbox is full if it contains at least one email"

RULE EmptyBox : V[Mailbox*Mail] - I[Mailbox];mail :- V[Emptybox*Mail]
MEANING "A mailbox is an emptybox if it contains no mail"

RULE FullBox : I[Fullbox];I[Mailbox];I[Emptybox] :- 0
MEANING "No mailbox is full and empty"

The reason I dislike this solution is that we now have a lot of semantic-model code (with rules and classify statements) that are only supposed to deal with interfaces.

Examples like the mailbox example is why I prefer to be able to write a flexible expression rather than a fixed concept.

Michiel-s commented 4 years ago

Oh I really like this issue and the conversation.

I definitely have a use case for the WCABF and expression approach of @sjcjoosten.

In my application, you all know as Semantic Treehouse, I try to specify as much as functionality with Ampersand. But that results in very ugly interfaces, even worse generated APIs and burden to maintain consistency. Let me give an example:

Use case for specifying context aware editing functionality

In our application we have elements (let's call them A) that can be locked or not. This is modeled like:

CONCEPT A ""
RELATION locked[A] [PROP]

Now we need a single interface where the user can navigate to, but depending wether the element is locked or not, different content and editing functionality is shown. We currently do this like so:

INTERFACE "Example" : I[A] BOX <DIV>
  [ "locked" : I;locked BOX [] -- interface without editing functionality
  , "unlocked" : I-locked BOX [] -- interface with editing functionality for certain fields
  ]

The locked and unlocked interfaces have the same fields, but for some fields editing functionality is specified (i.e. CRUDs are different). In the current situation I have to copy the whole BOX structure.

Note that the top level box is a BOX <DIV>. The div template doesn't show any markup, labels or anything. It can be used for these non-functional boxes.

With the proposed syntax of this issue, I could simplify this, e.g.:

INTERFACE "Example" : I[A] BOX
  [ "r" : r cRud
  , "s" : locked;s cRud
          | I-locked BOX [ "s" : cRUd ] -- a box to edit 's'
  ]

This way I don't have to duplicate code and specify only the editing functionality additionally.

sjcjoosten commented 4 years ago

I'm not happy with the title change, I think the way we teach students to use concepts (namely that permanent properties can be encoded as concepts with an ISA, but temporary ones - like locked/unlocked, should not) is a good way to see concepts. Michiel's example shows that union types are not about specialized concepts: he uses union types in a way to talk about what is fundamentally a transient property

Edit: I see now that I am the one who asked for this title change... I guess I unknowingly changed my mind some time between July and September.

RieksJ commented 4 years ago

TL;DR: Aren't we making things too difficult here? Again new syntax to remember and teach (and (mis)use)? As @Michiel-s said, we already have <DIV> (now <RAW> and <RAW form>).

The original example of @sjcjoosten was:

INTERFACE EmployeeContract : I[Employee]
  BOX [ "Employee" : I
      , "Contract" : contract
       BOX [ "Start" : startdate
              , "??" : I[FixedTermContract] BOX [ "End" : enddate ]
                     | I[PermanentContract] BOX []
              ]
      ]

B.t.w.: this syntax won't work because BOX [] isn't allowed, but that's nit-picking...

Here is how you can currently already do this (jan 2021):

INTERFACE EmployeeContract : I[Employee]
  BOX [ "Employee" : I
      , "Contract" : contract  BOX <RAW form>
         [ "generic contract stuff": I[Contract] BOX <FORM> [ "Start" : startdate ]
         , "fixed term contract stuff": I[FixedTermContract] BOX <FORM hideOnNoRecords> [ "End" : enddate ]
         , "permanent contract stuff": I[PermanentContract] BOX <FORM hideOnNoRecords> [ " ": TXT "No enddate (permanent contract)" ]
         ]
      ]
sjcjoosten commented 4 years ago

@RieksJ you might be right, and WCABF is semantically eerily similar to not implementing anything special. Coming back to Michiel's example, one almost-equal thing he could have written was:

INTERFACE "Example" : I[A] BOX
  [ "r" : r cRud
  , "s-locked" : locked;s cRud
  , "s-unlocked" : I-locked BOX [ "s" : s cRUd ] -- a box to edit 's'
  ]

So how about allowing the following instead:

INTERFACE "Example" : I[A] BOX
  [ "r" : r cRud
  , "s" : locked;s cRud
  , "s" : I-locked BOX [ "s" : s cRUd ] -- a box to edit 's'
  ]

That means: no new syntax to teach, just existing syntax to misuse. I propose only allowing duplicate labels if they appear consecutively, and drawing them indiscriminately whenever this happens. Semantically, that gives us union types.

RieksJ commented 4 years ago

We don't even need the allow-duplicate-labels feature. The script below already works now:

INTERFACE "Example" : I[A] BOX
  [ "r"  : r cRud
  , "s"  : locked;s cRud
  , "s " : I-locked BOX [ "s" : s cRUd ] -- a box to edit 's'
  ]

The 'technique' of using spaces to differentiate between strings may not be very elegant, but it helps getting stuff done, e.g. as in the following:

INTERFACE "Example" : I[A] BOX
  [ " "  : TXT "<hr>" --- draw a first horizontal line
  , "r"  : r cRud
  , "  " : TXT "<hr>" --- draw a second horizontal line
  , "s"  : locked;s cRud
  , "s " : I-locked BOX [ "s" : s cRUd ] -- a box to edit 's'
  , "   ": TXT "<hr>" --- draw a third horizontal line
  ]
sjcjoosten commented 4 years ago

Yes, but now there are two labels "s" instead of one in the interface.

RieksJ commented 4 years ago

Wondering why we had the restriction that labels within a box MUST be unique (rather than SHOULD be unique), and if we cannot lift that restriction. After all, for showing-on-the-screen purposes there is no reason to use 'MUST', whereas BOX-templates (e.g. PROPBUTTON, or the newly added OBJECTDROPDOWN and VALUEDROPDOWN templates - you should try them) will automatically enforce whatever they need to - including additional constraints such as required or optional labels.

sjcjoosten commented 4 years ago

That's because the labels within a box serve(d?) two purposes:

  1. to specify which fields there are in the generated API. An API is describes a key-value map in which the keys are labels (that is: labels serve as internal identifier in those APIs)
  2. to specify how those fields should be rendered in the interface, that is, what name to assign there
RieksJ commented 4 years ago

For purpose 1 I can see the 'MUST' requirement. For purpose 2 the 'MUST' requirement can be a (circumventable) obstacle - see the example above with horizontal lines. So I would vote for having 'SHOULD', and not enforce 'MUST'. Given that VIEWs and INTERFACEs are going to be unified, I can imagine one serving purpose 1 (enforcing the 'MUST' requirement) and the other serving purpose 2 (not enforcing 'MUST')...

hanjoosten commented 4 years ago

This distinction is what issue https://github.com/AmpersandTarski/Ampersand/issues/708 is about

sjcjoosten commented 4 years ago

I agree that now that we identified that we have one thing (labels) with two purposes (internal json keys, interface labels), we are in need of separating the one thing into two things.

The purpose of this issue, when I opened it, was to deal with the internal keys issue. Over the course of this issue we came up with the idea of allowing duplicates there and giving it a WCABF semantics. Your horizontal lines example is a bit weird to use in the discussion about internal keys because the " " internal key refers to an expression whose target is "<hr>", which is not an atom in the declared context, but it's a great illustration of the interface labels issue.

Issue #708 suggests separating the two purposes by having a LABEL keyword, which would allow us to assign labels to relations, concepts, rules, and interfaces, but not to internal json keys.

sjcjoosten commented 3 years ago

Here is how you can currently already do this (jan 2021):

INTERFACE EmployeeContract : I[Employee]
  BOX [ "Employee" : I
      , "Contract" : contract  BOX <RAW form>
         [ "generic contract stuff": I[Contract] BOX <FORM> [ "Start" : startdate ]
         , "fixed term contract stuff": I[FixedTermContract] BOX <FORM hideOnNoRecords> [ "End" : enddate ]
         , "permanent contract stuff": I[PermanentContract] BOX <FORM hideOnNoRecords> [ " ": TXT "No enddate (permanent contract)" ]
         ]
      ]

It is not entirely clear to me how this would work upon a create, but I'm happy to close this issue on this basis.

RieksJ commented 3 years ago

Creating contracts is a different situation, as typically you do not create a contract using a simple CRud (e.g. in between contract andBOX `. A contract is created by doing some processing, i.e. some interaction with (a) user(s), in which different attributes of the contract, or even sub-objects are created up to a point where the contract is 'clean' and can be committed to. A contract that is not being committed to yet is in a 'construction phase'. It is neither a fixed term contract, nor a permanent contract, because it isn't a contract (yet).

sjcjoosten commented 3 years ago

That's a good explanation for this particular situation, but what behavior would you expect from Ampersand if the user selects "Create Employee using EmployeeContract interface" or the likes? (Just because creating a contract using a simple CRud might not be typical, does not mean the user won't select it, or that we should forbid programmers from allowing it)

RieksJ commented 3 years ago

For any syntax r[A*B] CRUD BOX ... I would expect that the creating an atom of would make it a type B. Since your INTERFACE has multiple such structures, it depends on where the user clicks the create button.

Personally, I do not use the C for creating atoms in INTERFACEs for users any more (in APIs that's different). That is because I want my prototypes to be close to real applications, and for such purposes, creating an atom usually comes with consequences that go way beyond what C could do for me. So now I am in the habit of using BUTTONs to request for the creation of an atom, and ExecEngine rules to model the behavior of the creation process.

stefjoosten commented 3 years ago

On 13 Sep 2020 @sjcjoosten wrote:

I'm not happy with the title change, I think the way we teach students to use concepts (namely that permanent properties can be encoded as concepts with an ISA, but temporary ones - like locked/unlocked, should not) is a good way to see concepts. Michiel's example shows that union types are not about specialized concepts: he uses union types in a way to talk about what is fundamentally a transient property

Edit: I see now that I am the one who asked for this title change... I guess I unknowingly changed my mind some time between July and September.

No problem. Let's change it back again....