AmpersandTarski / Ampersand

Build database applications faster than anyone else, and keep your data pollution free as a bonus.
http://ampersandtarski.github.io/
GNU General Public License v3.0

Semantics of INCLUDE (this has to do with namespaces) #1467

Open stefjoosten opened 4 months ago

stefjoosten commented 4 months ago

Problem

Namespaces require semantics that will prepare us to work with distributed systems and allow us to do data migrations. So far, we have generated information systems with one unified namespace. Up to Ampersand version 5.0, the semantics of the INCLUDE statement has been set union. To support data migration, we need to support three systems, one of which has an INCLUDE relation with the two others.

Requirements

Proposed solution

In issue #850 we decided to borrow Haskell's module mechanism, with one file for each module. Each file starts with a MODULE statement, so let's replace Ampersand's CONTEXT statement with the MODULE statement. Without any INCLUDE statements, Ampersand compiles the entire file into one information system containing a dataset, a schema, and a set of interfaces. So it compiles a module called ${\tt bar}$ to a triple $\langle D_{\tt bar}, S_{\tt bar}, F_{\tt bar}\rangle$. With an INCLUDE statement, we need to define that every identifier in the included module is known in the including module by the prefix "${\tt bar.}$". To define renaming, we need an operator $\downarrow$, just for defining the semantics in the compiler:

${\tt x\downarrow y\ =\ x<>"."<>y}$

I will overload this operator to work for information systems, datasets, schemas, interface sets, and their constituent elements as well, meaning that $x\downarrow y$ prefixes the name $x$ together with a dot to every identifier in the namespace of $y$. For example, if $y$ contains the name client, then $x\downarrow y$ contains the name x.client at every qualifying occurrence of client in $y$.
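The renaming operator can be sketched in a few lines. This is only an illustration of the definition above, not compiler code: the function names `qualify` and `qualify_all` are hypothetical, and identifiers are modelled as plain strings.

```python
def qualify(prefix: str, name: str) -> str:
    """x ↓ y for a single identifier: x <> "." <> y."""
    return f"{prefix}.{name}"


def qualify_all(prefix: str, names: set) -> set:
    """The overloaded form: apply the prefix to every name in a namespace."""
    return {qualify(prefix, name) for name in names}


# Hypothetical namespace of y, containing the names client and account.
y_names = {"client", "account"}
x_y_names = qualify_all("x", y_names)
```

Applying the operator twice nests the prefixes, so a name that is already qualified as `baz.client` becomes `bar.baz.client` under the prefix `bar`.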

Let ${\tt foo}$ and ${\tt bar}$ be information systems. Each has a dataset, a schema, and zero or more interfaces. Let $D_{\tt foo}$ and $D_{\tt bar}$ be their datasets. Let $S_{\tt foo}$ and $S_{\tt bar}$ be their schemas. Let $F_{\tt foo}$ and $F_{\tt bar}$ be their sets of interfaces. Now we can define the system ${\tt foo\ INCLUDES\ bar}$ as:

$D_{\tt foo\ INCLUDES\ bar}\ =\ D_{\tt foo}\cup {\tt bar}\downarrow D_{\tt bar}$

$S_{\tt foo\ INCLUDES\ bar}\ =\ S_{\tt foo}\cup {\tt bar}\downarrow S_{\tt bar}$

$F_{\tt foo\ INCLUDES\ bar}\ =\ F_{\tt foo}\cup {\tt bar}\downarrow F_{\tt bar}$

For the datasets, this means that all relation names and concept names in ${\tt bar}$ are prefixed with ${\tt bar.}$ while atoms are left alone. In the schema of ${\tt bar}$, all rule names, relation names, concept names, pattern names, and view names are prefixed with ${\tt bar.}$ as well. Likewise, all rule names, relation names, concept names, and interface names from $F_{\tt bar}$ are prefixed with ${\tt bar.}$.
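To make the three equations concrete, here is a minimal sketch of ${\tt foo\ INCLUDES\ bar}$ on a toy model. Everything about the representation is an assumption made for illustration: the `System` class, the choice to model a dataset as a map from relation names to sets of atom pairs, and the sample names are all hypothetical, and the sketch assumes no name clashes occur.

```python
from dataclasses import dataclass


@dataclass
class System:
    """Hypothetical stand-in for the triple <D, S, F> of one module."""
    dataset: dict      # relation name -> set of (atom, atom) pairs
    schema: set        # names of rules, relations, concepts, patterns, views
    interfaces: set    # interface names


def prefix(module: str, system: System) -> System:
    """module ↓ system: qualify every name; atoms are left alone."""
    def q(name: str) -> str:
        return f"{module}.{name}"

    return System(
        dataset={q(rel): set(pairs) for rel, pairs in system.dataset.items()},
        schema={q(name) for name in system.schema},
        interfaces={q(name) for name in system.interfaces},
    )


def includes(foo: System, bar_name: str, bar: System) -> System:
    """foo INCLUDES bar, component by component (assumes disjoint names)."""
    p = prefix(bar_name, bar)
    return System(
        dataset={**foo.dataset, **p.dataset},
        schema=foo.schema | p.schema,
        interfaces=foo.interfaces | p.interfaces,
    )


foo = System({"order": {("o1", "c1")}}, {"order", "Order"}, {"Orders"})
bar = System({"client": {("c1", "Alice")}}, {"client", "Client"}, {"Clients"})
combined = includes(foo, "bar", bar)
```

In `combined`, the relation from `bar` is reachable as `bar.client`, while its pairs of atoms are unchanged.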

Surely, name clashes can occur. If, for example, system ${\tt foo}$ contains a name bar.account and ${\tt bar}$ contains a name account, then ${\tt bar}\downarrow$ account is also bar.account, so the dataset $D_{\tt foo\ INCLUDES\ bar}$ has a name clash. We will forbid that to ensure disjoint-union semantics.
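The clash check could look roughly like this. It is a sketch under the assumption that a namespace is just a set of name strings; `NameClash` and `disjoint_union` are hypothetical names, not existing compiler functions.

```python
class NameClash(Exception):
    """Raised when the union of two namespaces would not be disjoint."""


def disjoint_union(own: set, module: str, included: set) -> set:
    """Union of own names with the prefixed included names, or an error."""
    qualified = {f"{module}.{name}" for name in included}
    clashes = own & qualified
    if clashes:
        raise NameClash(f"clashing names: {sorted(clashes)}")
    return own | qualified


# The example from the text: foo already has bar.account, bar has account.
try:
    disjoint_union({"bar.account", "order"}, "bar", {"account"})
    clash_detected = False
except NameClash:
    clash_detected = True
```

Rejecting the clash at compile time keeps the union disjoint, so every name in the combined system still refers to exactly one thing.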

Alias

In the current implementation, two relation declarations with the same name, source, and target are treated as the same relation. I don't mind keeping that behavior, but it does not work across the INCLUDE mechanism (because we forbid name clashes). I propose to do this explicitly with an ALIAS statement, for example:

ALIAS client, bar.client

This statement presumes that aliases have the same type, or else we get type errors. Needless to say, the ALIAS statement can also work inside one namespace. It is not linked to the INCLUDE mechanism. Aliasing works for concepts and relations, but not for other named entities.
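One way to picture ALIAS is as an equivalence on names, checked against types. The sketch below is speculative: the `AliasTable` class is hypothetical, relations are assumed to carry a (source, target) signature while concepts carry none, and comparing signatures modulo already-declared concept aliases is my assumption, not something this proposal specifies.

```python
class AliasTable:
    """Hypothetical alias bookkeeping for concepts and relations."""

    def __init__(self, signatures: dict):
        # relation name -> (source concept, target concept);
        # any name without an entry is treated as a concept.
        self.signatures = dict(signatures)
        self.parent = {}

    def find(self, name: str) -> str:
        """Return the representative of the alias class of a name."""
        while self.parent.get(name, name) != name:
            name = self.parent[name]
        return name

    def alias(self, a: str, b: str) -> None:
        """ALIAS a, b: merge both names, or raise a type error."""
        sig_a, sig_b = self.signatures.get(a), self.signatures.get(b)
        if (sig_a is None) != (sig_b is None):
            raise TypeError(f"cannot alias a concept with a relation: {a}, {b}")
        if sig_a is not None:
            # Assumption: types are compared modulo concept aliases.
            resolved_a = tuple(self.find(c) for c in sig_a)
            resolved_b = tuple(self.find(c) for c in sig_b)
            if resolved_a != resolved_b:
                raise TypeError(f"{a} and {b} have different types")
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


table = AliasTable({
    "client": ("Person", "Account"),
    "bar.client": ("bar.Person", "bar.Account"),
})
table.alias("Person", "bar.Person")
table.alias("Account", "bar.Account")
table.alias("client", "bar.client")
```

Under this reading, `ALIAS client, bar.client` only type-checks once the source and target concepts have been aliased too, which is one possible answer to what "the same type" means across namespaces.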

Consequences

This mechanism excludes cyclic INCLUDE-dependencies. I expect the proposed mechanism to meet the requirements of the migration mechanism, but I will leave that to @sjcjoosten to verify. I hope that this include-relation between information systems is transitive. If not, I would like to fix that, so we can draw an include-graph of the system.
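Both points, rejecting cycles and reading the include-relation transitively, can be checked on an include-graph. The sketch below uses Python's standard `graphlib` module; the module names `migration`, `old`, and `new` are hypothetical labels for the three systems mentioned under Problem, and the function names are illustrative only.

```python
from graphlib import CycleError, TopologicalSorter


def compile_order(includes: dict) -> list:
    """Order modules so that included modules come before their includers.

    Raises graphlib.CycleError if the INCLUDE-dependencies are cyclic.
    """
    return list(TopologicalSorter(includes).static_order())


def transitive_includes(includes: dict, module: str) -> set:
    """All modules reachable from `module` by following INCLUDE edges."""
    seen = set()
    stack = list(includes.get(module, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(includes.get(current, ()))
    return seen


# Hypothetical include-graph: module -> set of modules it includes.
graph = {"migration": {"old", "new"}, "old": set(), "new": set()}
order = compile_order(graph)

try:
    compile_order({"a": {"b"}, "b": {"a"}})
    cycle_rejected = False
except CycleError:
    cycle_rejected = True
```

With the prefixing semantics, a module reached through two INCLUDE steps would show up with a nested prefix such as `bar.baz.client`, which is what makes the transitive reading plausible.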

If module ${\tt foo}$ includes module ${\tt bar}$, we currently implement both ${\tt foo}$ and ${\tt bar}$ on the same database. For distributed systems, we will have to allow them to be implemented on different databases. I suggest we do that in another issue.

hanjoosten commented 4 months ago

I don't get this. What problem is there to be solved?

stefjoosten commented 4 months ago

The problem is that we have no agreed-upon semantics of the namespace stuff. So how are we going to build it first-time-right?