AmpersandTarski / Ampersand

Build database applications faster than anyone else, and keep your data pollution free as a bonus.
http://ampersandtarski.github.io/
GNU General Public License v3.0
40 stars 8 forks source link

Disallow I on concepts without equivalence defined. #341

Open hanjoosten opened 8 years ago

hanjoosten commented 8 years ago

With the introduction of TTYPE, we now have concepts without a proper equivalence relation defined. This already causes errors, because SQL does not allow these concepts as the primary key in a concept table. (i.e. BINARY, FLOAT)

As a consequence, certain properties must be disallowed on relations with such concepts as source or target.

@sjcjoosten, @WolframKahl, could you give your opinion about this? I have discussed this today with @stefjoosten, and we agreed to call for the help of the mathematicians 8-))

WolframKahl commented 8 years ago

This is interesting! :smiley:

It probably means that it could be useful to adopt a semigroupoid-based point-of-view replacing the current more category-based point of view: Semigroupoids are categories where it is not assumed that there is an identity relation on every object.

Shameless plug: http://dx.doi.org/10.1016/j.jlap.2007.10.008

(My RATH-Agda development fleshes this out by including also a hierarchy of semigroupoid-based theories, currently up to division semi-allegories.)

I actually have applications that deal with similar constraints, but only in a semigroupoid setting, without converse, which makes it easy for me to have morphisms with, e.g., FLOAT as target, but not consider any morphisms with FLOAT as source. (No publication yet.) I am embedding a category in that semigroupoid, so I have identity morphisms on some objects of the semigroupoid (but not on FLOAT). That category is actually equivalent to the mapping category of a relation algebra, but the two use different data structures for implementation, which is somewhat similar to your distinction between concept tables and binary tables.

A similar conceptual set-up may be appropriate for Ampersand, with this kind of types only being allowed as target types, and limited sets of operations available for relations with those target types. Unfortunately I don't have time to go any farther on this today, nor tomorrow before my evening.

I am also not yet familiar with the SQL side of this, and would appreciate any pointers.

hanjoosten commented 8 years ago

Haha, As a Humble Common Mortal I don't get this all, but that's OK. Maybe I can elaborate on what made me come up with the issue, for what it is worth. As a result of issue #318 I have been rewriting the plugSql mechanism of Ampersand. This is where the database structure is created. In short, a concept table is created for each typology (=set of concepts related in a single classification). No further grouping is done, even though in some cases that could be possible. Then, all user-defined relations that are UNI or INJ are distributed amongst these tables. The other relations are stored in a link table each. The primary key of each concept table is the upper bound of the typology (See also issue #292 about enforcing a single upper bound only).
In SQL, the type of the primary key is derived from the TTYPE of the root concept. However, for some TTYPE SQL does not allow it to be a primary key. This currently results in a broken prototype.

sjcjoosten commented 8 years ago

I agree that we may want this, but I find it hard to think of a good theoretical framework for this. Let's use BLOB as an example of something without equality, and r :: A * BLOB s :: BLOB * A[UNI]. Now is r;s allowed? One could argue yes, because s can be applied as a partial function on BLOBs. Is s even allowed? Are we able to check UNI constraints for s on the database? Do we even have to check this, or is it guaranteed by the database, and do we only check it when a new pair is inserted?

I think it will be difficult to use the type-checker for this, since the type-checker cannot predict which parts are handled by SQL, and which are checked in PHP. Similarly, it cannot predict what constructors are used after the normaliser gets its hand on it. On the other hand, it would not be very satisfying if we simply have to give an error if the resulting code requires comparisons on BLOBs, without us knowing exactly how to deal with this.

The paper on semigroupoids is interesting in that it prohibits the use of certain properties and operations. Of course, we do Ampersand to work as usual for concepts that DO have decidable equivalence (and are finite). I'll focus on equivalence, since it's probably easier (in the sense that almost nothing is allowed if equivalence is not defined).

To make this work, all reasoning would need to be restricted to use certain rules only if the concepts have equality. For instance, it would not be allowed to combine conjuncts: if r is a rule, and ~s is a rule of the same type, then "r /\ ~s" is usually an equivalent rule, but we cannot construct the /\ here since it requires equality. If 'r;s' were allowed, then '(r;s)~ = s~;r~' may even fail, since we cannot compute the latter from left-to-right (because 'r~' is not a function).

Maybe we can start with a list of allowed and not-allowed operations, rather than with a list of allowed and not-allowed derivations. This list clearly depends on the SQL generator.

The next problem is how to make sure that all the relevant reasoning parts know whether or not a certain optimisation is allowed. I do not know how to deal with this.

hanjoosten commented 8 years ago

Let's start with your example. s cannot be defined as UNI, because that would mean: s~;s |- I[BLOB], which is undefined.

hanjoosten commented 8 years ago

Also, I object to changing the discussion into a discussion of databases. I don't think that that's the issue here. In database-land, one does not allow for (primary)keys where equivalence isn't available.

hanjoosten commented 8 years ago

I would argue that in any Ampersand expression, we want to disallow ANY I[X], where there is no equivalence on X. This means, that some rules (like the UNI above) simply cannot be expressed. By doing so, we can prevent that we generate a table where a Blob or a Float is the primary key is

WolframKahl commented 8 years ago

In the following, I keep @sjcjoosten 's assumption that BLOB has no equivalence, and also assume that other concepts do have equivalence.

I'd say that r :: A * BLOB is only allowable with at least [UNI], and adding [TOT] is allowed.

Similarly, s :: BLOB * A is only allowed with at least [INJ], and [SURJ] can be added. This way, I believe we can avoid the asymmetries mentioned by @sjcjoosten .

These UNI respectively INJ constraints can only be enforced by the database structure, by making the A column the key and adding a functional dependency from the A column to the BLOB column, but not by RULEs, since the corresponding rule expressions cannot be tested.

The TOT respectively SURJ constraints can (presumably) be enforced by not allowing null values in the BLOB column.

Adding INJ and/or SURJ to r is forbidden, since it can neither be guaranteed by the database structure, nor tested by rules. Analogously, adding UNI and/or TOT to s is forbidden.

Trying to have q :: A * BLOB without [UNI] has undefined insertion of pairs with already-present keys, and deletion only possibly by key, not by key-value-pairs. I would therefore forbid this. (Therefore, I would also forbid s :: BLOB * A[UNI].)

r cannot be a rule, since the semantics of a rule is that r = V, which is not testable.

r ; s and r /\ s~ cannot be calculated, and therefore cannot occur in rules.

s ; r has a type that cannot be represented as a database table, and is illegal for that reason.

Like in semi-allegories, it makes sense to have primitives dom and ran; the expressions dom r and ran s are legal, and can occur in rules. (However, r ; r~ /\ I is not legal since r ; r~ cannot be calculated.) To me, it currently looks like this is the only way in which r and s can occur in rules, so this is actually quite important.

I think this might actually work out: Use the declaration constraints from above, and restrict expressions in the following way (stated without thinking about subtyping):

Are there any envisaged uses of BLOB that I would be forbidding here?

hanjoosten commented 8 years ago

This all sounds very reasonable. I am triggered by your: ...stated without thinking about subtyping. Does it make sense to speak of subtyping, if equivalence isn't available?

WolframKahl commented 8 years ago

Does it make sense to speak of subtyping, if equivalence isn't available?

Perhaps ─ but I am inclined to postpone thinking about it until you show me a potential use case...

stefjoosten commented 8 years ago

Let me think...

Let us say:

CLASSIFY XMLtext ISA BLOB

Assume that BLOB has no equality. This means I[BLOB] does not exist. What if I want XMLtext to have equality? I need some way to express that, which is beyond our current syntax. In many specification languages, however, it is not so weird to have that.

That was my thought for now...

hanjoosten commented 8 years ago

That doesn't fit into our current thinking. We would specify this something like:

CLASSIFY X ISA Y REPRESENT Y TYPE BLOB

Then both X and Y must be BLOB, because they are in the same typology.

@WolframKahl : I cannot think of any use case.

stefjoosten commented 8 years ago

Currently, Ampersand lives under the assumption that equality is defined for all concepts. Relaxing that assumption calls for an integral code inspection throughout the Ampersand system. Besides, the user must be given a usable language that allows them to work without that assumption. So I applaud this attempt to think it out before we do anything to the language.

stefjoosten commented 8 years ago

The semigroupoid-article is intriguing. Is it the theoretical framework that @sjcjoosten is calling for?

stefjoosten commented 8 years ago

I can see that @WolframKahl's observations can be expressed in an Ampersand-metamodel. That allows us to play with it in a way similar to what Sebastiaan and I did with the type system at RAMiCS. Actually,FormalAmpersand is a good starting point for this.

sjcjoosten commented 8 years ago

For the moment, it seems rather easy to implement dom and ran as suggested by Wolfram. Their definitions are as follows, for any expression r:

dom(r) = r ; r~ /\ I
ran(r) = dom(r~)

This is useful on its own, for things like SQL optimisation, or to make rules more readable (take for example the rule that an ActivePerson is someone who teaches now, or follows classes now: I[ActivePerson] = dom(teaches;time;'now') \/ dom(takesClass;time;'now')) . It has the additional benefit that we can use it to test thoughts on this subject (maybe by only allowing BLOB when -dev is used). Shall we make a feature-request-ticket for the implementation of dom and ran? We can discuss naming (decide whether we should use a symbol for it; if not, whether the words need to be all-caps like all other reserved words) and whether we need ran, in that one.

hanjoosten commented 5 years ago

related to issue #722