agentm / project-m36

Project: M36 Relational Algebra Engine
The Unlicense
895 stars 48 forks source link

Anti join / Existence quantifier? #118

Closed zanzix closed 7 years ago

zanzix commented 7 years ago

I can't seem to find any way to perform an anti-join (ie given [0,1,2,3] and [0,1] to get [2,3]).

I haven't seen quantifiers mentioned anywhere in the tutorial either, which I guess would be necessary for defining an anti-join?

agentm commented 7 years ago

Difference can be used to create an anti-join. In your simpler case above (which does not include any non-join-condition attributes), in tutd:

TutorialD (master/main): x:=relation{tuple{i 0},tuple{i 1},tuple{i 2},tuple{i 3}}
TutorialD (master/main): y:=relation{tuple{i 0},tuple{i 1}}
TutorialD (master/main): :showexpr x minus y
┌──────┐
│i::Int│
├──────┤
│3     │
│2     │
└──────┘

To make it an anti-join, just join the two initial relvars to the third difference relvar. However, the order of the relvar arguments does matter, so I suspect the relvar-agnostic definition to be:

TutorialD (master/main): :showexpr (x minus y) union (y minus x)
┌──────┐
│i::Int│
├──────┤
│3     │
│2     │
└──────┘

It might be worthwhile to add an anti-join operator, but the C.J. Date books don't call for it because it can be quickly built from other operators.

Could you explain more about the "existence quantifier"? I haven't seen that in the literature.

zanzix commented 7 years ago

Hi AM,

Thanks for the quick reply, that was very helpful!

I saw 'difference' in Haddock so I knew how to do it in Haskell, but I couldn't find it's equivalent in Tutd anywhere in the docs. Is there a glossary of all the operators implemented in tutd that I could use as a reference?

I agree that it's easy enough to construct an anti-join using minus. Your comment did make me wonder whether it's possible to define my own operators in tutd? I can see that there are features to save Haskell functions in tutd for transforming values and the database state, but nothing that'd lets me define my own relational algebra operators in terms of simpler ones.

It'd be very handy to be able to do so, as right now I'm resorting to keeping a separate file for listing the ones that I use often.

zanzix commented 7 years ago

The existential (sorry, not 'existence') quantifier is how you'd do that expression in SQL: SELECT * FROM table1 t1 WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t1.id = t2.id)

which would translate into relational-algebra as something like:

relvar1 where ¬E (relvar2)

and assuming that relvar2 is [1,2,3] this expression would evaluate the same as

relvar1 where ( (n != 1) and (n != 2) and (n != 3) )

Essentially what we are doing is constructing a predicate logic expression in the 'where' clause, whereas from what I've seen in the docs, tutd only supports propositional logic in the where clause (and, not, or).

According to Wikipedia's page on relational algebra, however, the anti-join is defined like this:

R ▷ S = { t : t ∈ R ∧ ¬s ∈ S(Fun (t ∪ s)) }

So it was my impression that existential quantification already existed in relational algebra.

agentm commented 7 years ago

Hi @zanzix, I was going to point you to the docs, but "minus" was missing! Thanks for pointing it out. It is now added.

It is not possible to add one's own operators to the engine. By keeping them static, they are much easier to statically optimize, but I could foresee a world which would allow operators to carry sufficient metadata to allow for optimizations.

Thanks for the detailed explanation of existence. Such checks can be simulated in TutorialD quite easily, though perhaps not immediately obviously using projection to relationTrue and relationFalse:

TutorialD (master/main): x:=relation{tuple{test 1}}
TutorialD (master/main): y:=relation{tuple{val 1},tuple{val 2}}
TutorialD (master/main): :showexpr x where ((y where val=@test) {})
┌─────────┐
│test::Int│
├─────────┤
│1        │
└─────────┘

Specifically, the projection operator {}, when used without attributes, can only return a relation with no attributes and no tuples ("false") or a relation with no attributes and one tuple ("true"). Date and Darwen call these relations TABLE_DUM AND TABLE_DEE, but these names are obviously lousy because:

  1. there are no tables in the relational algebra
  2. they reference an irrelevant Western nursery rhyme
  3. the names alone don't make it clear which name refers to which relation

Calling them "true" and "false" makes sense if the one thinks of the {} operator as asking "does the relation contain one or more tuples?" This operator enables relational expressions to be used to create boolean expressions. Note in the above example that removing this operator causes a runtime error- the "where" clause genuinely only expects "true" or "false" as valid values.

You will want to grab the latest master branch because I fixed a parsing bug which otherwise prevents the above example from working.

agentm commented 7 years ago

Project:M36 does offer existential quantification on a per-tuple basis, so I think there is no additional work needed on this issue.

zanzix commented 7 years ago

Hey Agentm, sorry for the delay in my reply! Thanks for adding the minus documentation and fixing the bug, the new functionality was exactly what I needed.