agentm / project-m36

Project: M36 Relational Algebra Engine
The Unlicense
876 stars 47 forks source link

A reference for the syntax and semantics of relational algebra / tutd #119

Closed zanzix closed 6 years ago

zanzix commented 7 years ago

I'm studying term-rewriting systems (and the K framework) as a way of expressing the semantics of programming languages, and I'd like to have a go at implementing a relational algebra engine in K.

I'd be grateful if you recommend a good reference for the syntax and semantics of either relational algebra in general or TutorialD specifically that I could use?

agentm commented 7 years ago

Well, TutorialD isn't a single specification as it has evolved over the years. The Third Manifesto website should have the latest details, but it's definitely not concise. The various books authored by C.J. Date and Hugh Darwen contain more explanations and examples, which is what I use for reference, but be aware that their opinions on what should or should not be included in TutorialD has changed over time and they are not always mathematically rigorous or even self-consistent. Rel is another mostly complete implementation of TutorialD with its own set of quirks.

In general, DBMS design is still very opinion/experience-based instead of math-based. One of the goals of Project:M36 is to provide a mathematically-sensitive (if not rigorous) alternative- too many ideas have fallen by the wayside due to "inability to optimize" or "scale". This project aims to prove that approach incorrect.

I am happy to answer any questions- sorry I can't offer something definitive.

agentm commented 6 years ago

Feel free to re-open this issue for additional discussion, if you like.

zanzix commented 6 years ago

Thanks for the explanation and the links to the resources. I definitely agree with you that the desire to optimize non-mathematical systems rather than finding more elegant mathematical ones is a problem in the industry!

What is the actual algorithm for running a relational algebra expression and then either updating the database or viewing some set of values from it?

That's the tricky part for me right now. I get how the operators work "mathematically" and can do them on paper, but I'm not clear on how the implementation goes, especially in a functional language. (I'm guessing that the query gets eventually translated into lens-like setters and getters?)

agentm commented 6 years ago

The Project:M36 algorithm is currently rather simple and readable. I invite you to take a look:

The state of the database itself is represented by the DatabaseContext which represents all the state that can be altered and committed such as relation variables, functions, constraints, etc.

In front of these evaluators sit a simple optimizer which eliminates useless work (such as projecting on all attributes or restricting on the true predicate).

Using this style, database interactions could be entirely idempotent. Indeed, in early version of Project:M36, the IO monad made no appearance. However, it is now necessary to use IO to write the state to disk and communicate with remote databases.

zanzix commented 6 years ago

Ah this is great, thank you for the pointers! Very clear and concise code and I appreciate the comments.

I would suggest considering writing a blog post about your development progress. I was asking around on a language-builders slack a few days ago and very few people knew much about query languages. A few people thought that databases = logic programming, too.

There was a "would-have-been-great-if-it-happened" tutorial linked to me called "Write you a database" that intended to mimic the Write You a Scheme tutorial. I thought it was a great idea, as there's virtually nothing like this that exists outside of academic textbooks.

Especially considering that your aim is to build a db that is mathematically sound. This kind of thinking needs to be publicised more!

On 2 August 2017 at 03:16, AgentM notifications@github.com wrote:

The Project:M36 algorithm is currently rather simple and readable. I invite you to take a look:

The state of the database itself is represented by the DatabaseContext https://github.com/agentm/project-m36/blob/master/src/lib/ProjectM36/Base.hs#L278-L285 which represents all the state that can be altered and committed such as relation variables, functions, constraints, etc.

In front of these evaluators sit a simple optimizer which eliminates useless work (such as projecting on all attributes or restricting on the true predicate).

Using this style, database interactions could be entirely idempotent. Indeed, in early version of Project:M36, the IO monad made no appearance. However, it is now necessary to use IO to write the state to disk and communicate with remote databases.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/agentm/project-m36/issues/119#issuecomment-319548859, or mute the thread https://github.com/notifications/unsubscribe-auth/AKXNDXrQyoV2Rn6-K2WsJlpzgiSGC3kUks5sT9wWgaJpZM4OQ92W .