agentm / project-m36

Project: M36 Relational Algebra Engine
The Unlicense
876 stars 47 forks source link

Fast three way join algorithm #265

Open jmatsushita opened 3 years ago

jmatsushita commented 3 years ago

Hi there, I just watched this talk https://youtu.be/FudEqfutW2A?t=1888 from MSFP by @henglein and @mkm about how to use module theory to get a very fast implementation of three way joins which probably has more implications with regards to database performance (with a haskell implementation which I think isn't public yet).

In the q&a section there are questions about backporting this approach to RDMS, and generally make this result more visible to the database community, and I thought this might be an idea that would be a great fit for project-m36.

Hope this is interesting 😄

agentm commented 3 years ago

Interesting, indeed! Thanks for the link. I have contacted the authors of the paper for more details.

agentm commented 3 years ago

The paper can be found here.

The paper also mentions "Factorized Databases", a tree-based representations of joins which purports to provide an redundancy-free representation.

agentm commented 3 years ago

Project:M36 already has a notion similar to factorized joins in nested relations. It may be interesting to implement a special form of join which generates nested relation results.

agentm commented 3 years ago

After reviewing the paper and the video, it's not immediately clear but strongly implied that this optimization applies only to the "triangle join" query which finds triples that reference each other circularly. The video and paper also do not conclude whether this optimization applies to n-ary loop searches.

I'm still waiting to hear back from the authors, if only to receive the haskell code used in the benchmark.

jmatsushita commented 3 years ago

What I understand is that the triangle join is the only experimental result so far, but that its possible that the intermediary representation would optimise other type of queries. I suspect that it hinges on the representation of negative intermediary values which allow terms to be eliminated without having to compute them, so I conjecture that it would apply to n-ary loops, though like you I havent seen it in the paper (which you will have noted is an extended abstract, so I imagine the authors are still working on it).

mkm commented 3 years ago

It's great to see that our work is generating interest! As you have already surmised, it is indeed a work in progress, but I hope to have a publicly available prototype Soonâ„¢.

The current epistemic status is as follows:

Note that it's not just a join algorithm, it's a query evaluation approach that happens to deal with join situations (whether explicit joins or cartesian products followed by filtering) efficiently. Of course, you can use it as a special-purpose method to deal with, say, three-way joins, but you won't reap the full benefits and the resulting query evaluator will probably still have some blind spots. Exactly how much of the module-theoretic approach can be grafted onto a relational algebra model remains to be seen, so I'm curious to see how far this goes. Depending on which flavour of relational algebra you are using, you might actually be working with semi-modules already!

henglein commented 3 years ago

There is prior workhttps://link.springer.com/article/10.1007/s10990-011-9078-8 using symbolic products and discriminatory joins, predating the module/(tensor) algebra generalization presented by Mikkel and not covering efficient cyclic joins. As it has with full Haskell code (in the article) for all standard SQL query constructs, generalized to non-normal form, I hope it can be of interest. The PDF of the article is included FYC.

Best, Fritz

On 7 Sep 2020, at 14:26, Mikkel Kragh Mathiesen notifications@github.com<mailto:notifications@github.com> wrote:

It's great to see that our work is generating interest! As you have already surmised, it is indeed a work in progress, but I hope to have a publicly available prototype Soonâ„¢.

The current epistemic status is as follows:

Note that it's not just a join algorithm, it's a query evaluation approach that happens to deal with join situations (whether explicit joins or cartesian products followed by filtering) efficiently. Of course, you can use it as a special-purpose method to deal with, say, three-way joins, but you won't reap the full benefits and the resulting query evaluator will probably still have some blind spots. Exactly how much of the module-theoretic approach can be grafted onto a relational algebra model remains to be seen, so I'm curious to see how far this goes. Depending on which flavour of relational algebra you are using, you might actually be working with semi-modules already!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fagentm%2Fproject-m36%2Fissues%2F265%23issuecomment-688291720&data=02%7C01%7Chenglein%40di.ku.dk%7C3b21e625dfa24e1efa4908d853292f12%7Ca3927f91cda14696af898c9f1ceffa91%7C0%7C0%7C637350783638935588&sdata=vqtffTG783hc1UwT1wraZUpE%2FBXMJ7TTJytorJw0HoE%3D&reserved=0, or unsubscribehttps://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAAFDZD54M2PT2ULKIUKBDDLSETGNRANCNFSM4QZCBPXA&data=02%7C01%7Chenglein%40di.ku.dk%7C3b21e625dfa24e1efa4908d853292f12%7Ca3927f91cda14696af898c9f1ceffa91%7C0%7C0%7C637350783638935588&sdata=v6PZG8APja6UfrXtH4tflV%2FEuFUv7nQ4VPZSmJOtqJ8%3D&reserved=0.

mkm commented 3 years ago

I now have a prototype publicly available at https://github.com/mkm/module-theory. There are examples of three-way joins as well as more complicated joins.