PRQL / prql

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
https://prql-lang.org
Apache License 2.0

Consider Datalog-like logic variable based JOINs #716

Open justjake opened 2 years ago

justjake commented 2 years ago

I don't have the time to write a detailed proposal at the moment, but I think you should seriously consider adding Datalog-like logic variables as an alternative syntax for JOIN. The Datalog style can substantially simplify graph traversal use cases compared to SQL.

Here's an example of a traversal in Datalog, versus the equivalent SQL (SQLite dialect):

[Image: a graph traversal written in Datalog next to the equivalent SQL (SQLite dialect)]

(Generated by my toy Datalog to SQL compiler [github])

Related concepts on the subject:

max-sixty commented 2 years ago

Thanks for the issue! I'm a big fan of Datalog and have followed logica for a while.

How would you see this working for simple joins? Do you think it's possible to design something that is familiar enough for users with only relational experience? What would a design without access to WITH RECURSIVE look like?

My sense is that analytical tables are increasingly de-normalized, and so highly complex joins ("find 1st cousins given parent-child relationships") are less important than making more standard joins easy, and allowing for more complex join conditions ("join orders & addresses, without exploding on duplicate addresses").
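As a rough illustration of the "orders & addresses" case, here is a minimal Python sketch of a join that does not multiply order rows when a customer has several addresses. The column names (`customer_id`, `city`), the function name `join_first_address`, and the "keep the first address seen" policy are all assumptions made for this example; none of them come from PRQL or from this thread.

```python
def join_first_address(orders, addresses):
    """Join each order to at most one address per customer.

    Hypothetical policy: keep the first address seen for each customer,
    so duplicate addresses cannot multiply the order rows.
    """
    first_address = {}
    for addr in addresses:
        # setdefault keeps the earliest address and ignores later duplicates.
        first_address.setdefault(addr["customer_id"], addr)

    joined = []
    for order in orders:
        addr = first_address.get(order["customer_id"])
        if addr is not None:  # inner-join semantics: drop unmatched orders
            joined.append({**order, "city": addr["city"]})
    return joined


orders = [
    {"order_id": 1, "customer_id": 10},
    {"order_id": 2, "customer_id": 11},
]
addresses = [
    {"customer_id": 10, "city": "Oslo"},
    {"customer_id": 10, "city": "Bergen"},  # duplicate address for customer 10
    {"customer_id": 11, "city": "Lima"},
]
rows = join_first_address(orders, addresses)
```

A plain equality join on `customer_id` would return three rows here; this sketch returns one row per order.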

I'll check out Percival — that looks really cool. I just saw the page was live — awesome!

aljazerzen commented 2 years ago

I don't know Datalog, but this seems like quite a concise way of expressing joins?

@justjake Could you help me understand by expressing your example above in the form of a function that takes a table as input?

In pseudo-python:


edge = { 'a': [...], 'b': [...] }

def join_path(table):
    ...  # ? What should the body be?

join_path(edge)

We need this because, ultimately, our join must be defined as a function that is applied to a whole table.
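One possible way to fill in the body, as a sketch only: the original example is in an image that is not reproduced here, so this assumes the traversal is a two-hop path, which in Datalog-like notation might read `path(a, c) :- edge(a, b), edge(b, c)`. The sample data and the choice of returning a list of `(a, c)` tuples are also assumptions made for illustration.

```python
def join_path(table):
    """Return (a, c) pairs such that a -> b and b -> c for some b.

    Assumed rule: path(a, c) :- edge(a, b), edge(b, c).
    The logic variable `b` appears in both uses of `edge`, which is
    what turns into the equality condition of a self-join.
    """
    rows = list(zip(table["a"], table["b"]))

    # Index the second use of the table by its source column.
    by_source = {}
    for src, dst in rows:
        by_source.setdefault(src, []).append(dst)

    # For each edge (a, b), follow every edge that starts at b.
    return [(a, c) for a, b in rows for c in by_source.get(b, [])]


# Hypothetical sample data: edges 1 -> 2, 2 -> 3, 3 -> 4.
edge = {"a": [1, 2, 3], "b": [2, 3, 4]}
result = join_path(edge)
print(result)  # [(1, 3), (2, 4)]
```

Written this way, the join is a function of the whole table: the same `edge` table is used twice, and the shared variable decides which columns must match.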