connect database and write custom directives

bionicles commented 3 years ago

Dear Eric,

This looks extremely cool. I'm keen to get into Rust for data stuff, and I like Datomic, but dislike JVM kludge, so I found your project.

Would you be openminded to enable users to connect this to a database?

Perhaps something like GraphQL resolvers would work - just a function to fetch the data for various fields

Then we could run Datalog queries on something like RedisGraph, DynamoDB, Postgres, or use them to transform data before we save it, that sort of thing

Also, database connections ideally need some flexible way to define fine-grained authorization rules. I like to do this with functional programming inside the data schema itself, using custom GraphQL Directives

EX:

   @author (check if the user is the author of an object)
   @acl (check if the user is in the access control list) 
   @admin (check if the user has a certain role)

this could be super powerful in context of Datalog as we could use deductive logic queries to decide who can access what data. Prevents data breaches and allows multi-tenant apps while protecting users' privacy

What do you think about making something like CrepeQL, a mixture of the best of Datalog and GraphQL, to run Datalog on arbitrary databases, with fine-grained authorization, all compiled nicely with Rust?

Thanks in advance for your reply,

Bion

ekzhang commented 3 years ago

Thanks for the idea! I've also been wondering how to connect Datalog with table databases, and it seems tricky. I believe Datomic uses some kind of normalization to represent objects in narrow entity-attribute-value format (plus a timestamp), which makes it possible to query things naturally in Datalog as a kind of 4-tuple. This is a bit different from the wide format that Crepe assumes. It might require a different approach to support named (and not just ordered) columns.

I previously read this paper about building a Datalog engine on a single node that executes queries through SQL translation, which you might find interesting. Also agree that there's potential to use Datalog for authorization rules, not sure how the design would work in the case of GraphQL though. The JavaScript package manager Yarn has a Prolog validation engine, for instance.

Let me know if you have more concrete ideas about database query languages based on Datalog. Would be happy to discuss more and see if there are ideas for a potential project here.

bionicles commented 3 years ago

https://pypi.org/project/pyDatalog/ hooks up to SQL, what if we looked at that, but instead of SQL, we could just make arbitrary "resolver" type pluggable thing? python being simpler than rust, would possibly be a quick way to understand how this might be done

key concepts would be --

resolvers - a way to hook fields to the database with rust functions
dataloaders - a way to batch + cache queries to avoid spamming the DB

I'm a big fan of the book "thinking in systems" by donella meadows and the terminology of systems dynamics.

We could simply pick the right abstraction: define Source (trait), Inflow (fn), Stock (struct), Outflow (fn), Sinks (trait)

Source -> Inflow -> Stock -> Outflow -> Sink

Source implements "get" or "take" and Sink implements "set" or "send" Inflow maps from a source type onto a stock type. Could be a database, REST API, etc Outflow maps from a stock type onto a sink type. Perhaps that's a storage service or a response to a client. Stock can implement one or more query languages. Datalog is cool, as is Cypher, Gremlin, SQL would be handy, Redis commands are legit, Falcor's virtual json model is pretty cool, there's a lot of different ideas for that.

The stock could be any type we want, as long as it's easy to hook up to Crepe!

Then we could just go nuts and define a lot of different sources, sinks, flows, and reuse them, hook them together. Information System Dynamics

Another legit inspiration would be ROAPI, built on Arrow and Datafusion https://roapi.github.io/docs/index.html And the leveldb pluggable backend/frontend, is also a good inspiration, https://github.com/Level/awesome

bionicles commented 3 years ago

Anyway, perhaps that's a different project, a compute graph for dataflow, but it sure sounds fun to make :) now I just gotta learn rust!

bionicles commented 3 years ago

the pyDatalog author said in git issues this was a good version before he tried a big rewrite that didnt pan out

here's how he connected the objects in datalog to SQL

https://github.com/pcarbonn/pyDatalog/blob/5a410380fa51f5e2a944b1f12c535c0f86282fa3/pyDatalog/pyDatalog.py#L305

""" ****************** support for SQLAlchemy ***************** """

class sqlMetaMixin(metaMixin, DeclarativeMeta): 
    """ metaclass to be used with Mixin for SQLAlchemy"""
    pass

""" attach a method to SQLAlchemy class.attribute, 
    so that it can answer queries like class.attribute[X]==Y"""
def InstrumentedAttribute_getitem(self, *args):
    cls = self.class_
    method = self.key
    return type(cls).__getattr__(cls, method).__getitem__(*args)
InstrumentedAttribute.__getitem__ = InstrumentedAttribute_getitem

TLDR: he used an ORM. Might be cool to just let user supply some fetcher function instead. Then it's not dependent on some ORM, but you could certainly use one in the fetcher if desired

Where's the interface between queries and data in Crepe?

brurucy commented 3 years ago

@bionicles are you aware of https://github.com/TimelyDataflow/differential-dataflow? it allows you to express, as far as I'm aware of, anything datalog-ish, and it is trivial to connect it to a database.

https://github.com/oxigraph/rio could be used as an rdf parser in order to load facts

Couldn't const generics be used in order to generate compile-time code with respect to some specific database connection? Sorry if I'm grossly misunderstanding what's the actual use case of const generics.

infogulch commented 2 years ago

represent objects in narrow entity-attribute-value format (plus a timestamp), which makes it possible to query things naturally in Datalog as a kind of 4-tuple.

This reminds me of how TerminusDB's storage strategy. TerminusDB is an rdf-oriented graph database built with rust and prolog and uses a custom storage format. The format stores 4-tuples (5-tuples?) in columns of references into an array of values, which are compressed after running the references through two passes of delta encoding (taking the difference between adjacent references). I hope that description is sensible.

They claim 700x compression ratio for tuples using this method. This seems like a good starting point for a compressed representation of multi-tuples to which are suitable to be committed to durable storage.

https://github.com/terminusdb/terminusdb/blob/dev/docs/whitepaper/terminusdb.pdf

ekzhang / crepe

connect database and write custom directives #15