Define the datascript integration points

datacrypt-project / hitchhiker-tree

Functional, persistent, off-heap, high performance data structure

Eclipse Public License 1.0


Define the datascript integration points #10

Open dgrnbrg opened 8 years ago

dgrnbrg commented 8 years ago

In Datomic, it seems like seek-datoms is the only fundamental API that needs to be exposed to the query engine in order to calculate its answers. We need to decide on the API through which Datascript will interact with our code.

Once this is done, we'll be able to implement some kind of transaction & query API based on the index manager module (see #11).

dspiteself commented 8 years ago

For datascript we should implement a database. https://github.com/tonsky/datascript/blob/master/src/datascript/db.cljc#L352

(defprotocol ISearch
  (-search [data pattern]))

(defprotocol IIndexAccess
  (-datoms [db index components])
  (-seek-datoms [db index components])
  (-index-range [db attr start end]))

(defprotocol IDB
  (-schema [db])
  (-attrs-by [db property]))

are the main protocols, but they also implement a few more like IHash.
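
As a rough illustration of what implementing IIndexAccess could involve, here is a minimal sketch. It makes several assumptions: it redeclares a cut-down copy of the protocol locally instead of depending on Datascript, it leaves out -index-range, it uses a sorted set as a stand-in for the hitchhiker tree, and it treats datoms as plain [e a v] vectors. SortedSetDB and prefix-compare are hypothetical names, not part of either project.

```clojure
;; Local, cut-down copy of the protocol shape quoted above (assumption:
;; the real one lives in datascript.db and has more methods).
(defprotocol IIndexAccess
  (-datoms [db index components])
  (-seek-datoms [db index components]))

(defn prefix-compare
  "Hypothetical comparator: compares two vectors position by position over
  their common prefix; on a tie, the shorter vector sorts first, so a partial
  key such as [1 :name] sorts just before every datom that starts with it."
  [x y]
  (let [n (min (count x) (count y))]
    (loop [i 0]
      (if (= i n)
        (compare (count x) (count y))
        (let [c (compare (nth x i) (nth y i))]
          (if (zero? c) (recur (inc i)) c))))))

;; `indexes` maps an index name (only :eavt here) to a sorted set of datoms.
(defrecord SortedSetDB [indexes]
  IIndexAccess
  (-seek-datoms [_ index components]
    ;; Everything from the first datom >= components to the end of the index.
    (subseq (get indexes index) >= components))
  (-datoms [this index components]
    ;; Only the datoms whose leading components match exactly.
    (let [n (count components)]
      (take-while #(= components (take n %))
                  (-seek-datoms this index components)))))

(def db
  (->SortedSetDB
   {:eavt (into (sorted-set-by prefix-compare)
                [[1 :name "Ann"] [1 :age 30] [2 :name "Bob"]])}))

(-datoms db :eavt [1])            ; datoms for entity 1 only
(-seek-datoms db :eavt [1 :name]) ; from [1 :name ...] to the end of the index
```

In this sketch -datoms is just -seek-datoms cut off at the end of the matching prefix, which is consistent with the observation above that seek-datoms is the fundamental operation.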

dspiteself commented 8 years ago

I have some basic questions

1. Should the Datascript integration live in this repository or in another one?
2. Should we have a protocol for serialization, or just implement Nippy on the Datascript datom type?
3. A bigger issue is that we need a different comparator for each index type. I propose that index-node be parameterized by the compare function instead of using the IKeyCompare protocol.
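
To make the comparator question concrete, here is a minimal sketch of the "parameterize by compare function" idea. It uses clojure.core's sorted-set-by as a stand-in for index-node, and datoms are plain [e a v] vectors; compare-in-order, cmp-eavt, and cmp-avet are hypothetical names.

```clojure
(defn compare-in-order
  "Compares datoms x and y position by position, visiting the positions
  in `order` and stopping at the first difference."
  [order x y]
  (loop [[i & more] order]
    (if (nil? i)
      0
      (let [c (compare (nth x i) (nth y i))]
        (if (zero? c) (recur more) c)))))

;; One comparator per index: positions 0, 1, 2 are e, a, v.
(def cmp-eavt (partial compare-in-order [0 1 2]))
(def cmp-avet (partial compare-in-order [1 2 0]))

(def datoms [[1 :name "Bob"] [2 :name "Ann"] [1 :age 30]])

;; The same datoms, stored in two structures that differ only in the
;; comparator they were constructed with.
(def eavt (into (sorted-set-by cmp-eavt) datoms))
(def avet (into (sorted-set-by cmp-avet) datoms))

(seq eavt) ; sorted by entity, then attribute, then value
(seq avet) ; sorted by attribute, then value, then entity
```

The point is only that the same datom lands in a different position depending on which comparator the index was built with, so a single comparison defined on the datom type cannot serve every index.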

kovasb commented 8 years ago

Fressian is pretty good for datoms.

dgrnbrg commented 8 years ago

@danboykis I'll reply to your 3 questions:

1. We want to keep all the code in this one repo, because monorepos are way, way easier for development. We may split in the future once we've stabilized the features.
2. Does Datascript require a specific "datom" record, or is there a datom protocol? Is there a transaction protocol?
3. Here I disagree: we should definitely use IKeyCompare rather than parameterizing the comparison function. The reason is that the parameterized function will result in key comparisons getting polymorphic callsites, while if we implement the protocol for each index, we will be able to have monomorphic callsites, which the JIT will optimize far better. Here's a wiki article explaining inline callsite caches: https://en.wikipedia.org/wiki/Inline_caching.

@kovasb I really like Fressian, but I prefer Nippy, because it's much easier to use, and it supports flexible compression and encryption stuff.

dspiteself commented 8 years ago

@dgrnbrg As far as I can tell, all that is required of the datom type is that it supports ILookup with :e :a :v :t :added as keys, as well as a few more ancillary protocols.
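
A minimal sketch of such a type, assuming nothing beyond keyword lookup is needed. The type and its field layout are hypothetical and simply mirror the keys listed above; the ancillary protocols are left out.

```clojure
;; Hypothetical datom type that answers keyword lookups via ILookup.
(deftype Datom [e a v t added]
  clojure.lang.ILookup
  (valAt [this k] (.valAt this k nil))
  (valAt [_ k not-found]
    (case k
      :e     e
      :a     a
      :v     v
      :t     t
      :added added
      not-found)))

(def d (Datom. 1 :name "Ann" 1000 true))

(:e d)             ; keyword invocation goes through ILookup
(get d :v)         ; so does get
(:missing d :none) ; unknown keys fall back to the not-found value
```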

To make IKeyCompare monomorphic, you would need a different type per index (:eavt, :avet, etc.), because the result of comparing the same datoms differs depending on the index they are in. In our case, you would need an EAVTDatom, an AVETDatom, etc.

This may be appropriate; I am just making sure we are on the same page.
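
For reference, the type-per-index shape could look roughly like the sketch below. KeyCompare and key-compare are local stand-ins for the project's IKeyCompare protocol, not its actual definition, and compare-fields is a hypothetical helper. The sketch shows only the structure and says nothing about how the JIT treats the call sites.

```clojure
;; Local stand-in for the comparison protocol (assumption: the real
;; IKeyCompare may use different names and arities).
(defprotocol KeyCompare
  (key-compare [k1 k2]))

(defn compare-fields
  "Compares records x and y field by field, visiting the fields in the
  order `ks` and stopping at the first difference."
  [ks x y]
  (loop [[k & more] ks]
    (if (nil? k)
      0
      (let [c (compare (get x k) (get y k))]
        (if (zero? c) (recur more) c)))))

;; One record type per index; each fixes its own field order.
(defrecord EAVTDatom [e a v]
  KeyCompare
  (key-compare [this other] (compare-fields [:e :a :v] this other)))

(defrecord AVETDatom [e a v]
  KeyCompare
  (key-compare [this other] (compare-fields [:a :v :e] this other)))

;; The same two facts compare differently depending on the wrapper type.
(key-compare (->EAVTDatom 1 :name "Bob") (->EAVTDatom 2 :name "Ann")) ; negative: 1 < 2
(key-compare (->AVETDatom 1 :name "Bob") (->AVETDatom 2 :name "Ann")) ; positive: "Bob" > "Ann"
```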

dgrnbrg commented 8 years ago

@dspiteself I agree with that--I think it should help performance and have minimal code duplication, especially since there's a lot of flexibility in how we represent things under the hood for performance.