datacrypt-project / hitchhiker-tree

Functional, persistent, off-heap, high performance data structure
Eclipse Public License 1.0

Define the datascript integration points #10

Open dgrnbrg opened 8 years ago

dgrnbrg commented 8 years ago

In Datomic, seek-datoms appears to be the only fundamental API that the query engine needs in order to compute its answers. We need to decide which API Datascript will use to interact with our code.

Once this is done, we'll be able to implement some kind of transaction & query API based on the index manager module (see #11).

dspiteself commented 8 years ago

For Datascript, we should implement a database: https://github.com/tonsky/datascript/blob/master/src/datascript/db.cljc#L352

(defprotocol ISearch
  (-search [data pattern]))

(defprotocol IIndexAccess
  (-datoms [db index components])
  (-seek-datoms [db index components])
  (-index-range [db attr start end]))

(defprotocol IDB
  (-schema [db])
  (-attrs-by [db property]))

These are the main protocols, but Datascript's database types also implement a few more, such as IHash.
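To make the shape of IIndexAccess concrete, here is a minimal sketch of one possible implementation backed by a plain Clojure sorted set rather than a hitchhiker tree. Everything except the protocol itself is an assumption made for illustration: datoms are modeled as bare `[e a v]` vectors, only the `:eavt` index is covered, and `cmp-components`, `pad`, and `SortedSetDB` are hypothetical names that do not come from Datascript or this project.

```clojure
;; Protocol repeated from above so the sketch is self-contained.
(defprotocol IIndexAccess
  (-datoms [db index components])
  (-seek-datoms [db index components])
  (-index-range [db attr start end]))

;; Hypothetical helper: compare two vectors position by position and stop
;; at the first difference. Clojure's compare sorts nil before any other
;; value, so a padded key such as [1 nil nil] sorts just before every
;; stored datom of entity 1.
(defn cmp-components [x y]
  (loop [xs (seq x) ys (seq y)]
    (if (and xs ys)
      (let [c (compare (first xs) (first ys))]
        (if (zero? c) (recur (next xs) (next ys)) c))
      0)))

;; Hypothetical helper: pad a partial pattern to three components with nils.
(defn pad [components]
  (vec (take 3 (concat components (repeat nil)))))

(defrecord SortedSetDB [eavt]
  IIndexAccess
  ;; Every datom at or after the given components, in index order.
  (-seek-datoms [_ index components]
    (assert (= index :eavt) "only :eavt is sketched here")
    (subseq eavt >= (pad components)))
  ;; Only the datoms whose leading components match exactly.
  (-datoms [db index components]
    (let [prefix (vec components)
          n      (count prefix)]
      (take-while #(= prefix (subvec % 0 n))
                  (-seek-datoms db index components))))
  ;; Left out of this sketch because it would need an :avet index.
  (-index-range [_ attr start end]
    (throw (UnsupportedOperationException. "needs an :avet index"))))

(def db
  (->SortedSetDB
    (into (sorted-set-by cmp-components)
          [[1 :name "Alice"] [1 :age 30] [2 :name "Bob"]])))

(-datoms db :eavt [1])            ; the datoms of entity 1
(-seek-datoms db :eavt [1 :name]) ; everything from [1 :name ...] onward
```

In this sketch -datoms is just -seek-datoms plus a prefix filter, which fits the observation that seek-datoms is the fundamental operation.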

dspiteself commented 8 years ago

I have some basic questions

kovasb commented 8 years ago

Fressian is pretty good for datoms.

dgrnbrg commented 8 years ago

@danboykis I'll reply to your 3 questions:

@kovasb I really like Fressian, but I prefer Nippy because it's much easier to use and it supports flexible compression and encryption.

dspiteself commented 8 years ago

@dgrnbrg As far as I can tell, all that is required of the datom type is that it supports ILookup with :e, :a, :v, :t, and :added as keys, as well as a few more ancillary protocols.
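As a rough illustration of that requirement, a minimal sketch of such a type on the JVM could look like the following. It assumes JVM Clojure, where ILookup is the `clojure.lang.ILookup` interface (in ClojureScript it is a protocol with a `-lookup` method instead). The type name `Datom` is chosen here for illustration, and the ancillary protocols are left out.

```clojure
(deftype Datom [e a v t added]
  clojure.lang.ILookup
  ;; One-argument lookup delegates to the two-argument form.
  (valAt [this k] (.valAt this k nil))
  ;; Expose the five fields under keyword keys; anything else is not found.
  (valAt [_ k not-found]
    (case k
      :e     e
      :a     a
      :v     v
      :t     t
      :added added
      not-found)))

(def d (Datom. 1 :name "Alice" 1000 true))

(:a d)             ; keyword lookup goes through valAt
(get d :missing 0) ; unknown keys fall back to the default
```

A defrecord with the same fields would provide this lookup behavior without any hand-written code, at the cost of also behaving like a map.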

To make IKeyCompare monomorphic, you would need a different type per index (:eavt, :avet, etc.), because the result of comparing the same pair of datoms differs depending on the index they are in. In our case, you would need an EAVTDatom, an AVETDatom, etc.

This may be appropriate; I am just making sure we are on the same page.
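A minimal sketch of the per-index idea, under stated assumptions: the `IKeyCompare` protocol below is a local stand-in defined only for this example (the method name `key-compare` is hypothetical, not the project's actual signature), `cmp-chain` is a hypothetical helper macro, and the comparison ignores :t and :added.

```clojure
;; Local stand-in for the tree's key-comparison protocol (name assumed).
(defprotocol IKeyCompare
  (key-compare [k1 k2]))

;; Hypothetical helper: evaluate comparisons left to right and return the
;; first non-zero result. As a macro it short-circuits, so values of
;; different attributes are never compared with each other.
(defmacro cmp-chain
  ([] 0)
  ([c & more]
   `(let [c# ~c]
      (if (zero? c#) (cmp-chain ~@more) c#))))

;; Records get keyword lookup (:e, :a, ...) for free.
(defrecord EAVTDatom [e a v t added]
  IKeyCompare
  (key-compare [_ other]
    (cmp-chain (compare e (:e other))
               (compare a (:a other))
               (compare v (:v other)))))

(defrecord AVETDatom [e a v t added]
  IKeyCompare
  (key-compare [_ other]
    (cmp-chain (compare a (:a other))
               (compare v (:v other))
               (compare e (:e other)))))

;; The same two facts order differently depending on the index type.
(key-compare (->EAVTDatom 1 :name "Bob" 100 true)
             (->EAVTDatom 2 :name "Alice" 100 true)) ; negative: entity 1 < 2

(key-compare (->AVETDatom 1 :name "Bob" 100 true)
             (->AVETDatom 2 :name "Alice" 100 true)) ; positive: "Bob" > "Alice"
```

Because each record type carries exactly one comparison, every call site of `key-compare` within a given index sees a single concrete type, which is what makes the dispatch monomorphic.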

dgrnbrg commented 8 years ago

@dspiteself I agree with that--I think it should help performance and have minimal code duplication, especially since there's a lot of flexibility in how we represent things under the hood for performance.