datacrypt-project / hitchhiker-tree

Functional, persistent, off-heap, high performance data structure
Eclipse Public License 1.0

Get rid of all global mutable state, allow clean reloading #2

Open · mpenet opened this issue 8 years ago

mpenet commented 8 years ago

I see a couple of defonce forms here and there: global mutable state that should, imho, be encapsulated and able to be freed. All of that state could live in a session-like "object". A user should be able to handle several of them if necessary (think of different trees in different Redis DBs and/or servers), and should have control over resources such as the refcount-expiry thread, caches, etc.

The session would then be an argument to some functions, imho:

(let [session (create-session! {:backend {:type :redis :port 6379 :host "redis" :db 1}})
      my-outboard (ob/create session "first-outboard-tree")]
  ;; ... do stuff with my-outboard ...
  (shutdown-session! session))

This could also be integrated with with-open, if the session implemented java.io.Closeable.
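A minimal sketch of what such a session could look like, assuming a hypothetical create-session! that owns a cache atom and a daemon thread standing in for the refcount-expiry thread. None of these names are the hitchhiker-tree API; the point is that implementing java.io.Closeable is what lets Clojure's with-open release the session's resources.

```clojure
(ns example.session
  "Hypothetical sketch: state owned by a session value instead of defonce globals.")

;; The session owns its resources; closing it releases them.
(defrecord Session [config cache ^Thread expiry-thread]
  java.io.Closeable
  (close [_]
    ;; Stop the background thread, wait briefly for it to exit,
    ;; then drop everything cached by this session.
    (.interrupt expiry-thread)
    (.join expiry-thread 1000)
    (reset! cache {})))

(defn create-session!
  "Hypothetical constructor: allocates a cache and a daemon thread that
  stands in for periodic work such as refcount expiry or flushing."
  [config]
  (let [cache (atom {})
        ^Runnable task (fn []
                         (try
                           (while (not (.isInterrupted (Thread/currentThread)))
                             ;; placeholder for expiry/flush work
                             (Thread/sleep 100))
                           (catch InterruptedException _
                             nil)))
        thread (doto (Thread. task "session-expiry")
                 (.setDaemon true)
                 (.start))]
    (->Session config cache thread)))

;; Usage: .close runs when the body exits, even if it throws.
(with-open [session (create-session! {:backend {:type :redis :port 6379 :host "redis" :db 1}})]
  (swap! (:cache session) assoc "first-outboard-tree" :placeholder))
```

Because each session is an ordinary value, several can coexist (one per Redis server or DB), and tests can create and close their own without touching shared state.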

Some pointers on what/where:

https://github.com/dgrnbrg/hitchhiker-tree/blob/298a0660a44aa86b3bd40b5ef45f7ea35c97154b/src/hitchhiker/redis.clj#L129-L131
https://github.com/dgrnbrg/hitchhiker-tree/blob/298a0660a44aa86b3bd40b5ef45f7ea35c97154b/src/hitchhiker/outboard.clj#L28-L30

dgrnbrg commented 8 years ago

The 2 functions you found are exceptions to this rule, even in Clojure :)

The first function is referentially transparent, and I'd actually like to improve the caching strategies so that you can allocate a fixed amount of cache (in MB) to be shared among all of the process's outboards.

The list of outboards themselves needs those global state trackers for 2 reasons:

[1] Accidentally initializing two outboards with the same name during development would cause weird data corruption. Outboard is a usable API for the hitchhiker tree, and the connection registry eliminates a class of bugs by guaranteeing a singleton for each named connection in a referentially transparent way. (This matters especially because you can reconnect to an outboard after restarting your process, so you don't need to reload your state into memory.)
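The singleton guarantee can be sketched with an atom of delays. This is an illustrative assumption, not the outboard implementation: make-registry and get-or-create! are hypothetical names, and create-fn stands in for whatever actually opens a connection. Storing a delay rather than the connection itself means create-fn runs at most once per name, even if swap! retries under contention.

```clojure
(ns example.registry
  "Hypothetical sketch of a named-connection registry with one instance per name.")

(defn make-registry
  "A registry is a plain atom holding {name -> delay}. It could be a defonce
  for a process-wide default, or created and passed around explicitly."
  []
  (atom {}))

(defn get-or-create!
  "Returns the connection registered under conn-name, calling create-fn to
  build it on first use. Only the delay that wins the swap! is ever forced,
  so create-fn runs at most once per name."
  [registry conn-name create-fn]
  (let [candidate  (delay (create-fn conn-name))
        registered (swap! registry
                          (fn [m]
                            (if (contains? m conn-name)
                              m
                              (assoc m conn-name candidate))))]
    @(get registered conn-name)))

;; Usage: asking twice for the same name yields the identical object.
(def registry (make-registry))
(identical? (get-or-create! registry "first-outboard-tree" (fn [n] {:name n}))
            (get-or-create! registry "first-outboard-tree" (fn [n] {:name n})))
;; => true
```

Since the registry is just a value, the same guarantee holds whether it lives in a global or inside a session.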

[2] The outboard tries to balance using JVM memory against using spare IO to flush when it is available; the expiry tracker's global view of all outboards helps keep resource usage low.

Both of these globals are designed to give a referentially transparent API similar to Datomic's, while not requiring the user to understand the IO scheduling. If you'd like more control, you can use the low-level API; otherwise, I think these functions are important for usability.

Does that make sense?

mpenet commented 8 years ago

It's actually one of my pet peeves in Clojure itself (same in core.async).

I understand there are parts that are "server"-specific and others on the "client" side, but even for the server part, a proper lifecycle and options (backend port/host and whatnot) would be welcome. I don't see an issue with having global/default values for these options, or for the registry or the ref thread, but letting the user take control over them is fairly common. It would make it possible to swap implementations when/if needed, have different policies for various aspects, make testing easier, etc. The Onyx, pithos, and cyanite projects are good examples of these design principles, imho.
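One common way to keep global defaults while still letting the user take control is to merge caller-supplied options over a default map. The sketch below assumes hypothetical option keys (:backend, :cache-mb) and function names; none of them come from hitchhiker-tree.

```clojure
(ns example.defaults
  "Hypothetical sketch: library-wide defaults that callers can override.")

(def default-options
  "Hypothetical defaults; none of these keys come from hitchhiker-tree."
  {:backend  {:type :redis :host "localhost" :port 6379 :db 0}
   :cache-mb 64})

(defn deep-merge
  "Merges maps recursively: nested maps are merged, other values from
  later maps win."
  [& maps]
  (apply merge-with
         (fn [a b] (if (and (map? a) (map? b)) (deep-merge a b) b))
         maps))

(defn resolve-options
  "With no argument, returns the defaults; otherwise overlays the caller's
  overrides, so a user only has to specify what differs."
  ([] default-options)
  ([overrides] (deep-merge default-options overrides)))

;; Usage: only :host and :db change; :type, :port and :cache-mb keep their defaults.
(resolve-options {:backend {:host "redis" :db 1}})
```

The resolved map would then be handed to whatever constructs the session, so the defaults cost nothing for simple use while tests or multi-server setups can override them.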

That said, it's an interesting project :)