datacrypt-project / hitchhiker-tree

Functional, persistent, off-heap, high performance data structure
Eclipse Public License 1.0

Support multiple backends. #32

Open whilo opened 6 years ago

whilo commented 6 years ago

Hi,

I have finally ported datascript to the hitchhiker-tree. All tests pass, and it works with a non-async port of the hitchhiker-tree (which considerably simplifies profiling and debugging). See https://github.com/whilo/datascript/tree/hitchhiker_tree_support for the branch. Query performance is OK for my small queries on the REPL (roughly 3x-5x slower than the bt-set in datascript itself), but it is WIP and still not ported to core.async. The GC of the hitchhiker-tree is also still missing. I would like to hear your opinion on these efforts, both for the tree and, if you are interested, on datascript. I still have to port the cljs tests, and maybe this dynamic binding in core.cljc for *async-backend* is not optimal(?).

Best, Christian

dgrnbrg commented 6 years ago

Hey, this is really exciting! I'm sorry for not responding sooner--I've been incredibly busy with work.

Would you mind giving me a walkthrough in terms of how to understand this PR? Like, what are the concepts you've altered to make this work, what order should I read through the changes, etc.

Also, I noticed that 2 tests are failing. I would be happy to help you identify what might be the issue, if you could start by giving me a tour of this change.

I always hoped that one day the hitchhiker tree would be integrated with datascript--I'll do what I can to land these changes and support in whatever way I can.

whilo commented 6 years ago

I have basically just made the core.async macros optionally no-ops, so we would recover the old hitchhiker-tree semantics. I did this to have easier profiling, because I was trying to make the query engine of datascript perform competitively with Datomic, which worked out in the end. (I had made a really stupid mistake in my reasoning...). Since namespaces are not parametrizable, I am not sure exactly how to make the async backend pluggable. I opted for a dynamic binding, but this needs to be bound before the hitchhiker-tree namespaces are evaluated (macro-expansion time). Do you have any suggestions on that? I would not merge before this is addressed; I just wanted to get some input from you first.
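A minimal sketch of the idea, not the actual code in the branch: the macros read a dynamic var when they are expanded and emit either core.async forms or plain synchronous forms. The names `example.async-switch`, `*async?*`, `go-try` and `<?` are made up for illustration.

```clojure
(ns example.async-switch
  (:require [clojure.core.async :as async]))

;; Hypothetical switch. It is read at macro-expansion time, so it must
;; have the desired value before any code using the macros is compiled.
(def ^:dynamic *async?* true)

(defmacro go-try
  "Expands to a core.async go block, or to a plain do when *async?* is false."
  [& body]
  (if *async?*
    `(async/go ~@body)
    `(do ~@body)))

(defmacro <?
  "Expands to a channel take, or to the bare expression when *async?* is false."
  [ch]
  (if *async?*
    `(async/<! ~ch)
    ch))
```

Because the var is consulted when the macros expand, rebinding it around a call at runtime has no effect on code that is already compiled. It would have to be bound while the consuming namespaces are loaded, which is exactly the awkward part.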

Regarding datahike, I think it might be pretty interesting to try with your redis-backend, which I have not managed to map yet. I have also thought about running a datahike instance in a BFT (Byzantine fault tolerant) blockchain-like system to facilitate neutral multiparty infrastructure management (e.g. supply chains, setups of datacenter instances, etc.). Another thing that I would like to do is move away from my CRDT-based distributed approach to a linearizable system with datahike first, then relax parts of the schema with conflict-resolution mechanisms (e.g. causally consistent grow-sets or last-writer-wins semantics) and stream these through factui or reactive dataflow into frontends with react-native. This is a programming model that looks fairly perfect to me.

dgrnbrg commented 6 years ago

Thank you for your guidance! If you're still interested in getting this merged (I am!), I'll be able to take a look after this week--I'm speaking at a conference this Friday & Saturday, so next weekend I'd be able to take a first pass through the review.

whilo commented 6 years ago

Hey David. I was also super busy with a paper deadline and some business-related stuff. I have just updated the codebase again a bit, but I am sometimes hitting a weird error in the konserve backend in its generative test. I would love to hear your feedback on how to make the async backend pluggable and then take a few days to clean things up a bit before a merge. If you have some cool ideas on how to combine datalog with the hh-tree or nice JVM backend infrastructure, feel free to brainstorm as well. At the moment my idea is that datahike could actually allow a diverse set of DB objects on different index data structures with different tradeoffs, while allowing composable queries over the whole infrastructure (including client-side queries). It is a bit crazy, but this is what I would love to have and the best way to compose systems I can come up with right now...

whilo commented 5 years ago

The error in the test was an edge-case error in the testing code, so everything is fine. I am currently writing up an explanatory blog post about the hh-tree for our blockchain project, which loosely follows your presentation. If you are interested I can give it to you to read beforehand. You are also welcome to join at any point of course. At the moment we mostly coordinate through a Telegram chat.