datacrypt-project / hitchhiker-tree

Functional, persistent, off-heap, high performance data structure
Eclipse Public License 1.0
1.19k stars · 64 forks

Make the system end-to-end async, add cljs support, and add a konserve backend #30

Closed whilo closed 7 years ago

whilo commented 7 years ago

I have now done a proper integration of konserve as another backend, just using the protocols. The codebase is great fun to work with and it is working nicely so far. This is still WIP, but I wanted to open this pull request to track the further integration path.

If ClojureScript is supposed to be supported, then all functions doing IO need to introduce go-blocks, starting with the protocols for Backend and Address. This does not change the fairly pure functional character, but it does add significant indirection. Performance-wise, core.async should have no big effect compared to IO. This would also change the Outboard API, which is not backend-agnostic and would have to be changed anyhow.
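As a sketch of what that could look like (the protocol and method names here are illustrative, not the actual protocols in the codebase): IO-performing methods would return core.async channels instead of plain values, so the same code path works on the JVM and in cljs.

```clojure
(require '[clojure.core.async :as async])

;; Hypothetical async variant of a storage backend protocol: every method
;; that performs IO returns a channel instead of a plain value.
(defprotocol IAsyncBackend
  (read-node [backend addr]
    "Returns a channel that eventually yields the node stored at addr.")
  (write-node [backend node]
    "Returns a channel that eventually yields the address of the written node."))

;; Callers then take from the channel inside a go-block:
;; (async/go (let [node (async/<! (read-node backend addr))] ...))
```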

dgrnbrg commented 7 years ago

This is fantastic! I am going to be OOO until next week, when I'll be able to do a closer review and we can get this merged :)

In the meantime, do you think you could add a few automated tests?

Also, I would love to integrate core.async throughout the APIs. I'm very interested in hearing your thoughts and discussing this further.

whilo commented 7 years ago

Ok, great. Sure, I will provide automated tests. I will also start to wrap the necessary parts in go-blocks and we can discuss it then.

whilo commented 7 years ago

I have lifted the go-routines to the top level with superv.async. Tests pass and performance seems to be in the same ballpark, but we need to lift the supervisor argument into the API as well. The problem is that to close dynamically over a call graph of go-routines for error handling, we need to pass the supervisor around lexically, because cljs does not support dynamic binding in an asynchronous context. I know that this tampers with the API and dynamic scope would be much nicer, but it just is not possible at the moment, and from the discussions between Brandon Bloom and David Nolen I assume that it won't be fixed.

We can use plain core.async, but IO errors will then just blow up go-routines locally, potentially hiding errors. I think we could use a simpler go-try without a supervisor, just rethrowing the exception, since we block on all go-routine contexts, but then we still lose the ability to abort the context from the outside (cooperative scheduling in case of errors). Maybe the latter approach is less invasive. What is your take on error handling?

whilo commented 7 years ago

I will skip the supervisor, so it will be plain core.async + a go-try macro to carry exceptions to the top level. Is this ok for you?

whilo commented 7 years ago

I have now ported most of the implementation, but the cljs version does not completely work yet. That said, what remains is mostly replacing JVM-specific bits like binary search and promises (with promise-chan). Once the cljs version works, this is ready to pull from my side. Any feedback is appreciated.

dgrnbrg commented 7 years ago

Great! I'm going to do a first review--hopefully, it will be constructive and we'll both learn a lot! Besides reviewing for clarity & style, I'm going to be looking at changes to the critical path and potential performance implications. Have you used the benchmarking tool yet? It generates comparison plots and interactive Excel workbooks for any number of configurations, so that we can understand precisely the performance impact of changes to the algorithm.

whilo commented 7 years ago

Ok, I am ready for your next round of feedback now. I will do some hammocking over the konserve read-handlers and add appropriate documentation in the meantime. Once you are ok with the changes to core.cljc and messaging.cljc, I would like to make sure that we have no severe performance regressions, so I am happy to explore the performance monitoring tool with you. Unfortunately I use LibreOffice and don't have Excel; I hope this is not a big deal. I really enjoy working on your codebase.

dgrnbrg commented 7 years ago

Have you tried running lein bench and opening the resulting workbooks? OpenOffice should be able to open them, although I don't know whether all the charts will work. There's a section in the readme that goes into how to benchmark.

whilo commented 7 years ago

Getting the property-based test to run with cljs was much more involved than I thought, because of https://dev.clojure.org/jira/browse/TCHECK-128. I tried to patch test.check, but as is said in that report, it involves the whole execution engine. Using core.async in cljs in combination with Clojure libraries that expect plain return values rather than go channels is often still uncharted territory. I opted to generate the ops on the JVM, store them in a temporary namespace, and run a standard cljs.test on them. Feedback is appreciated.
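A rough sketch of that workaround (the path, namespace, and generator below are illustrative, not the actual test code): generate the operation sequences with test.check on the JVM and spit them into a plain namespace that a standard cljs.test run can consume.

```clojure
(require '[clojure.test.check.generators :as gen])

(defn emit-ops-ns!
  "Write a generated vector of ops into a source file that the cljs test
  suite can load like any other namespace."
  [ops]
  (spit "test/hitchhiker/generated_ops.cljc"   ; hypothetical path
        (str "(ns hitchhiker.generated-ops)\n\n"
             "(def ops " (pr-str (vec ops)) ")\n")))

;; Sample 100 random [op key] pairs on the JVM, where test.check works.
(emit-ops-ns!
 (gen/sample (gen/tuple (gen/elements [:insert :delete]) gen/int)
             100))
```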

Other than benchmarking the JVM version and comparing to the previous implementation, I would consider all issues fixed. Or am I missing something?

whilo commented 7 years ago

And finally I have done some benchmarking and examined the sheets. Sadly, the core.async implementation has a significant effect at the moment. For the standard benchmark with 100000 operations, both the testing and the redis backend go from ~20μs per IOP to ~40μs on my laptop. Other than this rescaling effect, the benchmarks match the previous implementation: cumulative IOPs and the shape of the curves are the same.

Is this penalty critical for you? Maybe you have some suggestions? I have had a look at the VisualVM sampling profiler, but couldn't spot the cause yet. core.async is supposed to be highly performant, feasible even for game rendering and physics, but I am not a total expert in it. I haven't optimized yet though, so some improvement should be possible.

dgrnbrg commented 7 years ago

First of all, I think that there's still a failing test--could you get that to pass? Also, could you just point out to me what causes the cljs tests to run in CI?

So, I've built & benchmarked your branch, and I'm analyzing where the slowdown is coming from. It's definitely due to core.async, and it's a 2x-4x slowdown on in-memory backend, depending on the workload.

I've identified the problem by using the "combined" view of jvisualvm, by snapshotting test runs of the benchmarking tool. The primary bottlenecks from before (e.g. UUID.randomUUID and PersistentTreeMap.add) are still there with the same relative performance, but the main slowdown comes from invoking ThreadPoolExecutor.execute, which is done by the go/go-try macro. This is what submits a go-routine to the executor service.

I believe the reason this problem is amplified is that each operation requires many, many go-routines to be spawned and stepped through (since I believe each "step" of a go-routine will cause another deferred segment to be executed on the thread pool). For instance, in order to delete a value, we may need to call enqueue, core/delete, lookup-path, and resolve: 4 units of work per operation.

I don't want to hold up merging this any longer (after you address my questions about the tests), but here's something for you to think about--is there any way to avoid using go blocks in those places, and instead use transducers? There may not be, but that would probably help with speeding things up.

whilo commented 7 years ago

You can run the cljs tests with node test_node.js. Yes, the test is still failing; I will try to figure out why not all keys are removed in redis. We can definitely try to reduce some channel ops, I will think about it.

dgrnbrg commented 7 years ago

Could you add the node test to the testing matrix? Just edit .travis.yml and define a new bullet with the TEST_CMD that will compile & test the cljs code--that'll have it automatically run in CI.
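A hypothetical sketch of such a matrix entry (the actual cljs build command depends on the project's cljsbuild profile names, so treat the second TEST_CMD as an assumption):

```yaml
# .travis.yml: run the JVM tests and the node/cljs tests as separate
# matrix entries, so both appear in CI.
env:
  - TEST_CMD="lein test"
  - TEST_CMD="lein cljsbuild once test && node test_node.js"
script:
  - eval "$TEST_CMD"
```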

whilo commented 7 years ago

Btw., the datascript node test runner uses the return code of the process to signal failure. I guess this is necessary for Travis, right?

dgrnbrg commented 7 years ago

Yes, that's correct. Just return nonzero to signal failure :)
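A minimal sketch of that convention (the `results` shape mirrors the summary map that cljs.test reports; how it gets wired into test_node.js is assumed):

```javascript
// Sketch: map a cljs.test-style summary to a process exit code, so that
// Travis sees a nonzero status whenever any test failed or errored.
function exitCode(results) {
  return (results.fail + results.error) > 0 ? 1 : 0;
}

// In test_node.js one would end with something like:
//   process.exit(exitCode(summary));
console.log(exitCode({fail: 0, error: 0})); // prints 0 (success)
console.log(exitCode({fail: 2, error: 1})); // prints 1 (failure)
```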

whilo commented 7 years ago

The test is failing because there is a nil key which I cannot remove with carmine. Not sure yet why this is the case.

whilo commented 7 years ago

One thing to notice is that the iterators will fail when you put nil as a value into the tree. This happens because core.async uses nil as a special value to signal a closed channel (and hence the end of the process). I am not sure whether I should introduce a special value for nil or whether there is a more elegant way. Do you have any ideas on this?

edit: This is not actually a problem here, as we put kv-pairs on the iterator channel, so forget what I just said. It is important to keep in mind with core.async though.
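For reference, the usual workaround for this core.async gotcha is a sentinel encoding around channel values; a minimal sketch (the sentinel name is made up, and as noted above it turned out not to be needed here since kv-pairs go on the channel):

```clojure
;; nil cannot travel through a core.async channel, so encode it as a
;; namespaced keyword on the way in and decode it on the way out.
(def nil-sentinel ::nil)

(defn encode-value
  "Make a value safe to put on a channel."
  [v]
  (if (nil? v) nil-sentinel v))

(defn decode-value
  "Recover the original value after taking it from a channel."
  [v]
  (if (= v nil-sentinel) nil v))
```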

whilo commented 7 years ago

Besides the redis nil problem, which I don't understand, everything should be fixed now. I have also thought about avoiding some go-routine round-trips to the threadpool, but haven't found a solution yet.

whilo commented 7 years ago

I have closed roughly half of the performance gap by sparing the resolve go-routine round-trip (with a macro <?resolve) for nodes that are already in memory. I still have trouble profiling it, but removing the error propagation in the go-try and <? routines also reduces the runtime. We cannot use dynamic binding with core.async in cljs though, so error handling would need some runtime-wide hook that you have to set if we do not propagate the errors back up the stack. Do you have any ideas on this? I don't think transducers can help, as they cannot run asynchronously, but I might have missed some trick.
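The idea behind <?resolve can be sketched roughly like this (resolved? and resolve-async are stand-ins for the actual node protocol functions, which this sketch does not reproduce):

```clojure
(defmacro <?resolve
  "Resolve a node without a go-routine round-trip when it is already in
  memory; only fall back to the async path for nodes that need IO."
  [node]
  `(let [n# ~node]
     (if (resolved? n#)           ; hypothetical: node already in memory
       n#                         ; return it directly, no channel hop
       (<? (resolve-async n#))))) ; hypothetical async resolve -> channel
```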

whilo commented 7 years ago

I have found an issue with string support and fixed it with a generalization of the max function, but weirdly one of the simple tests with string instead of integer keys is still failing. This test is new, so I would suggest we still close the PR and open an issue for it, if you are fine with the other changes.

whilo commented 7 years ago

So now we only have to decide how to do the error handling. We could also pass in an error callback lexically, but this will pollute the API, for example (go-try err-cb (<? some-ch)). Since https://dev.clojure.org/jira/browse/CLJS-1634 was recently closed, I am out of ideas for better error handling with core.async. In general I think the performance benefits of avoiding the rethrow logic are worth it, but if we do not abort the callstack on exceptions, go-blocks will return nil in error cases, possibly leading to corruption and unintended behaviour or follow-up errors. Opening a side channel for exceptions is not straightforward. So I would suggest reactivating the error handling, and I am ready to merge this otherwise.

whilo commented 7 years ago

A largely performance-neutral compromise would be to keep the old synchronous implementation separately and provide the core.async part only for cljs. At least for the core and messaging functions that should reduce the overhead. I think the iterators should still use core.async though, and one could wrap the top level with it. It will be a bit problematic due to the combination of synchronous and asynchronous IO though; a separate threadpool might be necessary to dispatch the synchronous hitchhiker-tree functions to. Once we get this done, I would try to port the GC to the async interface. What do you think?

dgrnbrg commented 7 years ago

I would rather suffer a performance penalty across the board to have a unified codebase.

So, can you just summarize for me the error-handling model that will be in place if we merge now? I think this is ready to go, if you agree as well! This is mega-exciting!

whilo commented 7 years ago

Ok. I have reactivated the error handling. If you take a look at https://github.com/whilo/hitchhiker-tree/blob/34a73cf18f243ad4660f82232a3751b45cbc3675/src/hitchhiker/tree/core.cljc#L34 you will see that all go-try does is wrap code blocks with try-catch so that the go-routine yields the exception (instead of blowing up internally), and <? then rethrows it once it is taken out of the go-routine. It is fine to be merged for me. What are your plans going forward?
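For readers of the thread, the pattern boils down to roughly the following sketch (JVM-side and simplified; the real macros in core.cljc are cross-platform and also handle js/Error in cljs):

```clojure
(require '[clojure.core.async :as async])

(defmacro go-try
  "Like go, but catches any Throwable and yields it as the channel's
  value instead of letting the go-routine blow up internally."
  [& body]
  `(async/go
     (try
       ~@body
       (catch Throwable t# t#))))

(defmacro <?
  "Like <!, but rethrows the taken value if it is a Throwable, carrying
  exceptions back up the asynchronous call chain."
  [ch]
  `(let [v# (async/<! ~ch)]
     (if (instance? Throwable v#)
       (throw v#)
       v#)))
```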

dgrnbrg commented 7 years ago

It's been fantastic working with you through this commit! I am really excited for the next things we do together! Thank you for your contribution.

I have another idea for a performance improvement for the error handling that I'm benchmarking now. Here are my next goals for the project:

Those pieces of functionality will make building new DBs on top of the engine really easy, by letting us focus on the application layer instead. Are you interested in continuing to contribute to this project? I have many more ideas, such as a hybrid columnar storage layer, caching, compression, and constant-time aggregate statistics.

dgrnbrg commented 7 years ago

For future reference: I tried improving the speed of throw-if-exception by making a type (deftype WrappedResult [^Throwable t r]) and having throw-if-exception use type-hinted field access to check whether .-t is non-null (and needs to be thrown) or else return .-r. My hypothesis was that instance? on Exception would be slow because of the number of classes which satisfy Exception; however, after benchmarking, the approach I attempted has the same performance. So it's cool that the JVM optimizes instance?, but unfortunately the performance is still where it is.
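The attempted variant reads roughly like this (a sketch reconstructed from the description above, not the exact code that was benchmarked):

```clojure
;; Carry the throwable in a typed field, so the error check is a
;; type-hinted field access instead of an instance? test.
(deftype WrappedResult [^Throwable t r])

(defn throw-if-exception
  "Throw the wrapped throwable if present, otherwise return the result."
  [^WrappedResult w]
  (if (some? (.-t w))
    (throw (.-t w))
    (.-r w)))
```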

whilo commented 7 years ago

We are exploring a distributed, eventually consistent datalog at https://gitter.im/metasoarous/datsync at the moment. I still have to work through the Bloom paper https://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-173.html. The materialized views of this reactive datalog will need a fast index datastructure, both on the JVM and on JS.

I am also thinking about using the hitchhiker tree to represent the state of CRDTs in replikativ. The efficient delta that can be sent on reconnects would make state-based CRDTs reasonably efficient and scalable to arbitrary size, with a logarithmic memory bound per instance. As long as updates happen to a monotonically increasing keyspace of the tree, the deltas would be approximately maximum-entropy; this can be done with sequential uuids. The large-scale backend CRDT solutions of the syncfree project (AntidoteDB) leverage ZeroMQ for the operation dispatch; I think fairly good performance could be provided by the hitchhiker tree without relying on an external message queue provider. Since this index would be internal, the p2p nature of replikativ would be maintained.

For replikativ I need a tracing GC, so my next step would be to port the hitchhiker tree's GC.

> A write-ahead log interface, so that each operation can be made durable faster, and so that we can call flush less often but still have complete durability

This would probably provide some speedup, but isn't the append-log nature of the tree already providing a good 1 IOP per flush in most cases?

edit: you already pulled :+1: Also should I rebase before the merge?

whilo commented 7 years ago

Regarding performance, I think that a handcrafted JVM version will always outperform a core.async one due to the dispatch. http://docs.paralleluniverse.co/pulsar/#coreasync might help as a drop-in replacement, but I haven't tried it yet; I also think it is not compatible with using core.async at the same time. Your implementation at best managed 100000 writes/s (in memory) on my laptop; do you think it could be made faster while staying persistent? I think you have more experience with high-performance backend solutions and JVM performance. It is not really important to me right now, I am just not sure how far one could get in today's landscape building on the hitchhiker tree.