derekkraan / delta_crdt_ex

Use DeltaCrdt to build distributed applications in Elixir
MIT License

timeout in DeltaCrdt.mutate/3 #42

Closed · evan-gordon closed this issue 5 years ago

evan-gordon commented 5 years ago

We recently began using delta_crdt (currently version 0.5.1) in our project. It typically works fine, but under higher mutate loads everything starts timing out. For our app this usually happens on startup, and I've seen the same behavior during load tests as well.

Here's an example of what we're seeing:


```
GenStateMachine {MyApp.HordeRegistry, {MyApp.Automation.TaskRunner, 21393}} terminating
** (exit) exited in: GenServer.call(MyApp.Data.Store, {:operation, {:add, [:my_key, 1]}}, 5000)
    ** (EXIT) time out
    (elixir) lib/gen_server.ex:1009: GenServer.call/3
    (my_app) lib/routing.ex:31: anonymous fn/2 in Routing.lodge_data/1
    (elixir) lib/enum.ex:1948: Enum."-reduce/3-lists^foldl/2-0-"/3
    (my_app) lib/routing.ex:30: Routing.lodge_data/1
    (my_app) lib/my_app/data/task_runner.ex:128: MyApp.Automation.TaskRunner.handle_event/4
    (stdlib) gen_statem.erl:1147: :gen_statem.loop_state_callback/11
    (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
```

(I changed some of the names and data for privacy reasons.)

The issue itself might just be documentation-related, or it might require some additional code.

We are using the default of 50 milliseconds for the sync_interval, and the default timeout for mutate/3. I was also digging through the Horde code and noticed you use a sync_interval of 300 in that project.

Could raising our sync_interval to be more on par with Horde's reduce mutate timeouts, or is there a better solution here? If so, perhaps some more information in the docs about how to tune sync_interval would be warranted.
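For context, here's roughly what I mean; the option names are from the delta_crdt docs, and the 300 ms / 10_000 ms values are just the kind of adjustment I'm asking about, not what we run today:

```elixir
# Minimal sketch, assuming delta_crdt's documented start_link/2 options.
# MyApp.Data.Store and :my_key are the (renamed) values from the log above.
{:ok, _pid} =
  DeltaCrdt.start_link(
    DeltaCrdt.AWLWWMap,
    name: MyApp.Data.Store,
    # default is 50 ms; Horde uses 300 ms
    sync_interval: 300
  )

# mutate/4 takes an explicit timeout (the default is 5_000 ms, which is
# what produced the GenServer.call timeout in the stack trace).
DeltaCrdt.mutate(MyApp.Data.Store, :add, [:my_key, 1], 10_000)
```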

I'm also wondering whether adding batch-add functionality to the AWLWWMap could help mitigate the load spikes we're putting on the delta_crdt.
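As a stopgap, I've been considering fanning writes out without blocking on each call. A rough sketch, assuming DeltaCrdt.mutate_async/3 is available in whatever version we land on (the entries list and :other_key are made up here):

```elixir
# Rough sketch: mutate_async/3 (assumed available) enqueues each :add
# without waiting for a reply, so no 5s GenServer.call timeout per key,
# but also no backpressure and no confirmation that the write landed.
entries = [my_key: 1, other_key: 2]

Enum.each(entries, fn {key, value} ->
  DeltaCrdt.mutate_async(MyApp.Data.Store, :add, [key, value])
end)
```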

derekkraan commented 5 years ago

Hi @evan-gordon,

First of all, I would recommend upgrading to the latest version. There has been a lot of work done to optimize delta_crdt between 0.5.1 and 0.5.9.

Secondly, I would recommend taking a look at the max_sync_size option. What's probably happening here is that delta_crdt is getting overwhelmed with updates. Setting max_sync_size allows you to trade consistency for availability (i.e., it simply syncs changes more slowly when it is being hammered by writes, so you have to wait longer for your data to be synchronized across your cluster). I should also note: syncing is not done chronologically but in key order, so the key that hashes to 0000 will always sync before the key that hashes to 0010, for example. So there is some "bias" in the system in this way.
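Roughly, that looks like this (treat the numbers as placeholders; the default and accepted values for max_sync_size may differ between versions, so check the docs for the version you upgrade to):

```elixir
# Sketch only: cap how much state is shipped per sync round so the CRDT
# process stays responsive under heavy write load; data then converges
# more slowly across the cluster.
DeltaCrdt.start_link(
  DeltaCrdt.AWLWWMap,
  name: MyApp.Data.Store,
  sync_interval: 300,
  max_sync_size: 100
)
```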

The reason the default is 300 ms in Horde is that, in hindsight, 50 ms is actually very tight timing: it's 20 syncs per second, which is incredibly fast. 300 ms is still 3.33 times per second, so still very fast, but it's also a lot more resilient to swamping and won't peg your CPU.

There is a change somewhere between 0.5.1 and 0.5.9 (the upgrade to MerkleMap 0.2.0) that essentially batches all mutations, deferring a lot of expensive computation to the moment it is needed.

The other thing would be to consider arranging your delta_crdt in a different topology. I am assuming you have a mesh right now (every node connected to every other node), which is efficient in terms of latency (how long it takes an update to make it to every other node in the cluster), but inefficient in terms of throughput (which you indicate is limiting you currently). You could try a tree, partial mesh, or something else.
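Topology is set with DeltaCrdt.set_neighbours/2. As a sketch only, a ring instead of a full mesh could look like the following; it assumes the CRDT is registered locally as MyApp.Data.Store on every node, and since syncs are one-directional, an update may take up to N-1 hops to reach everyone:

```elixir
# Sketch of a ring topology: each node ships diffs only to its successor.
# Run on every node (and again whenever cluster membership changes).
nodes = Enum.sort([Node.self() | Node.list()])
index = Enum.find_index(nodes, &(&1 == Node.self()))
successor = Enum.at(nodes, rem(index + 1, length(nodes)))

DeltaCrdt.set_neighbours(MyApp.Data.Store, [{MyApp.Data.Store, successor}])
```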

Lastly, I am curious to know a little more about your use case. How many nodes are you running in your cluster? What is your mutation rate? How many keys do you store in delta_crdt?

evan-gordon commented 5 years ago

Oh, I'll try upgrading the version; thanks for the heads-up about the performance improvements!

I'm not sure max_sync_size is the right approach for my use case. I need a more reliable way of making sure all nodes have been updated before moving on. I'm using the delta_crdt to inform each node about Kafka events they should look out for, and to some extent I need to be able to wait until I know for a fact that all nodes are looking for that event before continuing with execution.
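The best I can do today is poll every node until the key shows up. A best-effort sketch (the Routing.WaitForKey helper and its retry numbers are made up, and it only gives a best-effort check rather than a guarantee):

```elixir
# Best-effort sketch only: poll DeltaCrdt.read/1 on every connected node
# until the key appears, giving up after a fixed number of attempts.
defmodule Routing.WaitForKey do
  def wait_until_replicated(key, attempts \\ 50)

  def wait_until_replicated(_key, 0), do: {:error, :not_replicated}

  def wait_until_replicated(key, attempts) do
    all_nodes_have_key? =
      Enum.all?([Node.self() | Node.list()], fn node ->
        case :rpc.call(node, DeltaCrdt, :read, [MyApp.Data.Store]) do
          %{} = map -> Map.has_key?(map, key)
          _ -> false
        end
      end)

    if all_nodes_have_key? do
      :ok
    else
      Process.sleep(100)
      wait_until_replicated(key, attempts - 1)
    end
  end
end
```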

Yeah, I am using a mesh right now. I have thought of using a hash ring in the future, but I was pressed for time and had to get a fix in over the weekend (such is life).

We're currently running 6 nodes, with the intention to scale up a bit in the near-ish future. I don't have any metrics on our mutation rate. We've got roughly 400 keys at the moment, and I'm also looking into some options that could trim that number down a bit.

Thanks for the help :)

derekkraan commented 5 years ago

I need a more reliable way of making sure all nodes have been updated before moving on. I'm using the delta_crdt to inform each node about Kafka events they should look out for, and to some extent I need to be able to wait until I know for a fact that all nodes are looking for that event before continuing with execution.

I feel the need to mention that delta_crdt might not be suitable for your needs then. Delta_crdt is by nature "eventually consistent", so there is really no guarantee (and no way to know) of when a particular mutation has been replicated across the cluster. In theory it'll happen very quickly, but there is no guarantee anywhere in the system.

If you need guarantees, then you probably want something that is strongly consistent. Unfortunately I'm not very up to date on what the library options are in that space. Perhaps https://github.com/toniqsystems/raft ?

derekkraan commented 5 years ago

In any case, I'm going to close this issue since there doesn't seem to be anything actionable here for me. Feel free to ask any other questions you might have, though; I'll watch the issue and continue answering them to the best of my ability.

evan-gordon commented 5 years ago

I feel the need to mention that delta_crdt might not be suitable for your needs then. Delta_crdt is by nature "eventually consistent", so there is really no guarantee (and no way to know) of when a particular mutation has been replicated across the cluster. In theory it'll happen very quickly, but there is no guarantee anywhere in the system.

Ah, you're probably right there. In this situation I do need guarantees.

Thanks for your help on this!