derekkraan / delta_crdt_ex

Use DeltaCrdt to build distributed applications in Elixir
MIT License

Eliminated the Map from MerkleMap #47

Open balena opened 4 years ago

balena commented 4 years ago

Instead of using MerkleMap, the DeltaCrdt implementation can use just the MerkleMap.MerkleTree. The advantage is a reduction in processing time (roughly halved), since data no longer has to be inserted into both the Map and the MerkleTree. In terms of memory usage, however, there isn't a big difference: the data inserted into both structures are actually references (immutability goodies), so the memory reduction is restricted to the "overhead" memory used by the Map. More details in Erlang's Efficiency Guide.
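
A small illustration of the reference-sharing point (not DeltaCrdt code; the container shapes are made up for the example): `:erts_debug.size/1` counts shared sub-terms once, while `:erts_debug.flat_size/1` counts them as if they were copied, so holding the same value in two structures mostly costs the second structure's own overhead.

```elixir
# Run in iex. `value` stands in for a stored entry; holding it in two
# containers does not copy it, only the containers' own words differ.
value = Enum.to_list(1..100)

one = %{key: value}
two = {%{key: value}, [{:key, value}]}

:erts_debug.size(one)       # words used, shared sub-terms counted once
:erts_debug.size(two)       # only slightly larger: `value` is shared
:erts_debug.flat_size(two)  # as if copied: counts `value` once per reference
```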

derekkraan commented 4 years ago

Hi @balena,

Thanks for the PR! Can you let me know when you are done adding commits? Then I can review :D

Also, if you could check the Issues list and link any bugs you think might be solved by these changes, that would also be very much appreciated.

Cheers, Derek

balena commented 4 years ago

Hey @derekkraan,

Sorry for adding more changes to the PR; I should have created a separate branch instead of using master. Here is some context, followed by the two problems I am attempting to solve with this PR:

I'm using DeltaCrdt in a project that distributes call states and metrics across a global network. The cluster consists of ~20 instances.

To tackle the first issue (memory usage), I studied the internal data representation and noticed that you're duplicating data: at the Merkle trie level, the leaves store the keys and values whose key hashes collided, and the same values are stored again in the crdt_state.value map. DeltaCrdt users will typically store the same key/value a third time on their side -- my project uses ETS as well as Horde. So eventually the data is stored in 3 different places. When the values occupy just a few bytes and you don't have a big volume of data, that's not a problem.
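
A rough sketch of how an application ends up with the same entries in three places (the CRDT's internal map, the Merkle tree, and its own ETS table). The option names and the diff shape follow DeltaCrdt's README as I understand it; treat the exact signatures as an assumption.

```elixir
# Every value written through DeltaCrdt ends up in the CRDT state, in the
# Merkle tree used for anti-entropy, and (here) in the application's own
# ETS table as well.
:ets.new(:call_states, [:named_table, :public])

{:ok, crdt} =
  DeltaCrdt.start_link(DeltaCrdt.AWLWWMap,
    sync_interval: 4_500,
    on_diffs: fn diffs ->
      Enum.each(diffs, fn
        {:add, key, value} -> :ets.insert(:call_states, {key, value})
        {:remove, key} -> :ets.delete(:call_states, key)
      end)
    end
  )

DeltaCrdt.mutate(crdt, :add, ["call:1234", %{state: :ringing, bytes: 300}])
```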

But when keys/values are long, or the number of tuples grows large -- say 100,000 entries of around 300 bytes each, with the synchronization interval set to 4.5 s and ~30-40% of the tuples changing within that interval -- things start to get more serious. Erlang avoids duplicating in-memory data as long as the 3 storages hold the same terms. However, mutations cause memory consumption spikes.

Moreover, some DeltaCrdt processes became completely frozen in the :suspended state. I inspected the message queue and it was constantly growing. The exact location is where the DeltaCrdt process sends diffs (Merkle tree) back to its neighbours; I believe something is happening at the Erlang distribution level -- probably the network buffer got full, which suspends the DeltaCrdt process. The best solution I could find was to pass the [:nosuspend] option to erlang:send/3 (Process.send/3) and retry a given number of times.
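
A minimal sketch of that retry idea (the module and function names are illustrative, not the actual DeltaCrdt internals): Process.send/3 with [:nosuspend] returns :nosuspend instead of blocking when the distribution buffer is busy, so the caller can back off and try again a bounded number of times.

```elixir
defmodule DiffSender do
  # Hypothetical helper: try to send diffs to a neighbour without letting
  # the calling process be suspended on a busy distribution buffer.
  def send_diffs(neighbour, diffs, retries \\ 3)

  def send_diffs(_neighbour, _diffs, 0), do: :dropped

  def send_diffs(neighbour, diffs, retries) do
    case Process.send(neighbour, {:diffs, diffs}, [:nosuspend]) do
      :ok ->
        :ok

      :nosuspend ->
        # Distribution buffer is busy; back off briefly and retry.
        Process.sleep(50)
        send_diffs(neighbour, diffs, retries - 1)
    end
  end
end
```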

You said to link issues to the PR, but I didn't find anyone reporting the issues above, so I'll end up describing them here. They're not related to Horde; they're specific to DeltaCrdt.

derekkraan commented 4 years ago

@balena I have two questions:

  1. if we are already specifying :nosuspend, should we also specify :noconnect?
  2. instead of adding a retry mechanism, why don't we simply give up and wait for the next sync interval?
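
For reference, a sketch of what option 2 combined with :noconnect could look like (illustrative names, not actual DeltaCrdt code): drop the diff on failure and let the next sync interval retransmit the state.

```elixir
defmodule BestEffortSender do
  # Hypothetical helper: best-effort send that never suspends the caller
  # and never forces a connection set-up; on failure we simply skip and
  # rely on the next sync interval.
  def send_diffs(neighbour, diffs) do
    case Process.send(neighbour, {:diffs, diffs}, [:nosuspend, :noconnect]) do
      :ok -> :ok
      busy when busy in [:nosuspend, :noconnect] -> :skipped
    end
  end
end
```
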
derekkraan commented 3 years ago

Hi @balena, I was wondering if you were still interested in pursuing this PR.

derekkraan commented 3 years ago

I think it would be easier for me to evaluate and merge changes if we could split this PR up into smaller pieces.

balena commented 3 years ago

Hey @derekkraan,

Yes, I can split it.

Sorry for the continued churn on the code. I'm not using Horde anymore after a lot of trouble with it on an unreliable, high-traffic network with varying latency.

I faced high memory usage, processes getting stuck (that's what the "nosuspend" change addresses), duplicated processes (even though I was using Horde.Registry), etc. In practice, however, none of these workarounds fixed the problems.

Eventually I wrote a thin wrapper around the new Erlang :pg module, which works just fine for my case: single processes across the cluster, redistributable.
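
For context, a minimal sketch of the kind of thin :pg wrapper described above (OTP 23+; the module name and the pick-the-lowest-pid leader choice are my own illustration, not the actual project code):

```elixir
defmodule PgSingleton do
  # Join the default :pg scope and pick one deterministic member per group;
  # every node computes the same answer, so a single pid is treated as the
  # active process for a group at any time.
  def start do
    case :pg.start_link() do
      {:ok, _pid} -> :ok
      {:error, {:already_started, _pid}} -> :ok
    end
  end

  def register(group, pid \\ self()), do: :pg.join(group, pid)

  def active(group) do
    case :pg.get_members(group) do
      [] -> nil
      members -> Enum.min(members)
    end
  end
end
```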

Let me know which parts you think are relevant and I'll split them out.

Regards, Guilherme.