Snapchat / KeyDB

A Multithreaded Fork of Redis
https://keydb.dev
BSD 3-Clause "New" or "Revised" License
11.24k stars 569 forks

Raft Implementation #349

Open VivekSainiEQ opened 3 years ago

VivekSainiEQ commented 3 years ago

The goal is to implement the Raft protocol for strongly consistent synchronous replication (relevant blog post here).

Any issues pertaining to situations that Raft would solve will be closed and tracked here instead.

krunkosaurus commented 3 years ago

I think RAFT is fascinating technology but I wonder if it's actually useful (especially given all the work involved in implementing it) for the majority of KeyDB users.

People pick Redis and KeyDB because they're so blazing fast. Nothing is faster. But when you have global applications, the latency of having global reads but a central write is Redis/KeyDB's Achilles' heel. Network, not CPU, has generally always been the bottleneck. The current active-active implementation is blazing fast even across the globe. It just lacks any kind of conflict-resolution/consensus. I'm using it right now in a project, and outside of this, I like it.

Implementing a RAFT solution that requires confirmation from a majority of nodes before accepting a write seems like the slowest possible feature you could implement in one of the fastest projects in the world. Especially when the other nodes are positioned all around the world. It really only helps with disaster recovery.
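
To make the latency concern concrete, here is a minimal sketch of how a majority quorum bounds write latency. It assumes the leader counts its own log entry as one vote and ignores disk and processing time; the function name and the round-trip times are hypothetical, not measurements from KeyDB.

```python
def raft_commit_latency_ms(follower_rtts_ms):
    """Estimate the time until a leader hears from a majority of the cluster.

    follower_rtts_ms: round-trip times (ms) from the leader to each follower.
    Assumption: the leader's own entry counts as one vote, so it only needs
    (majority - 1) follower acknowledgments.
    """
    cluster_size = len(follower_rtts_ms) + 1
    majority = cluster_size // 2 + 1
    acks_needed = majority - 1
    if acks_needed == 0:
        return 0.0  # single-node cluster: nothing to wait for
    # The write commits when the acks_needed-th fastest follower replies.
    return sorted(follower_rtts_ms)[acks_needed - 1]


# Hypothetical round-trip times, for illustration only.
regional = raft_commit_latency_ms([1.0, 1.5])                     # 3 nodes, one region
worldwide = raft_commit_latency_ms([70.0, 150.0, 180.0, 300.0])   # 5 nodes, global
print(regional, worldwide)
```

Under these assumptions the write waits for the slowest node in the fastest majority, not for every node, which is why a regional cluster pays very little while a globally spread one pays at least one long-haul round trip per write.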

People who need fast DBs globally with active-active are happy with the cost of eventual consistency if it means blazing fast activity in every single location. Every datacenter is the primary, it's quite wonderful.

But with RAFT, a random node is master and every node seeks its acknowledgment to complete writes. It defeats the purpose of why most people seek master-master setups.

I understand implementing CRDT is tremendously hard. But I'd rather it just be canceled than for you to spend your very valuable time working on a feature that most users will probably not use. It might be useful in a low-latency environment like 3 nodes in California but it's unlikely to have any global use case. And applications are only becoming more and more global. The next TikTok or Pokemon Go could be powered by KeyDB but not with a RAFT implementation. All speed is lost.

Thank you for letting me express my $0.02. KeyDB is wild and amazing 🙏

JohnSully commented 3 years ago

Hi @krunkosaurus

I agree RAFT won't be the right choice for every user, but I think there are some major scenarios - especially around LPUSH/BLPOP - where you really need that strong consistency. Because of KeyDB's excellent single node performance we're uniquely suited to a RAFT implementation and will be able to achieve much higher performance than other implementations. When RAFT is complete it will be an optional mode of our Active Replication feature so you don't have to pay the cost if you don't need the strong consistency.

As for CRDTs, we are able to operate in that way with SETS/GETS via active replication, and we will need to invest in enabling this for the more complicated datatypes. We're already seeing a lot of use there and it is definitely something we will be looking at.
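
For readers unfamiliar with the idea, the simplest CRDT for plain set/get values is a last-writer-wins register. The sketch below is a generic illustration of that technique and an assumption on my part, not a description of how KeyDB's active replication resolves conflicts; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    """A last-writer-wins register: one value plus the metadata used to order writes."""
    value: str
    timestamp: int   # logical or wall-clock time of the write (assumed comparable)
    node_id: str     # breaks ties deterministically when timestamps are equal

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Keep whichever write has the larger (timestamp, node_id) pair.
        # Every replica applies the same rule, so all replicas converge.
        return max(self, other, key=lambda r: (r.timestamp, r.node_id))


# Two datacenters accept conflicting writes to the same key.
us = LWWRegister(value="blue", timestamp=10, node_id="us-west")
eu = LWWRegister(value="green", timestamp=12, node_id="eu-central")

# The merge order does not matter: both replicas end up with the same value.
print(us.merge(eu).value, eu.merge(us).value)
```

Richer datatypes such as lists and sorted sets need more elaborate merge rules than this, which is consistent with the comment above that they require further investment.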

I'd also mention a few things about KeyDB itself. This was started in 2019 with the help of friends although I was the only major developer for a long time. We've been able to create a team around KeyDB and though we've had some growing pains I'm really excited about being able to take on some of these larger challenges.

Hope that helps explain our thinking :)

-John

krunkosaurus commented 3 years ago

Thanks @JohnSully ! You are the thankless hero we surely don't deserve! I am aware you have been largely a one-man operation for a long time much like Redis itself. Appreciate the work you do.

I agree there are practical use cases for RAFT. Any application that exists only regionally would benefit from it and the added throughput and DR.

I am happy that the CRDT work may possibly continue afterwards! Redis Enterprise is a beast to set up manually (and that's probably intentional).

yegors commented 3 years ago

I'll chime in here. We currently use Redis Enterprise solely for "stable" active-active replication and CRDTs for our 10 POP geo-distributed cluster. It does function; however, in higher-latency WAN environments RS is effectively beta software (that costs $100k+/year). In our <1 year of usage we discovered several critical bugs (one of them will not be patched until the Oct release) that cause memory to grow uncontrollably after a node drops out of a cluster because of network issues. This requires manual intervention. Redis RS also struggles with high latency: the POP in Australia gets decoupled from the one in Brazil quite often, due to sporadic high latency between them.

If stable CRDTs can be implemented without running the behemoth stack that is required to run Redis RS (3 nodes in each POP of the geo-cluster), we would be more than happy to pay the same price for this to KeyDB (although cheaper prices would certainly be welcome).

@JohnSully I'm not sure how valuable this is from your perspective, but have a look at HashiCorp's Serf project. We currently use this in our stack as an "out of band" instrumentation/metadata channel for passing data and triggering events on geo-distributed POPs. Back when we were evaluating KeyDB for our purposes, we built out a management system to instrument KeyDB replication and recover from "bad states" using application code that used Serf as a transport. You also get handy health checks out of it, "for free", just by virtue of the gossip protocol. It was too brittle being in the application layer (and we only spent a couple of weeks on it), but if it were part of the database application itself, it could be useful. Dynamic cluster membership and the ability to emit events and queries, with the option to trigger methods on remote machines, are quite handy. This holds even if you don't have direct connectivity between all nodes.

Perhaps leveraging Serf's gossip protocol alongside KeyDB can help with cluster initialization and out of band management to help it self-heal and arrive at a "good state" when failures occur.

rnz commented 2 years ago

@JohnSully Maybe you could save time by using this implementation: https://github.com/eBay/NuRaft

hendrikheil commented 1 year ago

Hey there! Really loving KeyDB! Is there any news on this?