Closed: andrewstucki closed this issue 6 years ago
It might be possible but it would be hard. I think we'd need replication support in RocksDB too.
Building this on top of etcd might be a good option as etcd exposes a powerful key/value API. etcd is used by lots of software and notably is the primary datastore of Kubernetes.
I think your intuition is right that building on top of a raw raft implementation would be really hard @mperham.
If just getting everything into one binary is a goal, etcd even enables embedding.
Congrats on the initial release of your project.
Oh, and I think you are using a queue interface. There is an etcd package to support queues.
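For context, the queue recipe pattern on top of an ordered key/value store works by writing each enqueued value under an ever-increasing key and dequeuing the lowest key. Here is a minimal in-memory sketch of that pattern; the key format and `kvQueue` type are invented for illustration, and a plain map stands in for the actual etcd store:

```go
package main

import (
	"fmt"
	"sort"
)

// kvQueue models a queue encoded as ordered keys under a prefix:
// enqueue writes a new key with an increasing sequence number,
// dequeue reads and deletes the lowest key.
type kvQueue struct {
	kv  map[string]string
	seq int
}

func (q *kvQueue) Enqueue(v string) {
	q.seq++
	// Zero-padded so lexicographic order matches numeric order.
	q.kv[fmt.Sprintf("queue/%010d", q.seq)] = v
}

func (q *kvQueue) Dequeue() (string, bool) {
	keys := make([]string, 0, len(q.kv))
	for k := range q.kv {
		keys = append(keys, k)
	}
	if len(keys) == 0 {
		return "", false
	}
	sort.Strings(keys) // lowest sequence number first = FIFO
	v := q.kv[keys[0]]
	delete(q.kv, keys[0])
	return v, true
}

func main() {
	q := &kvQueue{kv: map[string]string{}}
	q.Enqueue("job-1")
	q.Enqueue("job-2")
	v, _ := q.Dequeue()
	fmt.Println(v) // job-1
}
```

In the real recipe the sequencing and deletion happen via etcd revisions and transactions, which is what makes the queue safe across multiple clients.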
Wait, what? What's the thinking behind having RocksDB if it doesn't have replication? Maybe I don't understand everything here, but wouldn't clustering be super important to have in a system like this?
@matti: I believe that RocksDB is just used as a persistent store that's fast/embeddable here.
That being said, while I agree that this is quite a bit of work and, depending on clustering modes, would affect R/W performance due to the additional network requests (assuming we're using consistent reads), it shouldn't actually require underlying support in RocksDB.
The idea would be that you would use the Raft protocol to ensure writes/reads to the leader node are replicated and agreed on by a quorum prior to responding to the underlying Faktory protocol request. It's the same sort of deal that other projects like rqlite have done.
Definitely not a first-iteration sort of thing, but it might be nice to keep in mind.
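To make the write path above concrete: the leader only acknowledges the client's request once a majority of nodes (counting itself) have accepted the entry. A minimal simulation of that quorum logic, with goroutines standing in for follower AppendEntries RPCs (all names here are invented):

```go
package main

import (
	"fmt"
	"sync"
)

// replicateWrite simulates the leader-side write path under
// Raft-style consensus: fan the write out to every follower and
// report success only if a quorum acknowledged it.
func replicateWrite(followerUp []bool) bool {
	n := len(followerUp) + 1 // followers plus the leader itself
	quorum := n/2 + 1

	acks := make(chan bool, len(followerUp))
	var wg sync.WaitGroup
	for _, up := range followerUp {
		wg.Add(1)
		go func(up bool) {
			defer wg.Done()
			acks <- up // a down follower never acks; modeled as false
		}(up)
	}
	wg.Wait()
	close(acks)

	got := 1 // the leader already has the entry locally
	for ok := range acks {
		if ok {
			got++
		}
	}
	return got >= quorum
}

func main() {
	// 3-node cluster, one follower down: 2/3 acks, still committed.
	fmt.Println(replicateWrite([]bool{true, false})) // true
	// 3-node cluster, both followers down: leader alone, not committed.
	fmt.Println(replicateWrite([]bool{false, false})) // false
}
```

This is also where the R/W performance cost mentioned above comes from: every acknowledged write waits on at least one network round trip to a majority.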
A couple of notes here:
I don't want to wave away people's concerns; the alternative of building replication and clustering would make Faktory much more complex. I'm a big fan of reliability through simplicity, rather than layers of distributed systems dependencies, but I'm open to reliability improvements that are worth the cost to pull in.
@philips Faktory can run about 5000 jobs/sec right now on my laptop. I don't know how fast etcd is. The core concern is that if I use something like etcd, I'm limited to the operations they allow. To build a full-featured system, I need to be able to add more complex operations and features. Here's Faktory's low-level Queue interface today:
https://github.com/contribsys/faktory/blob/master/storage/types.go#L42
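To illustrate the point about needing richer operations than a plain key/value API offers, here is a hypothetical sketch of what a low-level queue interface of this shape might look like, with a trivial in-memory implementation. This is not the actual Faktory code at the link above; the `Queue` and `memQueue` names are invented:

```go
package main

import "fmt"

// Queue is a hypothetical low-level queue interface. Operations like
// sized, named, poppable queues go beyond what a generic key/value
// store exposes directly.
type Queue interface {
	Name() string
	Size() uint64
	Push(payload []byte) error
	Pop() ([]byte, error)
}

// memQueue is a trivial in-memory implementation for illustration.
type memQueue struct {
	name string
	jobs [][]byte
}

func (q *memQueue) Name() string { return q.name }
func (q *memQueue) Size() uint64 { return uint64(len(q.jobs)) }

func (q *memQueue) Push(p []byte) error {
	q.jobs = append(q.jobs, p)
	return nil
}

// Pop returns nil with no error when the queue is empty.
func (q *memQueue) Pop() ([]byte, error) {
	if len(q.jobs) == 0 {
		return nil, nil
	}
	p := q.jobs[0]
	q.jobs = q.jobs[1:]
	return p, nil
}

func main() {
	var q Queue = &memQueue{name: "default"}
	q.Push([]byte(`{"jid":"1"}`))
	p, _ := q.Pop()
	fmt.Println(string(p))
}
```

Building on etcd would mean mapping every method of an interface like this onto etcd's key/value operations, which constrains what new operations can be added later.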
etcd should easily be able to handle that workload:
https://github.com/coreos/etcd/blob/master/Documentation/benchmarks/etcd-3-demo-benchmarks.md
https://github.com/coreos/etcd/blob/master/Documentation/benchmarks/etcd-storage-memory-benchmark.md
-- CTO, CoreOS, Inc
@mperham: "Queues are normally empty. It's normal for Sidekiq's redis data to be 10MB or less."
I don't know where your "normal" comes from, but my normal is a couple hundred MB and a few hours of jobs; Redis is naturally persisted.
@matti - Let's try and keep this civil.
Also clustering and persistence are two very, very different things. Redis' clustering story has been around for a relatively short period (preceded by years of flaky Redis Sentinel network partitioning debates). In fact, vanilla Redis has no clustering :).
There are plenty of ways to tolerate failure in a non-distributed server architecture. Really, the main benefit I see in implementing consensus for a queue (as I mentioned, with something like Raft) is making it easier to implement a reliable queueing client.
I opened this issue not to have people start ranting, I just wanted to see what was on the horizon for clustering support.
Also worth mentioning that, as noted above, its use of RocksDB makes Faktory persistent, just like default Redis :).
Would a simple way of replication be using Faktory itself? All operations on the primary are duplicated to a special replica queue that a replica could read off and replay onto itself?
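The idea above could be sketched as an operation log: the primary applies each mutating operation locally and dupes it to a replication queue, and the replica drains that queue and replays each operation against its own store. Everything here (`Op`, `applyOp`, the op names) is invented for illustration:

```go
package main

import "fmt"

// Op is a hypothetical replicated operation: a push of a payload onto
// a queue, or an ack that removes the oldest job.
type Op struct {
	Kind    string // "push" or "ack"
	Queue   string
	Payload string
}

// store maps queue names to pending job payloads.
type store map[string][]string

// applyOp mutates a store with one operation; because it is
// deterministic, replaying the same ops yields the same state.
func applyOp(s store, op Op) {
	switch op.Kind {
	case "push":
		s[op.Queue] = append(s[op.Queue], op.Payload)
	case "ack":
		if q := s[op.Queue]; len(q) > 0 {
			s[op.Queue] = q[1:]
		}
	}
}

func main() {
	primary, replica := store{}, store{}
	var replLog []Op // the "special replica queue"

	// Primary applies each op locally and dupes it to the log.
	for _, op := range []Op{
		{"push", "default", "job-1"},
		{"push", "default", "job-2"},
		{"ack", "default", ""},
	} {
		applyOp(primary, op)
		replLog = append(replLog, op)
	}

	// Replica replays the log onto itself.
	for _, op := range replLog {
		applyOp(replica, op)
	}
	fmt.Println(primary["default"], replica["default"]) // [job-2] [job-2]
}
```

The catch with this approach is everything consensus normally handles for you: what happens when the replica falls behind, when ops arrive out of order, or when you need to fail over and promote the replica without losing acknowledged jobs.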
I don't want to leave this issue open forever, but feel free to discuss in the chatroom any time. There are various layers on top of RocksDB that add replication, for instance Cyclone:
https://arxiv.org/pdf/1711.06964.pdf
but nothing pre-packaged or well supported that we can easily leverage today. I will keep an eye open for improvements in this space and further suggestions are always welcome.
Wondering what your thoughts are on using something like https://github.com/hashicorp/raft and making this work with clustering support baked in?
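For anyone weighing that option: hashicorp/raft is structured around a finite-state-machine interface that you implement, and the library calls your apply method with each committed log entry, in the same order, on every node. Below is a rough sketch of that shape for a queue store; note this is a simplification (the real `raft.FSM` interface also requires `Snapshot`/`Restore` and passes a `*raft.Log`), and `queueFSM` plus the JSON command format are invented for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// command is a hypothetical serialized queue operation that would be
// proposed through the Raft log.
type command struct {
	Op      string `json:"op"` // "push" or "pop"
	Queue   string `json:"queue"`
	Payload string `json:"payload,omitempty"`
}

// queueFSM holds all queue state. In the hashicorp/raft model, state
// changes ONLY inside Apply, so every node that applies the same
// committed log arrives at identical queues.
type queueFSM struct {
	queues map[string][]string
}

// Apply decodes one committed log entry and mutates the state,
// returning the popped payload (if any) to the proposer.
func (f *queueFSM) Apply(entry []byte) interface{} {
	var c command
	if err := json.Unmarshal(entry, &c); err != nil {
		return err
	}
	switch c.Op {
	case "push":
		f.queues[c.Queue] = append(f.queues[c.Queue], c.Payload)
	case "pop":
		q := f.queues[c.Queue]
		if len(q) == 0 {
			return nil
		}
		f.queues[c.Queue] = q[1:]
		return q[0]
	}
	return nil
}

func main() {
	fsm := &queueFSM{queues: map[string][]string{}}
	fsm.Apply([]byte(`{"op":"push","queue":"default","payload":"job-1"}`))
	out := fsm.Apply([]byte(`{"op":"pop","queue":"default"}`))
	fmt.Println(out) // job-1
}
```

The appeal is that the library owns elections, log replication, and membership, leaving only the deterministic state machine to write; the cost is the same quorum round trip per write discussed earlier in the thread.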