Closed: andrewstucki closed this issue 6 years ago
It might be possible but it would be hard. I think we'd need replication support in RocksDB too.
Building this on top of etcd might be a good option as etcd exposes a powerful key/value API. etcd is used by lots of software and notably is the primary datastore of Kubernetes.
I think your intuition is right that building on top of a raw raft implementation would be really hard @mperham.
If just getting everything into one binary is a goal, etcd even enables embedding.
Congrats on the initial release of your project.
Oh, and I think you are using a queue interface. There is an etcd package to support queues.
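For context, the queue recipe pattern on top of an ordered key/value store works by writing each enqueued value under an ever-increasing key and dequeuing the lowest key. Here is a minimal in-memory sketch of that pattern; the key format and `kvQueue` type are invented for illustration, and a plain map stands in for the actual etcd store:

```go
package main

import (
	"fmt"
	"sort"
)

// kvQueue models a queue encoded as ordered keys under a prefix:
// enqueue writes a new key with an increasing sequence number,
// dequeue reads and deletes the lowest key.
type kvQueue struct {
	kv  map[string]string
	seq int
}

func (q *kvQueue) Enqueue(v string) {
	q.seq++
	// Zero-padded so lexicographic order matches numeric order.
	q.kv[fmt.Sprintf("queue/%010d", q.seq)] = v
}

func (q *kvQueue) Dequeue() (string, bool) {
	keys := make([]string, 0, len(q.kv))
	for k := range q.kv {
		keys = append(keys, k)
	}
	if len(keys) == 0 {
		return "", false
	}
	sort.Strings(keys) // lowest sequence number first = FIFO
	v := q.kv[keys[0]]
	delete(q.kv, keys[0])
	return v, true
}

func main() {
	q := &kvQueue{kv: map[string]string{}}
	q.Enqueue("job-1")
	q.Enqueue("job-2")
	v, _ := q.Dequeue()
	fmt.Println(v) // job-1
}
```

In the real recipe the sequencing and deletion happen via etcd revisions and transactions, which is what makes the queue safe across multiple clients.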
Wait, what? What's the thinking behind having RocksDB if it doesn't have replication? Maybe I don't understand everything here, but wouldn't clustering be super important to have in a system like this?
@matti: I believe that RocksDB is just used as a persistent store that's fast/embeddable here.
That being said, while I agree that this is quite a bit of work and, depending on clustering modes, would affect R/W performance due to the additional network requests (assuming we're using consistent reads), it shouldn't actually require underlying support in RocksDB.
The idea would be that you would use the Raft protocol to ensure writes/reads to the leader node are replicated and agreed on by a quorum prior to responding to the underlying Faktory protocol request. It's the same sort of deal that other projects like rqlite have done.
Definitely not a first-iteration sort of thing, but it might be nice to keep in mind.
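To make the write path above concrete: the leader only acknowledges the client's request once a majority of nodes (counting itself) have accepted the entry. A minimal simulation of that quorum logic, with goroutines standing in for follower AppendEntries RPCs (all names here are invented):

```go
package main

import (
	"fmt"
	"sync"
)

// replicateWrite simulates the leader-side write path under
// Raft-style consensus: fan the write out to every follower and
// report success only if a quorum acknowledged it.
func replicateWrite(followerUp []bool) bool {
	n := len(followerUp) + 1 // followers plus the leader itself
	quorum := n/2 + 1

	acks := make(chan bool, len(followerUp))
	var wg sync.WaitGroup
	for _, up := range followerUp {
		wg.Add(1)
		go func(up bool) {
			defer wg.Done()
			acks <- up // a down follower never acks; modeled as false
		}(up)
	}
	wg.Wait()
	close(acks)

	got := 1 // the leader already has the entry locally
	for ok := range acks {
		if ok {
			got++
		}
	}
	return got >= quorum
}

func main() {
	// 3-node cluster, one follower down: 2/3 acks, still committed.
	fmt.Println(replicateWrite([]bool{true, false})) // true
	// 3-node cluster, both followers down: leader alone, not committed.
	fmt.Println(replicateWrite([]bool{false, false})) // false
}
```

This is also where the R/W performance cost mentioned above comes from: every acknowledged write waits on at least one network round trip to a majority.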
A couple of notes here:
I don't want to wave away people's concerns; the alternative of building replication and clustering would make Faktory much more complex. I'm a big fan of reliability through simplicity, rather than layers of distributed systems dependencies, but I'm open to reliability improvements that are worth the cost to pull in.
@philips Faktory can run about 5000 jobs/sec right now on my laptop. I don't know how fast etcd is. The core concern is that if I use something like etcd, I'm limited to the operations they allow. To build a full-featured system, I need to be able to add more complex operations and features. Here's Faktory's low-level Queue interface today:
https://github.com/contribsys/faktory/blob/master/storage/types.go#L42
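To illustrate the point about needing richer operations than a plain key/value API offers, here is a hypothetical sketch of what a low-level queue interface of this shape might look like, with a trivial in-memory implementation. This is not the actual Faktory code at the link above; the `Queue` and `memQueue` names are invented:

```go
package main

import "fmt"

// Queue is a hypothetical low-level queue interface. Operations like
// sized, named, poppable queues go beyond what a generic key/value
// store exposes directly.
type Queue interface {
	Name() string
	Size() uint64
	Push(payload []byte) error
	Pop() ([]byte, error)
}

// memQueue is a trivial in-memory implementation for illustration.
type memQueue struct {
	name string
	jobs [][]byte
}

func (q *memQueue) Name() string { return q.name }
func (q *memQueue) Size() uint64 { return uint64(len(q.jobs)) }

func (q *memQueue) Push(p []byte) error {
	q.jobs = append(q.jobs, p)
	return nil
}

// Pop returns nil with no error when the queue is empty.
func (q *memQueue) Pop() ([]byte, error) {
	if len(q.jobs) == 0 {
		return nil, nil
	}
	p := q.jobs[0]
	q.jobs = q.jobs[1:]
	return p, nil
}

func main() {
	var q Queue = &memQueue{name: "default"}
	q.Push([]byte(`{"jid":"1"}`))
	p, _ := q.Pop()
	fmt.Println(string(p))
}
```

Building on etcd would mean mapping every method of an interface like this onto etcd's key/value operations, which constrains what new operations can be added later.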
etcd should easily be able to handle that workload:
https://github.com/coreos/etcd/blob/master/Documentation/benchmarks/etcd-3-demo-benchmarks.md
https://github.com/coreos/etcd/blob/master/Documentation/benchmarks/etcd-storage-memory-benchmark.md
-- CTO, CoreOS, Inc
@mperham: "Queues are normally empty. It's normal for Sidekiq's redis data to be 10MB or less."
I don't know where your "normal" comes from, but my normal is a couple hundred MB and a few hours of jobs; Redis is naturally persisted.
@matti - Let's try and keep this civil.
Also clustering and persistence are two very, very different things. Redis' clustering story has been around for a relatively short period (preceded by years of flaky Redis Sentinel network partitioning debates). In fact, vanilla Redis has no clustering :).
There are plenty of ways to tolerate failure in a non-distributed server architecture. Really, the main benefit I see in implementing consensus for a queue (as I mentioned, with something like Raft) is making it easier to implement a reliable queueing client.
I opened this issue not to have people start ranting, I just wanted to see what was on the horizon for clustering support.
Also worth mentioning that, as noted above, its use of RocksDB makes Faktory persistent, just like default Redis :).
Would a simple way of replication be using Faktory itself? All operations on the primary are duplicated to a special replica queue that a replica could read off and replay onto itself?
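The idea above could be sketched as an operation log: the primary applies each mutating operation locally and dupes it to a replication queue, and the replica drains that queue and replays each operation against its own store. Everything here (`Op`, `applyOp`, the op names) is invented for illustration:

```go
package main

import "fmt"

// Op is a hypothetical replicated operation: a push of a payload onto
// a queue, or an ack that removes the oldest job.
type Op struct {
	Kind    string // "push" or "ack"
	Queue   string
	Payload string
}

// store maps queue names to pending job payloads.
type store map[string][]string

// applyOp mutates a store with one operation; because it is
// deterministic, replaying the same ops yields the same state.
func applyOp(s store, op Op) {
	switch op.Kind {
	case "push":
		s[op.Queue] = append(s[op.Queue], op.Payload)
	case "ack":
		if q := s[op.Queue]; len(q) > 0 {
			s[op.Queue] = q[1:]
		}
	}
}

func main() {
	primary, replica := store{}, store{}
	var replLog []Op // the "special replica queue"

	// Primary applies each op locally and dupes it to the log.
	for _, op := range []Op{
		{"push", "default", "job-1"},
		{"push", "default", "job-2"},
		{"ack", "default", ""},
	} {
		applyOp(primary, op)
		replLog = append(replLog, op)
	}

	// Replica replays the log onto itself.
	for _, op := range replLog {
		applyOp(replica, op)
	}
	fmt.Println(primary["default"], replica["default"]) // [job-2] [job-2]
}
```

The catch with this approach is everything consensus normally handles for you: what happens when the replica falls behind, when ops arrive out of order, or when you need to fail over and promote the replica without losing acknowledged jobs.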
I don't want to leave this issue open forever, but feel free to discuss in the chatroom any time. There are various layers on top of RocksDB that add replication, for instance Cyclone:
https://arxiv.org/pdf/1711.06964.pdf
but nothing pre-packaged or well supported that we can easily leverage today. I will keep an eye open for improvements in this space and further suggestions are always welcome.
Wondering what your thoughts are on using something like https://github.com/hashicorp/raft and making this work with clustering support baked in?
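For anyone weighing that option: hashicorp/raft is structured around a finite-state-machine interface that you implement, and the library calls your apply method with each committed log entry, in the same order, on every node. Below is a rough sketch of that shape for a queue store; note this is a simplification (the real `raft.FSM` interface also requires `Snapshot`/`Restore` and passes a `*raft.Log`), and `queueFSM` plus the JSON command format are invented for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// command is a hypothetical serialized queue operation that would be
// proposed through the Raft log.
type command struct {
	Op      string `json:"op"` // "push" or "pop"
	Queue   string `json:"queue"`
	Payload string `json:"payload,omitempty"`
}

// queueFSM holds all queue state. In the hashicorp/raft model, state
// changes ONLY inside Apply, so every node that applies the same
// committed log arrives at identical queues.
type queueFSM struct {
	queues map[string][]string
}

// Apply decodes one committed log entry and mutates the state,
// returning the popped payload (if any) to the proposer.
func (f *queueFSM) Apply(entry []byte) interface{} {
	var c command
	if err := json.Unmarshal(entry, &c); err != nil {
		return err
	}
	switch c.Op {
	case "push":
		f.queues[c.Queue] = append(f.queues[c.Queue], c.Payload)
	case "pop":
		q := f.queues[c.Queue]
		if len(q) == 0 {
			return nil
		}
		f.queues[c.Queue] = q[1:]
		return q[0]
	}
	return nil
}

func main() {
	fsm := &queueFSM{queues: map[string][]string{}}
	fsm.Apply([]byte(`{"op":"push","queue":"default","payload":"job-1"}`))
	out := fsm.Apply([]byte(`{"op":"pop","queue":"default"}`))
	fmt.Println(out) // job-1
}
```

The appeal is that the library owns elections, log replication, and membership, leaving only the deterministic state machine to write; the cost is the same quorum round trip per write discussed earlier in the thread.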