contribsys / faktory

Language-agnostic persistent background job server
https://contribsys.com/faktory/

Highly Available Faktory #368

Closed: sloan-dog closed this issue 3 years ago

sloan-dog commented 3 years ago

Faktory can be backed by a remote instance of Redis. This prevents data loss if Faktory dies. However, while that single Faktory instance is down, no jobs can be processed.

It would help if multiple instances of Faktory could be backed by the same Redis instance and configured as a quorum to provide high availability. For example, if they were configured as a Raft cluster and the leader died, the remaining members of the quorum could elect a new leader and jobs could continue to be processed.

mperham commented 3 years ago

There are a number of features that are not designed to handle multiple instances writing to a single Redis, so that's unlikely to ever happen. Many features also assume there is only one Faktory; for instance, you can't spread batch jobs or throttle queues across Faktory instances.

You can run multiple Faktorys and have the client handle network errors, e.g. one Faktory per datacenter and fail over to a backup datacenter when necessary. It's possible you could even use an HAProxy instance to automate this failover without any client work at all. But all of this extra logic and infrastructure would make the system arguably less reliable, not more. Only you can determine what's right for your organization and needs.

sloan-dog commented 3 years ago

In the old Faktory Gitter channel, it was suggested that you had explored running multiple instances of Faktory before. It sounds like that isn't possible. Faktory Enterprise has run without even a blip so far, but I would sleep better at night if it had a standby ready to take its place in case of failure. Perhaps that is best solved outside Faktory itself.