Open ezrasingh opened 1 month ago
The first step toward achieving HA might be to address fault tolerance by implementing persistence, as the current setup doesn’t support it—meaning a crash would result in total data loss.
A possible solution could involve introducing state snapshots and an append-only log for operations. This would allow for some level of recovery during a failover, with configurable options for snapshot intervals.
My preference is for a memory-focused approach to persistence, where:
Is your feature request related to a problem? Please describe.
The current system does not support high availability clustering, which limits its resilience and scalability.
Describe the solution you'd like
Implement high availability clustering to ensure the system remains operational and performant under heavy load or in case of node failures.
Describe alternatives you've considered
Manual failover setups and load balancing, but these are less efficient and more complex than built-in clustering support.
Additional context
Features could include: