Open Stogas opened 6 months ago
Good that you've opened this issue, @Stogas! I've been comparing solutions. Element/Matrix checks every box but one: High Availability. A single point of failure is not an option: "one == none". Clients should keep working if a server goes down, with no interruption to the client or app. HA is key to making the infrastructure secure (availability) and the experience robust. Looking forward to seeing progress in this area. Keep up the good work!
Description:
Currently (as of v1.102.0), Synapse supports horizontal scaling via workers. As I understand it, workers can process requests almost entirely independently of the main process (of which we cannot run multiple instances), so at first glance Synapse is highly available.
However, using workers requires Redis. Synapse (again, as of v1.102.0) only supports a single Redis hostname and port. It does not support Redis Sentinel, which would handle election of the Redis master (the write-capable node) and redirect clients to it.
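To illustrate what Sentinel-aware discovery involves, here is a minimal Python sketch of the lookup a client performs: it asks each Sentinel for the current master address using the `SENTINEL get-master-addr-by-name` command and uses the first answer. This is not Synapse code and not how Synapse talks to Redis; the function names, the master name `mymaster`, and the Sentinel addresses are all hypothetical, and the single `recv` call assumes the short reply arrives in one read.

```python
import socket
from typing import Iterable, Optional, Tuple


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        raw = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(raw), raw))
    return b"".join(parts)


def parse_master_addr(reply: bytes) -> Optional[Tuple[str, int]]:
    """Parse the reply to SENTINEL get-master-addr-by-name.

    The reply is an array of two bulk strings (ip, port), or a null
    array when the Sentinel does not know the requested master.
    """
    lines = reply.split(b"\r\n")
    if lines[0] == b"*-1":
        return None
    if lines[0] != b"*2" or len(lines) < 5:
        raise ValueError(f"unexpected reply: {reply!r}")
    # Layout: *2, $<len>, <ip>, $<len>, <port>
    return lines[2].decode(), int(lines[4])


def discover_master(
    sentinels: Iterable[Tuple[str, int]],
    master_name: str,
    timeout: float = 0.5,
) -> Optional[Tuple[str, int]]:
    """Ask each Sentinel in turn; return the first master address reported."""
    for host, port in sentinels:
        try:
            with socket.create_connection((host, port), timeout=timeout) as conn:
                conn.sendall(
                    encode_command("SENTINEL", "get-master-addr-by-name", master_name)
                )
                reply = conn.recv(4096)  # assumption: reply fits in one read
        except OSError:
            continue  # this Sentinel is unreachable, try the next one
        try:
            addr = parse_master_addr(reply)
        except ValueError:
            continue  # malformed or error reply, try the next one
        if addr is not None:
            return addr
    return None
```

A caller would run something like `discover_master([("sentinel-1", 26379), ("sentinel-2", 26379)], "mymaster")` before connecting, and again after a connection failure, so that it follows a failover to the newly elected master. With only a single configured hostname and port, there is no place for this step.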
I've found a PR in the old matrix-org repo for adding Redis Sentinel support, but I'm not able to maintain it, so I'm opening an issue here instead.
Additionally, I think it would be worthwhile to add guidance to the Synapse documentation on how to achieve a highly available setup. The 2020 post about scaling Synapse seems to show multiple Redis instances in its diagrams, but I can't figure out how to achieve this with Synapse, since supporting Redis Cluster requires the Redis client to be cluster-aware.
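For context on what "cluster-aware" means: in Redis Cluster, every key maps to one of 16384 hash slots, and the client has to work out the slot itself so it can send the command to the node that owns it. The sketch below shows that slot calculation (CRC16/XMODEM of the key, modulo 16384, with `{...}` hash tags) in plain Python. It is only an illustration with hypothetical function names, not part of Synapse or of any Redis client library.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (the XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: bytes) -> int:
    """Return the Redis Cluster hash slot (0..16383) for a key.

    If the key contains a non-empty {hash tag}, only the text between
    the first "{" and the first following "}" is hashed, so related
    keys can be forced onto the same slot.
    """
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384
```

A client that is not cluster-aware skips this step and sends every command to one fixed node, which then answers with a redirection error for keys it does not own.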
Related issues: