dhiaayachi / temporal

Temporal service
https://docs.temporal.io
MIT License

fr: add support for multi-az postgres with read-only replication #133

Open dhiaayachi opened 2 weeks ago

dhiaayachi commented 2 weeks ago

Is your feature request related to a problem? Please describe.

Yes. The current implementation assumes all Postgres data stores have static read-write configurations (i.e., they will always be read-write for the lifecycle of the Temporal instance).

Describe the solution you'd like

Add support for postgres configurations such that:

Describe alternatives you've considered

No alternatives. If HA is enabled between multiple Postgres servers and Temporal does not check pg_is_in_recovery, Temporal will eventually fail upon (Postgres) instance recovery / failover.

Additional context

Postgres configured such that:

Happy to provide Docker configurations as a reproducible context for the feature request.

dhiaayachi commented 19 hours ago

Feature Request: Dynamic Read-Only/Read-Write PostgreSQL Support in Temporal

Is your feature request related to a problem? Please describe.

The current Temporal implementation assumes that all PostgreSQL data stores have static read-write configurations, meaning they'll always remain read-write throughout the Temporal instance's lifecycle. This approach presents a challenge when utilizing High Availability (HA) PostgreSQL setups with dynamic read-only/read-write configurations.

Describe the solution you'd like

The ideal solution would involve implementing support for dynamic read-only/read-write PostgreSQL configurations, enabling the following:

  1. Dynamic HA PostgreSQL Support: Temporal instances should be able to dynamically utilize read-only or read-write PostgreSQL nodes across multiple instances, accommodating changes in PostgreSQL node roles (master/read-write, standby/read-only) during HA failover scenarios.
  2. Write Transaction Fallback: Temporal should be able to seamlessly fall back to an available read-write (master) node for write transactions when the node it is using becomes read-only.
  3. Read-Only Node Preference (Optional): Ideally, Temporal should prioritize utilizing read-only nodes for default queries, maximizing read performance and minimizing load on the master node.

Describe alternatives you've considered

Currently, there are no viable alternatives. If HA is enabled between multiple PostgreSQL servers and Temporal doesn't check pg_is_in_recovery, it will inevitably fail upon PostgreSQL instance recovery or failover, because a node it treats as read-write may have become a read-only standby.

Additional context

The PostgreSQL configuration assumes the following setup:

Reproducible Context:

Docker configurations for a reproducible environment showcasing this feature request are available upon request.

This proposed feature would significantly enhance Temporal's capabilities by providing robust support for dynamic PostgreSQL HA configurations, increasing resilience and improving read performance.

dhiaayachi commented 18 hours ago

Thank you for your feature request. This is a great idea! While we don't currently support this directly, you can achieve similar results by implementing a custom failover mechanism. You could write a custom Temporal client that checks the status of the Postgres instances and routes requests to the appropriate read-only or read-write instance. This solution would require some additional code but would give you the flexibility you are looking for. We'll definitely consider adding official support for dynamic read-only/read-write configurations in the future.

dhiaayachi commented 17 hours ago

Thank you for your feature request!

The current implementation assumes all Postgres data stores have static read-write configurations. This means that a single Postgres instance is used for both reading and writing data and there is no automatic failover or replication to other instances.

While we're not planning to add direct support for dynamic read-only/read-write configurations in the near future, there are some workarounds you can consider. One approach is to put a load balancer or proxy in front of your Postgres instances to distribute read and write traffic, directing read traffic to read-only instances and write traffic to the master instance. You would have to manage failover manually within the load balancer configuration.

We appreciate your interest in enhancing Temporal's Postgres integration, and we welcome your feedback as we consider future enhancements.

dhiaayachi commented 16 hours ago

Thank you for your feature request! We appreciate your detailed description and the Docker configuration suggestion. Currently, Temporal doesn't support dynamic read-only/read-write configurations for Postgres.

You could work around this by using a proxy to handle read-only/read-write traffic. For instance, you could set up a proxy that routes all read requests to your read-only nodes and all write requests to the master node. The proxy would need to be able to determine which requests are reads and which are writes.
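To illustrate what "determine which requests are reads and which are writes" involves, here is a deliberately naive sketch of a statement classifier. `isReadOnlyStatement` is a hypothetical helper, not part of Temporal or of any particular proxy, and it treats anything it is unsure about as a write.

```go
package main

import "strings"

// lockingClauses are row-locking clauses that PostgreSQL rejects on a
// hot standby, so statements using them must go to the master.
var lockingClauses = []string{
	"FOR UPDATE", "FOR NO KEY UPDATE", "FOR SHARE", "FOR KEY SHARE",
}

// isReadOnlyStatement guesses whether a single SQL statement can be
// sent to a read-only node. It only inspects the leading keyword and
// locking clauses; everything else is conservatively treated as a write.
func isReadOnlyStatement(stmt string) bool {
	s := strings.ToUpper(strings.TrimSpace(stmt))
	if !strings.HasPrefix(s, "SELECT") && !strings.HasPrefix(s, "SHOW") {
		return false
	}
	for _, clause := range lockingClauses {
		if strings.Contains(s, clause) {
			return false
		}
	}
	return true
}
```

A heuristic like this misses important cases: a SELECT can call a function that writes, and reads issued inside a read-write transaction have to stay on the same connection as the writes. That is the main reason statement-level splitting in a proxy is hard to get right.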

Let us know if this workaround is suitable for your use case. We are always exploring new features and are open to suggestions.

dhiaayachi commented 1 hour ago

Thanks for the feature request! This is an interesting idea.

Unfortunately, Temporal doesn't currently have direct support for dynamic read-only/read-write PostgreSQL configurations. The current implementation assumes a static read-write configuration for PostgreSQL.

One potential workaround would be to utilize Multi-Cluster Replication for your multi-AZ PostgreSQL setup. In this scenario, you could have a primary cluster with a read-write PostgreSQL instance and a secondary cluster with a read-only PostgreSQL instance. This would allow you to leverage Temporal's failover mechanisms and maintain data consistency across your clusters.

However, it's important to note that Multi-Cluster Replication is still considered experimental, and it might not fully address all the specific requirements of your use case.