dhiaayachi / temporal

Temporal service
https://docs.temporal.io
MIT License
0 stars 0 forks source link

Unable to use dns:// prefix for clusterInformation.rpcAddress configuration #142

Open dhiaayachi opened 2 weeks ago

dhiaayachi commented 2 weeks ago

Expected Behavior

We are trying to setup multi-cluster replication and for that purpose need to configure the clusterMetadata of the Temporal configuration. This is described here. Specific to the rpcAddress the docs says:

rpcAddress - indicate the remote service address (host:port). Host can be DNS name. Use dns:/// prefix to enable round-robin between IP address for DNS name.

We are using this prefix everywhere where we create a gRPC client. In particular our workers connect via dns:///<service-address>:7233. The reason to explicitly specify the dns resolver is that grpc-go deviates from the gRPC spec and does not use the dns resolver as default. See https://github.com/grpc/grpc-go/issues/3544#issuecomment-618689886. It uses a so called "passthrough" as default. If one wants to use dns, one needs to explicitly specify the resolver using dns:///.

It seems in most cases Temporal allows the use of the prefix, except for clusterInformation.rpcAddress.

Actual Behavior

When configuring the clusterInformation.rpcAddress with something like dns:///<host>:7233 an error is returned when one tried to use cluster replication commands, for example temporal operator cluster upsert in order to establish a connection between two clusters. The returned error is:

address dns:///myhost:7233: too many colons in address

See also https://go.dev/play/p/9Rwow_gtcCF

The problem is that RPCFactory#CreateRemoteFrontendGRPCConnection uses net.SplitHostPort to extract the hostname from the configured rpcAddress in order to lookup the client config for the host. net.SplitHostPort cannot handle the dns:/// prefix.

func (d *RPCFactory) CreateRemoteFrontendGRPCConnection(rpcAddress string) *grpc.ClientConn {
    var tlsClientConfig *tls.Config
    var err error
    if d.tlsFactory != nil {
        hostname, _, err2 := net.SplitHostPort(rpcAddress)
        if err2 != nil {
            d.logger.Fatal("Invalid rpcAddress for remote cluster", tag.Error(err2))
        }
        tlsClientConfig, err = d.tlsFactory.GetRemoteClusterClientConfig(hostname)

        if err != nil {
            d.logger.Fatal("Failed to create tls config for gRPC connection", tag.Error(err))
            return nil
        }
    }

    return d.dial(rpcAddress, tlsClientConfig)
}

Steps to Reproduce the Problem

  1. Use the dns:/// prefix in the clusterInformation.rpcAddress of the Temporal server in order to enforce the dns resolver in grpc-go
  2. Use an operation which calls CreateRemoteFrontendGRPCConnection, eg try enabling a cluster connection between two clusters.

Specifications

dhiaayachi commented 3 days ago

Thanks for reporting the issue.

The dns:/// prefix is not supported in clusterMetadata.rpcAddress as Temporal uses the net.SplitHostPort function to extract the hostname from the configured rpcAddress, which cannot handle the prefix.

You can work around this limitation by using the IP address directly in clusterMetadata.rpcAddress instead of the DNS name. This will allow you to establish a connection between the two clusters.

We appreciate your feedback and will consider adding support for the dns:/// prefix in future releases.

dhiaayachi commented 3 days ago
Thanks for reporting this issue!

You're right, the `clusterMetadata.rpcAddress` currently doesn't support the `dns:///` prefix. This is a known limitation that we're working on addressing. 

As a workaround, you can use an IP address instead of a DNS name for the `rpcAddress`. This will avoid the error, but you'll need to make sure the IP address is reachable by all the clusters involved in the replication.

You can find more information on configuring multi-cluster replication in the [Temporal documentation](https://docs.temporal.io/references/configuration#clustermetadata).