dotnet / aspire

An opinionated, cloud ready stack for building observable, production ready, distributed applications in .NET
https://learn.microsoft.com/dotnet/aspire
MIT License

Kubernetes DnsSrv service discovery across namespace #2515

Open Dona278 opened 4 months ago

Dona278 commented 4 months ago

Preamble: In Kubernetes, services can communicate with each other using DNS names, as described in the DNS specification. With service discovery named endpoints in Aspire we can use the "http://_{endpointName}.{serviceName}" URI, which works, awesome! In production, inside a Kubernetes cluster with multiple namespaces, I assumed it was normal to add the namespace after the service name when two services in different namespaces need to communicate, as described in the Kubernetes service DNS concepts. Unfortunately, service discovery works only between services in the same namespace and only when nothing is declared after the service name. I first thought it was an issue with our service configuration, but after following the debugging instructions for CoreDNS I found no issue of any sort.

Issue: An HttpClient with service discovery enabled that makes a request to a named endpoint of a service in a different namespace always fails with "The endpoint collection contains no endpoints." After debugging the service discovery implementation, I found that the SRV query contains the current namespace in addition to the one provided within the service name (e.g. with http://_grpc.service-name.default). The namespace comes from the DnsSrvServiceEndPointResolverProvider.ReadQualifiedNamespaceFromResolvConf method, which returns the second element of the search line split by spaces (e.g. "search test.svc.cluster.local svc.cluster.local cluster.local" returns test.svc.cluster.local). That value is then appended to the SRV query, which results in http://_grpc._tcp.service-name.default.test.svc.cluster.local.
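To make the behavior above concrete, here is a minimal Python sketch of the query construction as I understand it from debugging. It is not the actual C# implementation: the function names `read_qualified_namespace` and `build_srv_query` are hypothetical stand-ins, and only the search line and service name are taken from the example above.

```python
from typing import Optional


def read_qualified_namespace(resolv_conf: str) -> Optional[str]:
    """Return the second space-separated token of the 'search' line.

    Hypothetical stand-in for ReadQualifiedNamespaceFromResolvConf.
    """
    for line in resolv_conf.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "search":
            return parts[1]
    return None


def build_srv_query(endpoint: str, service: str, suffix: str) -> str:
    """Compose the SRV query name from the endpoint, service name, and suffix."""
    return f"_{endpoint}._tcp.{service}.{suffix}"


# resolv.conf content of a pod running in the "test" namespace.
RESOLV_CONF = "search test.svc.cluster.local svc.cluster.local cluster.local\n"

suffix = read_qualified_namespace(RESOLV_CONF)
# The service name already carries the target namespace ("default"),
# but the current namespace ("test") is appended anyway.
query = build_srv_query("grpc", "service-name.default", suffix)
print(query)
```

The resulting name contains both `default` and `test`, so it does not match any record in the cluster DNS.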

Workaround: We can use the QuerySuffix option to hard-code svc.cluster.local, which is then appended to the SRV query instead of the value returned by the DnsSrvServiceEndPointResolverProvider.ReadQualifiedNamespaceFromResolvConf method. This results in http://_grpc._tcp.service-name.default.svc.cluster.local, which works perfectly.

Suggestion: It might be better to avoid needing the query suffix and instead have DnsSrvServiceEndPointResolverProvider.ReadQualifiedNamespaceFromResolvConf take the third value of the split search line when the namespace is already declared in the serviceName parameter.
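A rough Python sketch of that suggestion follows. The function `pick_suffix` is hypothetical, and treating "the service name contains a dot" as the signal that a namespace was declared is my own assumption for illustration; the real check might need to be stricter.

```python
def pick_suffix(search_line: str, service: str) -> str:
    """Choose the DNS suffix for the SRV query from a resolv.conf search line."""
    tokens = search_line.split()
    # tokens[0] is the literal "search"; tokens[1] is the namespace-qualified
    # domain; tokens[2] is the same domain without the current namespace.
    has_namespace = "." in service  # assumption: a dot means "service.namespace"
    if has_namespace and len(tokens) >= 3:
        return tokens[2]
    return tokens[1]


SEARCH_LINE = "search test.svc.cluster.local svc.cluster.local cluster.local"

same_ns = pick_suffix(SEARCH_LINE, "service-name")
cross_ns = pick_suffix(SEARCH_LINE, "service-name.default")
print(f"_grpc._tcp.service-name.{same_ns}")
print(f"_grpc._tcp.service-name.default.{cross_ns}")
```

With this rule the same-namespace case keeps its current behavior, while a name such as service-name.default gets the same suffix that the QuerySuffix workaround hard-codes.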

ReubenBond commented 4 months ago

Hi @Dona278, thanks for the writeup. Indeed, Service Discovery makes the (over-)simplifying assumption that all of the services for your app live within a single namespace. One issue with adding the namespace to the service name is that it won't work at dev time out-of-the-box. That is my concern. Do you have a suggestion for how this could work in a way which is portable across dev & prod?

If there was a delegate to map a service name to a query, would that be sufficient?

Dona278 commented 4 months ago

"won't work at dev time" is what I said just after open the issue!

In our setup we have only two namespaces, and the second one contains "worker services" (KEDA scaled objects spawned based on an Azure queue), so we use the "namespace" behavior only for these services and only in the "Production" environment. However, I think a delegate might be sufficient, and maybe it could be part of only the DnsSrv service discovery implementation, configured during service registration? Could it also provide a sort of token replacement for the query instead of creating a new one from scratch? That would allow adding just the namespace while leaving the Kubernetes cluster hostname as it was read from the config file.
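To show what I mean by token replacement, here is a small Python sketch. Everything in it is hypothetical (the token names, `default_mapper`, `resolve_query`, and the way the search domain is split); it only illustrates a delegate that receives the parts of the query and overrides the namespace while the cluster domain read from the config file stays untouched.

```python
from typing import Callable, Dict

# Hypothetical delegate type: receives the query tokens, returns the SRV name.
QueryMapper = Callable[[Dict[str, str]], str]


def default_mapper(tokens: Dict[str, str]) -> str:
    """Default behavior: the current namespace is used as-is."""
    return "_{endpoint}._tcp.{service}.{namespace}.{cluster}".format(**tokens)


def resolve_query(endpoint: str, service: str, search_domain: str,
                  mapper: QueryMapper = default_mapper) -> str:
    """Split the first search domain into tokens and let the mapper build the query."""
    # "test.svc.cluster.local" -> namespace "test", cluster "svc.cluster.local"
    namespace, _, cluster = search_domain.partition(".")
    tokens = {"endpoint": endpoint, "service": service,
              "namespace": namespace, "cluster": cluster}
    return mapper(tokens)


def default_namespace_mapper(tokens: Dict[str, str]) -> str:
    """Example delegate: replace only the namespace token, keep everything else."""
    return default_mapper({**tokens, "namespace": "default"})


domain = "test.svc.cluster.local"
print(resolve_query("grpc", "service-name", domain))
print(resolve_query("grpc", "service-name", domain, default_namespace_mapper))
```

The service name itself stays free of any namespace here, so the same name could still work at dev time, and only the delegate registered in "Production" would differ.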