ThreeMammals / Ocelot

.NET API Gateway
https://www.nuget.org/packages/Ocelot
MIT License
8.24k stars, 1.62k forks

After Upgrade to 23.3.3 from 23.2.2 cluster name is being used in place of service address #2109

Closed: tbd-develop closed this issue 1 week ago

tbd-develop commented 3 weeks ago

Expected Behavior / New Feature

Resolving a service address to route an API request

Actual Behavior / Motivation for New Feature

Failure to resolve address as cluster name is being used in place of service host

Steps to Reproduce the Problem

  1. Create a service and configure it in Consul, mapping an API request as follows:
        {
            "UpstreamHttpMethod": [
                "Post"
            ],
            "UpstreamPathTemplate": "/api/planning-items",
            "DownstreamPathTemplate": "/planning-items",
            "DownstreamScheme": "https",
            "ServiceName": "planning-items-api"
        }
  2. Make a request to the gateway
  3. Receive 502 Bad Gateway
    warn: Ocelot.Responder.Middleware.ResponderMiddleware[0]
      requestId: 0HN4GBAI6OJ7L:00000003, previousRequestId: No PreviousRequestId, message: 'Error Code: ConnectionToDownstreamServiceError
      Message: Error connecting to downstream service, exception: System.Net.Http.HttpRequestException: No such host is known. (server-1:7162)
       ---> System.Net.Sockets.SocketException (11001): No such host is known.
         at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
         at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
         at System.Net.Sockets.Socket.<ConnectAsync>g__WaitForConnectWithCancellation|285_0(AwaitableSocketAsyncEventArgs saea, ValueTask connectTask, CancellationToken cancellationToken)
         at System.Net.Http.HttpConnectionPool.ConnectToTcpHostAsync(String host, Int32 port, HttpRequestMessage initialRequest, Boolean async, CancellationToken cancellationToken)   
         --- End of inner exception stack trace ---
         at System.Net.Http.HttpConnectionPool.ConnectToTcpHostAsync(String host, Int32 port, HttpRequestMessage initialRequest, Boolean async, CancellationToken cancellationToken)   
         at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
         at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
         at System.Net.Http.HttpConnectionPool.AddHttp11ConnectionAsync(QueueItem queueItem)
         at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken)
         at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
         at System.Net.Http.DiagnosticsHandler.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
         at Ocelot.Requester.TimeoutDelegatingHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
         at Ocelot.Requester.MessageInvokerHttpRequester.GetResponse(HttpContext httpContext) errors found in ResponderMiddleware. Setting error response for request path:/api/planning-items, request method: POST'

The same configuration works fine on 23.2.2.

Specifications

raman-m commented 3 weeks ago

Hello Terry! Welcome to the Ocelot world!

requestId: 0HN4GBAI6OJ7L:00000003, previousRequestId: No PreviousRequestId, message: 'Error Code: ConnectionToDownstreamServiceError
Message: Error connecting to downstream service, exception: System.Net.Http.HttpRequestException: No such host is known. (server-1:7162)
---> System.Net.Sockets.SocketException (11001): No such host is known.

Failure to resolve address as cluster name is being used in place of service host

Error messages tend to be self-explanatory. It appears that your host server-1:7162 is offline or not visible from Ocelot's machine. The default Consul provider uses node names as host names, so you must ensure that the server-1 machine is reachable from Ocelot's machine. This looks like a typical DevOps/DNS issue in a Docker environment. If you are not using Consul nodes, additional development may be required.

What DNS system are you using, if not Docker's? Are you using Docker together with Compose?

raman-m commented 3 weeks ago

After Upgrade to 23.3.3 from 23.2.2 cluster name is being used in place of service address

Regarding the issue title, I don't follow the part about the version upgrade. The Consul service discovery provider has definitely been updated, but the logic that constructs the service object at runtime remains unchanged: node names continue to be used as host names.

Could you please provide more details about the problem?

raman-m commented 3 weeks ago

@tbd-develop Yet another notification for you, Terry.

tbd-develop commented 3 weeks ago

Apologies for not replying sooner. I have Consul hosted in Docker, and I use Compose to bring it up.


My planning-items service runs on my dev machine. With version 23.2.2, a POST request to the planning-items endpoint from the configuration above works fine: it resolves to https://localhost:7162. After upgrading to 23.3.3, it appears to resolve server-1:7162 instead.

raman-m commented 3 weeks ago

After upgrading to 23.3.3 it looks like it's trying to resolve server-1:7162.

...because server-1 is the node name, not the actual host name!

Terry, I've reviewed the changes to the Consul provider between versions 23.2 and 23.3 regarding the use of node names as service hosts, and there are no differences. The logic is the same in both versions: when a node is present, its name is used as the host, and the service address is used only if the node is null. Therefore, there's no need to define nodes; please remove them.
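To make that rule concrete, here is a minimal, self-contained C# sketch of the host selection described above. The `Node`, `AgentService` and `ServiceEntry` records and the `HostSelection` helper are hypothetical stand-ins, not the real Consul client or Ocelot types. They only mirror the behaviour described in this thread: node name when a node is present, otherwise the service address. It also shows why renaming the node to localhost fixes the lookup.

```csharp
#nullable enable
using System;

// Hypothetical stand-ins for Consul catalog data; NOT the real Consul client types.
public record Node(string Name);
public record AgentService(string Address, int Port);
public record ServiceEntry(AgentService Service);

public static class HostSelection
{
    // Mirrors the rule described in the thread: the node name wins when a node exists,
    // and the service address is used only when the node is null.
    public static string DefaultHost(ServiceEntry entry, Node? node)
        => node != null ? node.Name : entry.Service.Address;

    // Builds the "host:port" pair the gateway would try to connect to under this rule.
    public static string Downstream(ServiceEntry entry, Node? node)
        => $"{DefaultHost(entry, node)}:{entry.Service.Port}";
}

public static class Program
{
    public static void Main()
    {
        // The service registered with address localhost and port 7162.
        var entry = new ServiceEntry(new AgentService("localhost", 7162));

        // Node named "server-1": DNS must be able to resolve "server-1".
        Console.WriteLine(HostSelection.Downstream(entry, new Node("server-1"))); // server-1:7162
        // No node: falls back to the service address.
        Console.WriteLine(HostSelection.Downstream(entry, null));                 // localhost:7162
        // Node renamed to "localhost", as suggested in the thread.
        Console.WriteLine(HostSelection.Downstream(entry, new Node("localhost"))); // localhost:7162
    }
}
```

The point of the sketch is that the service address is never consulted while a node is attached, so the node's name must itself be a resolvable host name.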

From what I understand, your node is named "server-1". Simply rename the server-1 node to localhost, and the problem should be resolved. Our naming convention is that the node name matches the host name of the hosting machine. Renaming "server-1" to "localhost" therefore does not break this convention, because localhost is a valid host name that can also serve as the node name. Please read the docs section Consul Service Builder, where the convention is explained.

If you prefer not to follow this convention and want to keep the "server-1" node name, you will need to override the default Consul service builder in DI. This requires writing C# code as outlined in the documentation for the AddConsul{T} method.
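The override approach can be sketched in the same way. The classes below are hypothetical and deliberately avoid Ocelot's real constructors. They only illustrate the pattern this thread relies on: a base builder with a `protected virtual string GetDownstreamHost(ServiceEntry entry, Node node)` hook, and a subclass that returns `entry.Service.Address` regardless of the node. In a real gateway the subclass would derive from Ocelot's default builder (assumed here to be `DefaultConsulServiceBuilder`) and be registered through the AddConsul{T} method mentioned above. Check the Ocelot documentation for the exact constructor in your version.

```csharp
#nullable enable
using System;

// Hypothetical stand-ins for Consul catalog data; NOT the real Consul client types.
public record Node(string Name);
public record AgentService(string Address, int Port);
public record ServiceEntry(AgentService Service);

// Hypothetical base builder: models only the GetDownstreamHost hook, not Ocelot's real class.
public class ServiceBuilderSketch
{
    // Default rule: node name when a node is present, otherwise the service address.
    protected virtual string GetDownstreamHost(ServiceEntry entry, Node? node)
        => node != null ? node.Name : entry.Service.Address;

    // Public entry point producing the "host:port" downstream address.
    public string BuildHostAndPort(ServiceEntry entry, Node? node)
        => $"{GetDownstreamHost(entry, node)}:{entry.Service.Port}";
}

// Custom builder in the spirit of the workaround: always use the registered service address.
public class AddressFirstBuilderSketch : ServiceBuilderSketch
{
    protected override string GetDownstreamHost(ServiceEntry entry, Node? node)
        => entry.Service.Address;
}

public static class Program
{
    public static void Main()
    {
        var entry = new ServiceEntry(new AgentService("localhost", 7162));
        var node = new Node("server-1");

        Console.WriteLine(new ServiceBuilderSketch().BuildHostAndPort(entry, node));      // server-1:7162
        Console.WriteLine(new AddressFirstBuilderSketch().BuildHostAndPort(entry, node)); // localhost:7162
    }
}
```

Only the virtual hook changes and everything else is inherited. That is why the workaround posted later in this thread fits in a single overriding line.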

If the absence of well-explained documentation on naming conventions has caused confusion, we could update the existing documents to highlight the significance of node naming conventions. What do you think?

raman-m commented 1 week ago

@tbd-develop Terry, is everything clear now? Did you manage to solve the problem?

tbd-develop commented 1 week ago

I'm still not sure why what I had configured before was incorrect, or why it worked until I upgraded to 23.3.3. But I've created an IConsulServiceBuilder implementation, used it, and overrode the host selection:

protected override string GetDownstreamHost(ServiceEntry entry, Node node) => entry.Service.Address;

Now it works as before. Unfortunately, I don't have the time or opportunity right now to dig deeper into what I was doing wrong, so I guess this is my solution for now.

Be-made commented 1 week ago


Thx. Same problem.

raman-m commented 1 week ago

The issue has been resolved by the author.