dotnet / aspire

Tools, templates, and packages to accelerate building observable, production-ready apps
https://learn.microsoft.com/dotnet/aspire
MIT License

[YARP + Aspire] Service discovery does not work after deployment in Azure #4605

Open 4eversoft opened 5 months ago

4eversoft commented 5 months ago

I use YARP in my microservice architecture and have experienced issues deploying to Azure.

I use YARP with these two extension methods and they work great for local development:

AddServiceDiscoveryDestinationResolver
AddHttpForwarderWithServiceDiscovery

After deployment to Azure, YARP could no longer find the other container apps (404), regardless of whether they were configured internally or externally in the Ingress settings.

My workaround now looks like this: I completely omit the methods mentioned for deployment and leave the service discovery to the Azure Container App Environment.

if (builder.Environment.IsDevelopment())
{
    builder.Services.AddReverseProxy()
        .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"))
        .AddAccessToken()
        .AddResilience()
        .AddServiceDiscoveryDestinationResolver();

    builder.Services.AddHttpForwarderWithServiceDiscovery();
}
else
{
    // Service discovery is not needed in the Azure Container Apps environment

    builder.Services.AddReverseProxy()
        .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"))
        .AddAccessToken()
        .AddResilience();
}

This now works, but I'm not sure if this is the intended solution.

Perhaps someone also has an explanation for why YARP does not work with these (actually correct) URLs in the context of Azure Container Apps.

davidfowl commented 5 months ago

Can you share the entire application? Or a minimal repro that replicates what you are doing?

4eversoft commented 5 months ago

The architecture of the entire application is as follows (see the attached diagram, Folie1).

YARP is part of the Blazor backend project and forwards the requests to the services. I have extended YARP via ForwarderHttpClientFactory with a Polly handler (.AddResilience(), see above) and also a handler for passing on the access token (.AddAccessToken()); otherwise, YARP only forwards the requests from the Blazor frontend:

    "Clusters": {
      "travellounge-catalog-api": {
        "Destinations": {
          "destination1": {
            "Address": "http://travellounge-catalog-api"
          }
        }
      },
      "travellounge-booking-api": {
        "Destinations": {
          "destination1": {
            "Address": "http://travellounge-booking-api"
          }
        }
      },
  ...
}

The definition in AppHost is not particularly spectacular either:

var travelLoungeCatalogApi = builder.AddProject<Projects.TravelLounge_Catalog_Api>("travellounge-catalog-api")
    .WithReference(travelLoungeSeq)
    .WithReference(travelLoungeCaching)
    .WithReference(travelLoungeKeyVault)
    .WithReference(travelLoungeInsights)
    .WithReference(travelLoungeMessaging)
    .WithReference(travelLoungeMsSqlCatalog)
    .WithReference(travelLoungeAuthorityApi);

var travelLoungeWebApp = builder.AddProject<Projects.TravelLounge_WebApp>("travellounge-webapp")
    .WithReference(travelLoungeSeq)
    .WithReference(travelLoungeKeyVault)
    .WithReference(travelLoungeInsights)
    .WithReference(travelLoungeAuthorityApi)
    .WithReference(travelLoungeBookingApi)
    .WithReference(travelLoungeCatalogApi)
    .WithReference(travelLoungeReviewApi)
    .WithExternalHttpEndpoints();

As I have already written, YARP forwards requests correctly when the target addresses are resolved locally by Aspire service discovery, and in Azure when the discovery service in the VNet of the Container Apps Environment resolves them.

For some reason the services cannot be found only when Aspire service discovery also performs the resolution in Azure, even though the resolved addresses look correct to me.
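For context, the resolution described above can be sketched as follows: locally and after deployment, Microsoft.Extensions.ServiceDiscovery resolves logical cluster names from configuration keys of the form `Services:{serviceName}:{endpointName}:{index}`, which surface as environment variables with `__` separators. The concrete values below are hypothetical, purely to illustrate the convention (the `<env>` segment is a placeholder):

```
# Hypothetical values illustrating the configuration keys that
# Microsoft.Extensions.ServiceDiscovery reads (Services:{name}:{endpoint}:{index},
# expressed as environment variables with "__" separators):
services__travellounge-catalog-api__http__0=http://travellounge-catalog-api.internal.<env>.eastus2.azurecontainerapps.io
services__travellounge-booking-api__http__0=http://travellounge-booking-api.internal.<env>.eastus2.azurecontainerapps.io
```

If these values are present, the resolver rewrites a destination like `http://travellounge-catalog-api` to the listed address before the request is forwarded.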

davidfowl commented 5 months ago

I think you're going to need to create a minimal repro so that we can see what you're seeing. I don't see any issues with the tiny code snippets you've sent, but that's usually not where the problems are.

One more thing you can share is the manifest for this project.

Run this on the apphost:

dotnet run --publisher manifest --output-path aspire-manifest.json

and put the contents in the issue.

4eversoft commented 5 months ago

aspire-manifest.json here we go....

davidfowl commented 5 months ago

404 sounds like a YARP configuration problem, not a service discovery problem. Service discovery problems usually result in connection errors. This seems like the path for the target service is not correct. If you are unable to create a minimal repro, I'm not sure we can help you without someone working offline with your real project and helping you debug.

I'd suggest using the Aspire dashboard in ACA to look at traces to help aid in diagnostics. If you observe that it works without the code you removed, then compare the traces with and without that code and look for clues.
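One way to get the incoming request and the proxied destination side by side in the logs is ASP.NET Core's built-in HTTP logging middleware. A minimal sketch of the wiring (the field selection here is illustrative, not from the original comment):

```csharp
// Minimal sketch: enable ASP.NET Core HTTP logging so incoming requests and
// YARP's "Proxying to ..." destination (logged by Yarp.ReverseProxy at
// Information level) appear together in the traces.
using Microsoft.AspNetCore.HttpLogging;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddHttpLogging(options =>
{
    options.LoggingFields = HttpLoggingFields.RequestPropertiesAndHeaders
                          | HttpLoggingFields.ResponsePropertiesAndHeaders;
});

var app = builder.Build();
app.UseHttpLogging();
```

Comparing the logged `Host` header and the `Proxying to ...` line in the working and failing deployments should narrow down where the address or path diverges.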

4eversoft commented 5 months ago

I totally agree with you that this seems to be more of a problem on the YARP side. I'll try to put together a minimal repro over the weekend, so hopefully we can figure out what's going wrong.

4eversoft commented 5 months ago

@davidfowl As you suggested, I have now created a minimal repo and was able to recreate the behavior I noticed.

AspireYarp repo

WolfspiritM commented 4 months ago

We deployed Aspire manually and populated the service_* environment variables ourselves, which worked fine. However, our app behind the YARP proxy needs access to the incoming host, so we set "RequestHeaderOriginalHost", which prevents YARP from rewriting the Host header and instead forwards the original host to the backend app. This ends up in a 404 error from Container Apps saying that the app can't be found or is stopped, because the ingress in front of the backend app doesn't know about the external domain.
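For reference, the transform mentioned above is enabled per route in the YARP configuration, roughly like this (route and cluster names are placeholders, not from our deployment):

```json
{
  "ReverseProxy": {
    "Routes": {
      "route1": {
        "ClusterId": "backend",
        "Match": { "Path": "{**catch-all}" },
        "Transforms": [
          { "RequestHeaderOriginalHost": "true" }
        ]
      }
    }
  }
}
```

With this set, the backend receives the external hostname in the Host header rather than the destination's own host, which is what trips up the internal ingress.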

If the request is forwarded internally, directly to the service (instead of via the public IP, where the service isn't addressed directly), I'd expect that to work, and I wonder if this is a Container Apps bug.

I can't see this being the cause in the minimal repro, but maybe some host rewriting is happening there, too.

julioct commented 3 months ago

Just faced this exact same issue. The fix from @4eversoft works great and makes sense since ACA is already designed to discover services just by their name.

If it helps, here are the logs of a failed request that I collected before disabling service discovery in the Production environment:

info: Microsoft.AspNetCore.Hosting.Diagnostics[1]
      Request starting HTTP/1.1 GET http://gateway-service.mangoforest-aab1e0f4.eastus2.azurecontainerapps.io/catalog/genres - - -
info: Microsoft.AspNetCore.HttpLogging.HttpLoggingMiddleware[1]
      Request:
      Protocol: HTTP/1.1
      Method: GET
      Scheme: https
      PathBase: 
      Path: /catalog/genres
      Accept: */*
      Host: gateway-service.mangoforest-aab1e0f4.eastus2.azurecontainerapps.io
      User-Agent: PostmanRuntime/7.41.1
      Accept-Encoding: gzip, deflate, br
      Cache-Control: no-cache
      postman-token: [Redacted]
      X-Original-Proto: [Redacted]
      x-envoy-external-address: [Redacted]
      x-request-id: [Redacted]
      x-envoy-expected-rq-timeout-ms: [Redacted]
      x-k8se-app-name: [Redacted]
      x-k8se-app-namespace: [Redacted]
      x-k8se-protocol: [Redacted]
      x-k8se-app-kind: [Redacted]
      x-ms-containerapp-name: [Redacted]
      x-ms-containerapp-revision-name: [Redacted]
      x-arr-ssl: [Redacted]
      X-Original-For: [Redacted]
info: Microsoft.AspNetCore.Routing.EndpointMiddleware[0]
      Executing endpoint 'catalogGet'
info: Yarp.ReverseProxy.Forwarder.HttpForwarder[9]
      Proxying to http://catalog-service.internal.mangoforest-aab1e0f4.eastus2.azurecontainerapps.io/genres HTTP/2 RequestVersionOrLower 
info: Yarp.ReverseProxy.Forwarder.HttpForwarder[56]
      Received HTTP/1.1 response 404.
info: Microsoft.AspNetCore.Routing.EndpointMiddleware[1]
      Executed endpoint 'catalogGet'
info: Microsoft.AspNetCore.HttpLogging.HttpLoggingMiddleware[2]
      Response:
      StatusCode: 404
      Content-Type: text/html; charset=utf-8
      Date: Wed, 21 Aug 2024 15:50:52 GMT
      Server: Kestrel
      Content-Length: 1946
info: Microsoft.AspNetCore.Hosting.Diagnostics[2]
      Request finished HTTP/1.1 GET https://gateway-service.mangoforest-aab1e0f4.eastus2.azurecontainerapps.io/catalog/genres - 404 1946 text/html;+charset=utf-8 2.5755ms

Added the workaround:

var reverseProxyBuilder = builder.Services.AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"));

if (!builder.Environment.IsProduction())
{
    reverseProxyBuilder.AddServiceDiscoveryDestinationResolver();
}

Now it just works:

info: Microsoft.AspNetCore.HttpLogging.HttpLoggingMiddleware[1]
      Request:
      Protocol: HTTP/1.1
      Method: GET
      Scheme: https
      PathBase: 
      Path: /catalog/genres
      Accept: */*
      Host: gateway-service.mangoforest-aab1e0f4.eastus2.azurecontainerapps.io
      User-Agent: PostmanRuntime/7.41.1
      Accept-Encoding: gzip, deflate, br
      Cache-Control: no-cache
      postman-token: [Redacted]
      X-Original-Proto: [Redacted]
      x-envoy-external-address: [Redacted]
      x-request-id: [Redacted]
      x-envoy-expected-rq-timeout-ms: [Redacted]
      x-k8se-app-name: [Redacted]
      x-k8se-app-namespace: [Redacted]
      x-k8se-protocol: [Redacted]
      x-k8se-app-kind: [Redacted]
      x-ms-containerapp-name: [Redacted]
      x-ms-containerapp-revision-name: [Redacted]
      x-arr-ssl: [Redacted]
      X-Original-For: [Redacted]
info: Yarp.ReverseProxy.Forwarder.HttpForwarder[9]
      Proxying to http://catalog-service/genres HTTP/2 RequestVersionOrLower 
info: Yarp.ReverseProxy.Forwarder.HttpForwarder[56]
      Received HTTP/1.1 response 200.
info: Microsoft.AspNetCore.HttpLogging.HttpLoggingMiddleware[2]
      Response:
      StatusCode: 200
      Content-Type: application/json; charset=utf-8
      Date: Wed, 21 Aug 2024 16:12:14 GMT
      Server: Kestrel
      Transfer-Encoding: chunked

ReubenBond commented 3 months ago

Could you try adding the "Host" directive to your YARP config, in addition to the Address? i.e, the config would become:

{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Microsoft.AspNetCore": "Warning"
    }
  },
  "AllowedHosts": "*",
  "ReverseProxy": {
    "Routes": {
      "route1": {
        "ClusterId": "catalog-api",
        "Match": { "Path": "api/v1/products/{**catch-all}" }
      }
    },
    "Clusters": {
      "catalog-api": {
        "Destinations": {
          "destination1": {
            "Address": "http://apiservice",
            "Host": "apiservice"
          }
        }
      }
    }
  }
}

julioct commented 3 months ago

> Could you try adding the "Host" directive to your YARP config, in addition to the Address?

Tried it, same issue.

davidfowl commented 3 months ago

@ReubenBond can we look into this for 9?

ReubenBond commented 3 months ago

On it

gabynevada commented 2 months ago

After many hours of troubleshooting I finally found this issue; the fix by @4eversoft is working great for now.

Here's some info from my debug journey if it helps:

After running tcpdump in the container, it appears that the gateway forwards the request with the non-transformed path '/bioportal/test' to an Envoy proxy, which in turn does not find anything resembling that path in the network and returns a 404.

Relevant tcpdump sections:

E..\..@.>..Vdd.<dd.........|X.G.P...l...GET /bioportal/test HTTP/1.1
host: my-gateway.eastus2.azurecontainerapps.io
user-agent: curl/8.7.1
accept: */*
x-forwarded-for: 24.137.250.38
x-envoy-external-address: 24.137.250.38
x-request-id: 82ec8d46-3d23-4c05-887d-907f6d82caa4
x-envoy-expected-rq-timeout-ms: 1800000
x-k8se-app-name: my-gateway--kcii626
x-k8se-app-namespace: k8se-apps
x-k8se-protocol: http1
x-k8se-app-kind: web
x-ms-containerapp-name: my-gateway
x-ms-containerapp-revision-name: my-gateway--kcii626
x-arr-ssl: true
x-forwarded-proto: https

04:14:18.902984 eth0  Out IP prdh-gateway--kcii626-78f48686f8-ms6jz.8080 > 100-100-0-60.k8se-envoy-external-private.k8se-system.svc.cluster.local.49074: Flags [.], ack 565, win 501, length 0
E..(..@.@...dd..dd.<....X.G.....P.......
04:14:18.916307 eth0  Out IP prdh-gateway--kcii626-78f48686f8-ms6jz.8080 > 100-100-0-60.k8se-envoy-external-private.k8se-system.svc.cluster.local.49074: Flags [P.], seq 1:2089, ack 565, win 501, length 2088: HTTP: HTTP/1.1 404 Not Found
E..P..@.@.&]dd..dd.<....X.G.....P.......HTTP/1.1 404 Not Found
Content-Length: 1946
Content-Type: text/html; charset=utf-8
Date: Fri, 27 Sep 2024 04:14:18 GMT
Server: Kestrel
04:50:18.595207 lo    In  IP my-api.k8se-apps.svc.cluster.local.80 > 169.254.5.117.33328: Flags [S.], seq 1451490834, ack 4183102853, win 65495, options [mss 65495,nop,nop,sackOK,nop,wscale 7], length 0
E..4..@.@..bdd.....u.P.0V....U......................
04:50:18.597139 lo    In  IP my-api.k8se-apps.svc.cluster.local.80 > 169.254.5.117.33328: Flags [.], ack 731, win 506, length 0
E..(..@.@.q.dd.....u.P.0V....U._P....}..
04:50:27.092596 lo    In  IP my-api.k8se-apps.svc.cluster.local.80 > 169.254.5.117.33328: Flags [P.], seq 1:7301, ack 731, win 512, length 7300: HTTP: HTTP/1.1 200 OK
E.....@.@.U.dd.....u.P.0V....U._P...)...HTTP/1.1 200 OK
content-type: application/json; charset=utf-8
date: Fri, 27 Sep 2024 04:50:27 GMT
server: Kestrel
request-context: appId=
transfer-encoding: chunked

benjaminpetit commented 6 days ago

Update: I found the issue: the Host header is wrong. It is currently set to the "short uri" (like apiservice in the repro) instead of the FQDN of the discovered endpoint.

I am still trying to understand why this short uri was set in the Host header in the first place.