dotnet / aspire

Tools, templates, and packages to accelerate building observable, production-ready apps
https://learn.microsoft.com/dotnet/aspire
MIT License

Make dcp and dashboard communication reliable and robust #2422

Closed davidfowl closed 7 months ago

davidfowl commented 8 months ago

It seems like we have a set of issues that all have to do with launching and connecting to DCP and making sure that it's reliable. We should harden this code so that we are resilient when launching DCP (retrying if it fails to launch), verify that it's healthy, and have a good way to recover if it becomes unhealthy.

davidfowl commented 8 months ago

cc @karolz-ms @danegsta

karolz-ms commented 8 months ago

Everything I have seen so far in terms of inability to start DCP/dashboard fell into one of 3 buckets:

  1. Mismatched Aspire/DCP (workload) binaries. We should be on the look-out to minimize opportunities for such mismatch and I believe the workload set work that Jose and co are doing should be a big help. However, once you are in this situation, no amount of compensation/retrying will help. One thing that we can do now is to improve the error message like https://github.com/dotnet/aspire/issues/1976 suggested, e.g. by instructing developers how to ensure that their workload installation is healthy (DcpHostService would be the place I'd consider). Maybe a troubleshooting documentation page could also help here.

  2. DCP shutting down too slowly. @danegsta has fixed all known problems in this area and these fixes have already been integrated into Aspire (DCP release 0.1.54).

  3. "Arctic" run times out. By that I mean the very first run of a particular DCP version on a given machine. On my dev machine, which is no slouch, it routinely takes close to 10 seconds for the "advanced threat protection" AV software to scan new DCP binaries before they start running, every time I build. This makes me think that our retry timeout for establishing communication between app host and DCP is currently too small (5 seconds), and this PR https://github.com/dotnet/aspire/pull/2435/files#diff-a3e877b57b19124a6d9810e8072e6f1892acf96a26246fc4fcfa34570ed77028 is bumping it up to 20 seconds.

These are the things that IMO have/will make a difference--hope this helps and happy to hear more thoughts on the subject.
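
To illustrate the third bucket, here is a minimal sketch in Python of waiting for a slow-starting process under one overall deadline. The `wait_until_ready` helper and the `probe` callback are hypothetical names for illustration, not Aspire or DCP APIs; the 20-second default only mirrors the value mentioned above.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 20.0,
    interval_s: float = 0.5,
) -> bool:
    """Poll `probe` until it succeeds or `timeout_s` elapses.

    `probe` is a hypothetical callback that returns True once the other
    process answers; connection errors are treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            # Covers ConnectionRefusedError and similar socket failures.
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the overall deadline.
        time.sleep(min(interval_s, remaining))
```

The deadline bounds total wait time regardless of how long each probe takes, so a slow first run (for example, while AV software scans new binaries) gets the full window instead of a fixed number of quick attempts.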

davidfowl commented 8 months ago

Seems like we also have issues around the dashboard connecting to the app host. I've seen 2 people with complaints like this https://github.com/dotnet/aspire/issues/2539 ( @drewnoakes @JamesNK ), so there might be work to do on the dashboard side of things too.

balachir commented 8 months ago

Our validation team also reported the issue in #2539 earlier this week but they said that they didn't open a new issue because they weren't able to get a stable repro. I'll ask them to get more logs the next time they see it.

davidfowl commented 8 months ago

@balachir I plan to work on some of these issues in preview 5 so a reliable repro or more data would be great. I'll try to add more logs so we can see what might be happening when we get into this state.

davidfowl commented 8 months ago

@danegsta will take an initial look here.

davidfowl commented 8 months ago

Just saw this one:

fail: Aspire.Hosting.Dashboard.DcpDataSource[0]
      Watch task over kubernetes Container resources terminated
      System.Net.Http.HttpRequestException: Error while copying content to a stream.
       ---> System.IO.EndOfStreamException: Attempted to read past the end of the stream.
         at k8s.LineSeparatedHttpContent.PeekableStreamReader.PeekLineAsync()
         at k8s.LineSeparatedHttpContent.SerializeToStreamAsync(Stream stream, TransportContext context)
         at System.Net.Http.HttpContent.LoadIntoBufferAsyncCore(Task serializeToStreamTask, MemoryStream tempBuffer)
         --- End of inner exception stack trace ---
         at System.Net.Http.HttpContent.LoadIntoBufferAsyncCore(Task serializeToStreamTask, MemoryStream tempBuffer)
         at System.Net.Http.HttpContent.WaitAndReturnAsync[TState,TResult](Task waitTask, TState state, Func`2 returnFunc)
         at k8s.Kubernetes.CreateResultAsync[T](HttpRequestMessage httpRequest, HttpResponseMessage httpResponse, Nullable`1 watch, CancellationToken cancellationToken)
         at k8s.AbstractKubernetes.ICustomObjectsOperations_ListClusterCustomObjectWithHttpMessagesAsync[T](String group, String version, String plural, Nullable`1 allowWatchBookmarks, String continueParameter, String fieldSelector, String labelSelector, Nullable`1 limit, String resourceVersion, String resourceVersionMatch, Nullable`1 timeoutSeconds, Nullable`1 watch, Nullable`1 pretty, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
         at k8s.AbstractKubernetes.k8s.ICustomObjectsOperations.ListClusterCustomObjectWithHttpMessagesAsync(String group, String version, String plural, Nullable`1 allowWatchBookmarks, String continueParameter, String fieldSelector, String labelSelector, Nullable`1 limit, String resourceVersion, String resourceVersionMatch, Nullable`1 timeoutSeconds, Nullable`1 watch, Nullable`1 pretty, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
         at k8s.WatcherExt.<>c__DisplayClass1_0`2.<<MakeStreamReaderCreator>b__0>d.MoveNext()
      --- End of stack trace from previous location ---
         at k8s.Watcher`1.<>c.<CreateWatchEventEnumerator>b__21_1[TR](Task`1 t)
         at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
         at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
      --- End of stack trace from previous location ---
         at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
         at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
      --- End of stack trace from previous location ---
         at k8s.Watcher`1.CreateWatchEventEnumerator(Func`1 streamReaderCreator, Action`1 onError, CancellationToken cancellationToken)+MoveNext()
         at k8s.Watcher`1.CreateWatchEventEnumerator(Func`1 streamReaderCreator, Action`1 onError, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
         at Aspire.Hosting.Dcp.KubernetesService.WatchAsync[T](String namespaceParameter, CancellationToken cancellationToken)+MoveNext() in C:\dev\git\aspire\src\Aspire.Hosting\Dcp\KubernetesService.cs:line 169
         at Aspire.Hosting.Dcp.KubernetesService.WatchAsync[T](String namespaceParameter, CancellationToken cancellationToken)+MoveNext() in C:\dev\git\aspire\src\Aspire.Hosting\Dcp\KubernetesService.cs:line 169
         at Aspire.Hosting.Dcp.KubernetesService.WatchAsync[T](String namespaceParameter, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
         at Aspire.Hosting.Dashboard.DcpDataSource.<>c__DisplayClass10_1.<<-ctor>g__WatchKubernetesResource|1>d`1.MoveNext() in C:\dev\git\aspire\src\Aspire.Hosting\Dashboard\DcpDataSource.cs:line 88
      --- End of stack trace from previous location ---
         at Aspire.Hosting.Dashboard.DcpDataSource.<>c__DisplayClass10_1.<<-ctor>g__WatchKubernetesResource|1>d`1.MoveNext() in C:\dev\git\aspire\src\Aspire.Hosting\Dashboard\DcpDataSource.cs:line 88

PC went to sleep and the background thread watching resources died.

karolz-ms commented 8 months ago

We should probably have a way to restart the watchers. It is part of the K8s contract that the API server might end them occasionally. We just need to be careful not to try to restart them in a tight loop. In other words, retry with exponential backoff, and ideally have means to indicate in the dashboard UI that the data is stale.
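
A rough sketch of that idea in Python, under stated assumptions: `open_watch`, `on_event`, and `on_stale` are hypothetical callbacks (not the k8s client or Aspire API), and the base delay, cap, and jitter values are illustrative only.

```python
import random
import time
from typing import Callable, Iterable, Optional


def watch_with_restart(
    open_watch: Callable[[], Iterable[object]],
    on_event: Callable[[object], None],
    on_stale: Callable[[bool], None],
    *,
    base_s: float = 1.0,
    cap_s: float = 30.0,
    max_restarts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> None:
    """Re-open a watch stream whenever it ends, backing off exponentially."""
    delay = base_s
    restarts = 0
    while True:
        try:
            for event in open_watch():
                on_stale(False)   # fresh data is flowing again
                delay = base_s    # a healthy stream resets the backoff
                on_event(event)
        except (OSError, EOFError):
            pass                  # treat as "stream ended"; restart below
        on_stale(True)            # lets a UI mark its data as stale
        if max_restarts is not None and restarts >= max_restarts:
            return
        restarts += 1
        # Sleep between 50% and 100% of the current delay, then double it.
        sleep(delay * (0.5 + 0.5 * jitter()))
        delay = min(cap_s, delay * 2)
```

Because a sleep of at least half of `base_s` always happens before re-opening, a server that keeps closing the stream cannot drive a tight loop, and the `on_stale` callback gives the dashboard a hook for showing that data may be out of date.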

Riff451 commented 8 months ago

Hi, I think I'm facing one of the issues reported here. It started suddenly.

Aspire.Hosting 8.0.0-preview.3.24105.21

Dashboard exception:

blazor.web.js:1 [2024-03-06T22:09:16.464Z] Error: Grpc.Core.RpcException: Status(StatusCode="Unavailable", Detail="Error connecting to subchannel.", DebugException="System.Net.Sockets.SocketException: No connection could be made because the target machine actively refused it.")
 ---> System.Net.Sockets.SocketException (10061): No connection could be made because the target machine actively refused it.
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
   at System.Net.Sockets.Socket.<ConnectAsync>g__WaitForConnectWithCancellation|285_0(AwaitableSocketAsyncEventArgs saea, ValueTask connectTask, CancellationToken cancellationToken)
   at Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport.TryConnectAsync(ConnectContext context)
   --- End of inner exception stack trace ---
   at Grpc.Net.Client.Balancer.Internal.ConnectionManager.PickAsync(PickContext context, Boolean waitForReady, CancellationToken cancellationToken)
   at Grpc.Net.Client.Balancer.Internal.BalancerHttpHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpMessageInvoker.<SendAsync>g__SendAsyncWithTelemetry|6_0(HttpMessageHandler handler, HttpRequestMessage request, CancellationToken cancellationToken)
   at Grpc.Net.Client.Internal.GrpcCall`2.RunCall(HttpRequestMessage request, Nullable`1 timeout)
   at Grpc.Net.Client.Internal.Retry.RetryCallBase`2.GetResponseCoreAsync()
   at Aspire.Dashboard.Model.DashboardClient.<>c__DisplayClass23_0.<<EnsureInitialized>g__ConnectAsync|2>d.MoveNext() in /_/src/Aspire.Dashboard/Model/DashboardClient.cs:line 158
--- End of stack trace from previous location ---
   at Aspire.Dashboard.Components.ApplicationName.OnInitializedAsync() in /_/src/Aspire.Dashboard/Components/Controls/ApplicationName.razor.cs:line 32
   at Microsoft.AspNetCore.Components.ComponentBase.RunInitAndSetParametersAsync()
   at Microsoft.AspNetCore.Components.RenderTree.Renderer.GetErrorHandledTask(Task taskToHandle, ComponentState owningComponentState)

AppHost logs:

dbug: Microsoft.Extensions.Hosting.Internal.Host[1]
      Hosting starting
trce: Grpc.AspNetCore.Server.Model.Internal.ServiceRouteBuilder[2]
      Discovering gRPC methods for Aspire.Hosting.Dashboard.DashboardService.
trce: Grpc.AspNetCore.Server.Model.Internal.ServiceRouteBuilder[1]
      Added gRPC method 'GetApplicationInformation' to service 'aspire.v1.DashboardService'. Method type: Unary, HTTP method: POST, route pattern: '/aspire.v1.DashboardService/GetApplicationInformation'.
trce: Grpc.AspNetCore.Server.Model.Internal.ServiceRouteBuilder[1]
      Added gRPC method 'WatchResources' to service 'aspire.v1.DashboardService'. Method type: ServerStreaming, HTTP method: POST, route pattern: '/aspire.v1.DashboardService/WatchResources'.
trce: Grpc.AspNetCore.Server.Model.Internal.ServiceRouteBuilder[1]
      Added gRPC method 'WatchResourceConsoleLogs' to service 'aspire.v1.DashboardService'. Method type: ServerStreaming, HTTP method: POST, route pattern: '/aspire.v1.DashboardService/WatchResourceConsoleLogs'.
trce: Grpc.AspNetCore.Server.Model.Internal.ServiceRouteBuilder[1]
      Added gRPC method 'ExecuteResourceCommand' to service 'aspire.v1.DashboardService'. Method type: Unary, HTTP method: POST, route pattern: '/aspire.v1.DashboardService/ExecuteResourceCommand'.
info: Aspire.Hosting.DistributedApplication[0]
      Distributed application starting.
info: Aspire.Hosting.DistributedApplication[0]
      Application host directory is: D:\projects\trailsmory\trailsmory\src\SiRiff.Trailsmory.Aspire.AppHost
dbug: Microsoft.Extensions.Hosting.Internal.Host[1]
      Hosting starting
dbug: Microsoft.AspNetCore.Hosting.Diagnostics[13]
      Loaded hosting startup assembly SiRiff.Trailsmory.Aspire.AppHost
dbug: Microsoft.Extensions.Hosting.Internal.Host[2]
      Hosting started
info: Aspire.Hosting.Dcp.DcpHostService[0]
      Starting DCP with arguments: start-apiserver --monitor 70836 --detach --kubeconfig "C:\Users\User\AppData\Local\Temp\aspire.zy2zduqx.mpc\kubeconfig"
info: Aspire.Hosting.Dcp.start-apiserver.dcp-host[0]
      Starting DCP API server
info: Aspire.Hosting.Dcp.start-apiserver.dcp-host[0]
      Starting DCP controller host
info: Aspire.Hosting.Dcp.start-apiserver.dcp-host[0]
      Started all services      {"count": 2}
info: Aspire.Hosting.Dcp.api-server[0]
      Starting API server...
info: Aspire.Hosting.Dcp.api-server[0]
      API server started        {"Address": "127.0.0.1", "Port": 62409}
info: Aspire.Hosting.Dcp.dcpctrl[0]
      starting controller manager
info: Aspire.Hosting.Dcp.dcpctrl.ExecutableReconciler[0]
      starting process...       {"Controller": "usvc-dev.developer.microsoft.com/executable-reconciler", "Executable": {"name":"aspire-dashboard"}, "Reconciliation": 2, "executable": "C:\\Program Files\\dotnet\\packs\\Aspire.Dashboard.Sdk.win-x64\\8.0.0-preview.3.24105.21\\tools\\Aspire.Dashboard.exe"}
info: Aspire.Hosting.Dcp.dcpctrl.ExecutableReconciler[0]
      process started   {"Controller": "usvc-dev.developer.microsoft.com/executable-reconciler", "Executable": {"name":"aspire-dashboard"}, "Reconciliation": 2, "executable": "C:\\Program Files\\dotnet\\packs\\Aspire.Dashboard.Sdk.win-x64\\8.0.0-preview.3.24105.21\\tools\\Aspire.Dashboard.exe", "PID": 71072}
info: Aspire.Hosting.DistributedApplication[0]
      Now listening on: http://localhost:15051
info: Aspire.Hosting.Dcp.dcpctrl.ServiceReconciler[0]
      service proxy started     {"ServiceName": {"name":"trailsmory-postgres"}, "Reconciliation": 3, "EffectiveAddress": "localhost", "EffectivePort": 62414}
info: Aspire.Hosting.Dcp.dcpctrl.ServiceReconciler[0]
      service /trailsmory-postgres is now in state NotReady     {"ServiceName": {"name":"trailsmory-postgres"}, "Reconciliation": 3}
info: Aspire.Hosting.Dcp.dcpctrl.ServiceReconciler[0]
      service /trailsmory-postgres is now running on localhost:62414    {"ServiceName": {"name":"trailsmory-postgres"}, "Reconciliation": 3}
info: Aspire.Hosting.Dcp.dcpctrl.ServiceReconciler[0]
      service proxy started     {"ServiceName": {"name":"trailsmory-postgres-pgadmin"}, "Reconciliation": 5, "EffectiveAddress": "localhost", "EffectivePort": 62415}
info: Aspire.Hosting.Dcp.dcpctrl.ServiceReconciler[0]
      service /trailsmory-postgres-pgadmin is now in state NotReady     {"ServiceName": {"name":"trailsmory-postgres-pgadmin"}, "Reconciliation": 5}
info: Aspire.Hosting.Dcp.dcpctrl.ServiceReconciler[0]
      service /trailsmory-postgres-pgadmin is now running on localhost:62415    {"ServiceName": {"name":"trailsmory-postgres-pgadmin"}, "Reconciliation": 5}
info: Aspire.Hosting.Dcp.dcpctrl.ServiceReconciler[0]
      service proxy started     {"ServiceName": {"name":"trailsmory-api"}, "Reconciliation": 7, "EffectiveAddress": "localhost", "EffectivePort": 5162}
info: Aspire.Hosting.Dcp.dcpctrl.ServiceReconciler[0]
      service /trailsmory-api is now in state NotReady  {"ServiceName": {"name":"trailsmory-api"}, "Reconciliation": 7}
info: Aspire.Hosting.Dcp.dcpctrl.ServiceReconciler[0]
      service /trailsmory-api is now running on localhost:5162  {"ServiceName": {"name":"trailsmory-api"}, "Reconciliation": 7}
info: Aspire.Hosting.Dcp.dcpctrl.ServiceReconciler[0]
      service proxy started     {"ServiceName": {"name":"trailsmory-webapp"}, "Reconciliation": 11, "EffectiveAddress": "localhost", "EffectivePort": 5080}
info: Aspire.Hosting.Dcp.dcpctrl.ServiceReconciler[0]
      service /trailsmory-webapp is now in state NotReady       {"ServiceName": {"name":"trailsmory-webapp"}, "Reconciliation": 11}
info: Aspire.Hosting.Dcp.dcpctrl.ServiceReconciler[0]
      service /trailsmory-webapp is now running on localhost:5080       {"ServiceName": {"name":"trailsmory-webapp"}, "Reconciliation": 11}
info: Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler[0]
      scheduling container start        {"Controller": "usvc-dev.developer.microsoft.com/container-reconciler", "Container": {"name":"trailsmory-postgres"}, "Reconciliation": 2, "image": "postgres:latest"}
info: Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler[0]
      starting container        {"Controller": "usvc-dev.developer.microsoft.com/container-reconciler", "Container": {"name":"trailsmory-postgres"}, "Reconciliation": 2, "image": "postgres:latest"}
info: Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler[0]
      scheduling container start        {"Controller": "usvc-dev.developer.microsoft.com/container-reconciler", "Container": {"name":"trailsmory-postgres-pgadmin"}, "Reconciliation": 5, "image": "dpage/pgadmin4:latest"}
info: Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler[0]
      starting container        {"Controller": "usvc-dev.developer.microsoft.com/container-reconciler", "Container": {"name":"trailsmory-postgres-pgadmin"}, "Reconciliation": 5, "image": "dpage/pgadmin4:latest"}
info: Aspire.Hosting.Dcp.dcpctrl.ExecutableReplicaSetReconciler[0]
      created Executable        {"Controller": "usvc-dev.developer.microsoft.com/executable-replica-set-reconciler", "ExecutableReplicaSet": {"name":"trailsmory-api"}, "Reconciliation": 2, "DesiredReplicas": 1, "TotalReplicas": 0, "ActiveReplicas": 0, "exe": {"metadata":{"name":"trailsmory-api-mq1hf4g","uid":"2891be0e-6c16-4c5f-8fb6-61609bbd7917","resourceVersion":"24","creationTimestamp":"2024-03-06T22:09:11Z","annotations":{"csharp-project-path":"D:\\projects\\trailsmory\\trailsmory\\src\\backend\\SiRiff.Trailsmory.Api\\SiRiff.Trailsmory.Api.csproj","executable-replica-set.usvc-dev.developer.microsoft.com/display-name":"trailsmory-api-1","executable-replica-set.usvc-dev.developer.microsoft.com/replica-state":"active","otel-service-name":"trailsmory-api","service-producer":"[{\"serviceName\":\"trailsmory-api\",\"address\":null,\"port\":null}]"},"ownerReferences":[{"apiVersion":"usvc-dev.developer.microsoft.com/v1","kind":"ExecutableReplicaSet","name":"trailsmory-api","uid":"603a4c46-5842-4eae-86b6-be5a77a9b623","controller":true,"blockOwnerDeletion":true}],"managedFields":[{"manager":"dcpctrl.exe","operation":"Update","apiVersion":"usvc-dev.developer.microsoft.com/v1","time":"2024-03-06T22:09:11Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:csharp-project-path":{},"f:executable-replica-set.usvc-dev.developer.microsoft.com/display-name":{},"f:executable-replica-set.usvc-dev.developer.microsoft.com/replica-state":{},"f:otel-service-name":{},"f:service-producer":{}},"f:ownerReferences":{".":{},"k:{\"uid\":\"603a4c46-5842-4eae-86b6-be5a77a9b623\"}":{}}},"f:spec":{"f:env":{},"f:executablePath":{},"f:executionType":{},"f:workingDirectory":{}}}}]},"spec":{"executablePath":"dotnet","workingDirectory":"D:\\projects\\trailsmory\\trailsmory\\src\\backend\\SiRiff.Trailsmory.Api","env":[{"name":"DOTNET_LAUNCH_PROFILE","value":"http"},{"name":"ASPNETCORE_URLS","value":"http://localhost:{{- portForServing \"trailsmory-api\" 
-}}"},{"name":"ASPNETCORE_ENVIRONMENT","value":"Development"},{"name":"OTEL_DOTNET_EXPERIMENTAL_OTLP_EMIT_EXCEPTION_LOG_ATTRIBUTES","value":"true"},{"name":"OTEL_DOTNET_EXPERIMENTAL_OTLP_EMIT_EVENT_LOG_ATTRIBUTES","value":"true"},{"name":"OTEL_EXPORTER_OTLP_ENDPOINT","value":"http://localhost:16150"},{"name":"OTEL_RESOURCE_ATTRIBUTES","value":"service.instance.id={{- .UID -}}"},{"name":"OTEL_SERVICE_NAME","value":"{{- index .Annotations \"otel-service-name\" -}}"},{"name":"OTEL_BLRP_SCHEDULE_DELAY","value":"1000"},{"name":"OTEL_BSP_SCHEDULE_DELAY","value":"1000"},{"name":"OTEL_METRIC_EXPORT_INTERVAL","value":"1000"},{"name":"DOTNET_SYSTEM_CONSOLE_ALLOW_ANSI_COLOR_REDIRECTION","value":"true"},{"name":"LOGGING__CONSOLE__FORMATTERNAME","value":"simple"},{"name":"LOGGING__CONSOLE__FORMATTEROPTIONS__TIMESTAMPFORMAT","value":"yyyy-MM-ddTHH:mm:ss.fffffff "},{"name":"ConnectionStrings__trailsmory-db","value":"Host=localhost;Port=62414;Username=postgres;Password=a38f0ed433fb49359b54a23c91f43355;Database=trailsmory-db"}],"executionType":"IDE"},"status":{"executionID":"","state":"","startupTimestamp":null,"finishTimestamp":null}}}
info: Aspire.Hosting.DistributedApplication[0]
      Distributed application started. Press CTRL-C to stop.
info: Aspire.Hosting.Dcp.dcpctrl.ExecutableReplicaSetReconciler[0]
      created Executable        {"Controller": "usvc-dev.developer.microsoft.com/executable-replica-set-reconciler", "ExecutableReplicaSet": {"name":"trailsmory-webapp"}, "Reconciliation": 6, "DesiredReplicas": 1, "TotalReplicas": 0, "ActiveReplicas": 0, "exe": {"metadata":{"name":"trailsmory-webapp-5cmse6g","uid":"afd46258-0606-43c8-aa00-3435c9140ae1","resourceVersion":"29","creationTimestamp":"2024-03-06T22:09:12Z","annotations":{"csharp-project-path":"D:\\projects\\trailsmory\\trailsmory\\src\\app\\web\\SiRiff.Trailsmory.WebApp\\SiRiff.Trailsmory.WebApp.csproj","executable-replica-set.usvc-dev.developer.microsoft.com/display-name":"trailsmory-webapp-1","executable-replica-set.usvc-dev.developer.microsoft.com/replica-state":"active","otel-service-name":"trailsmory-webapp","service-producer":"[{\"serviceName\":\"trailsmory-webapp\",\"address\":null,\"port\":null}]"},"ownerReferences":[{"apiVersion":"usvc-dev.developer.microsoft.com/v1","kind":"ExecutableReplicaSet","name":"trailsmory-webapp","uid":"c75e24a1-2e5a-49fd-9954-771afc19b9b9","controller":true,"blockOwnerDeletion":true}],"managedFields":[{"manager":"dcpctrl.exe","operation":"Update","apiVersion":"usvc-dev.developer.microsoft.com/v1","time":"2024-03-06T22:09:12Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:csharp-project-path":{},"f:executable-replica-set.usvc-dev.developer.microsoft.com/display-name":{},"f:executable-replica-set.usvc-dev.developer.microsoft.com/replica-state":{},"f:otel-service-name":{},"f:service-producer":{}},"f:ownerReferences":{".":{},"k:{\"uid\":\"c75e24a1-2e5a-49fd-9954-771afc19b9b9\"}":{}}},"f:spec":{"f:env":{},"f:executablePath":{},"f:executionType":{},"f:workingDirectory":{}}}}]},"spec":{"executablePath":"dotnet","workingDirectory":"D:\\projects\\trailsmory\\trailsmory\\src\\app\\web\\SiRiff.Trailsmory.WebApp","env":[{"name":"DOTNET_LAUNCH_PROFILE","value":"http"},{"name":"ASPNETCORE_URLS","value":"http://localhost:{{- portForServing 
\"trailsmory-webapp\" -}}"},{"name":"ASPNETCORE_ENVIRONMENT","value":"Development"},{"name":"OTEL_DOTNET_EXPERIMENTAL_OTLP_EMIT_EXCEPTION_LOG_ATTRIBUTES","value":"true"},{"name":"OTEL_DOTNET_EXPERIMENTAL_OTLP_EMIT_EVENT_LOG_ATTRIBUTES","value":"true"},{"name":"OTEL_EXPORTER_OTLP_ENDPOINT","value":"http://localhost:16150"},{"name":"OTEL_RESOURCE_ATTRIBUTES","value":"service.instance.id={{- .UID -}}"},{"name":"OTEL_SERVICE_NAME","value":"{{- index .Annotations \"otel-service-name\" -}}"},{"name":"OTEL_BLRP_SCHEDULE_DELAY","value":"1000"},{"name":"OTEL_BSP_SCHEDULE_DELAY","value":"1000"},{"name":"OTEL_METRIC_EXPORT_INTERVAL","value":"1000"},{"name":"DOTNET_SYSTEM_CONSOLE_ALLOW_ANSI_COLOR_REDIRECTION","value":"true"},{"name":"LOGGING__CONSOLE__FORMATTERNAME","value":"simple"},{"name":"LOGGING__CONSOLE__FORMATTEROPTIONS__TIMESTAMPFORMAT","value":"yyyy-MM-ddTHH:mm:ss.fffffff "},{"name":"services__trailsmory-api__0","value":"http://localhost:5162"},{"name":"services__trailsmory-api__1","value":"http://localhost:5162"}],"executionType":"IDE"},"status":{"executionID":"","state":"","startupTimestamp":null,"finishTimestamp":null}}}
dbug: Microsoft.Extensions.Hosting.Internal.Host[2]
      Hosting started
info: Aspire.Hosting.Dcp.dcpctrl.ExecutableReconciler[0]
      IDE run session started   {"Controller": "usvc-dev.developer.microsoft.com/executable-reconciler", "Executable": {"name":"trailsmory-webapp-5cmse6g"}, "Reconciliation": 7, "RunID": "15"}
info: Aspire.Hosting.Dcp.dcpctrl.ExecutableReconciler[0]
      IDE run session started   {"Controller": "usvc-dev.developer.microsoft.com/executable-reconciler", "Executable": {"name":"trailsmory-api-mq1hf4g"}, "Reconciliation": 6, "RunID": "16"}
info: Aspire.Hosting.Dcp.dcpctrl.ExecutableReconciler[0]
      Executable run changed    {"Controller": "usvc-dev.developer.microsoft.com/executable-reconciler", "Executable": {"name":"trailsmory-webapp-5cmse6g"}, "Reconciliation": 10, "PropertiesChanged": "{exeState=Starting->Running, executionID=(empty)->15, startupTimestamp=(zero)->Mar  6 22:09:12.409, stdOutFile=(empty)->C:\\Users\\User\\AppData\\Local\\Temp\\aspire.zy2zduqx.mpc\\trailsmory-webapp-5cmse6g_out_afd46258-0606-43c8-aa00-3435c9140ae1, stdErrFile=(empty)->C:\\Users\\User\\AppData\\Local\\Temp\\aspire.zy2zduqx.mpc\\trailsmory-webapp-5cmse6g_err_afd46258-0606-43c8-aa00-3435c9140ae1, }"}
info: Aspire.Hosting.Dcp.dcpctrl.ServiceReconciler[0]
      service /trailsmory-webapp is now in state Ready  {"ServiceName": {"name":"trailsmory-webapp"}, "Reconciliation": 13}
info: Aspire.Hosting.Dcp.dcpctrl.ExecutableReconciler[0]
      Executable run changed    {"Controller": "usvc-dev.developer.microsoft.com/executable-reconciler", "Executable": {"name":"trailsmory-webapp-5cmse6g"}, "Reconciliation": 11, "PropertiesChanged": "{exeState=Starting->Running, executionID=(empty)->15, startupTimestamp=(zero)->Mar  6 22:09:12.409, stdOutFile=(empty)->C:\\Users\\User\\AppData\\Local\\Temp\\aspire.zy2zduqx.mpc\\trailsmory-webapp-5cmse6g_out_afd46258-0606-43c8-aa00-3435c9140ae1, stdErrFile=(empty)->C:\\Users\\User\\AppData\\Local\\Temp\\aspire.zy2zduqx.mpc\\trailsmory-webapp-5cmse6g_err_afd46258-0606-43c8-aa00-3435c9140ae1, }"}
info: Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler[0]
      container created {"Controller": "usvc-dev.developer.microsoft.com/container-reconciler", "Container": {"name":"trailsmory-postgres"}, "Reconciliation": 2, "ContainerID": "21a404e4f6fe632fa6fca7c1f535a56eaeb41b27af90988ef1e78fc280e6c576"}
info: Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler[0]
      container created {"Controller": "usvc-dev.developer.microsoft.com/container-reconciler", "Container": {"name":"trailsmory-postgres-pgadmin"}, "Reconciliation": 5, "ContainerID": "f81ac0d811ee6f4357e5417ac39a2a86fbb99ac03123f058a7f9739591228a58"}
info: Aspire.Hosting.Dcp.dcpctrl.ExecutableReconciler[0]
      Executable run changed    {"Controller": "usvc-dev.developer.microsoft.com/executable-reconciler", "Executable": {"name":"aspire-dashboard"}, "Reconciliation": 13, "PropertiesChanged": "{exeState=Running->Finished, exitCode=(null)->-532462766, startupTimestamp=Mar  6 22:09:11.000->Mar  6 22:09:11.612, }"}
info: Aspire.Hosting.Dcp.dcpctrl.ExecutableReconciler[0]
      Executable run changed    {"Controller": "usvc-dev.developer.microsoft.com/executable-reconciler", "Executable": {"name":"trailsmory-api-mq1hf4g"}, "Reconciliation": 16, "PropertiesChanged": "{exeState=Starting->Running, executionID=(empty)->16, startupTimestamp=(zero)->Mar  6 22:09:12.523, stdOutFile=(empty)->C:\\Users\\User\\AppData\\Local\\Temp\\aspire.zy2zduqx.mpc\\trailsmory-api-mq1hf4g_out_2891be0e-6c16-4c5f-8fb6-61609bbd7917, stdErrFile=(empty)->C:\\Users\\User\\AppData\\Local\\Temp\\aspire.zy2zduqx.mpc\\trailsmory-api-mq1hf4g_err_2891be0e-6c16-4c5f-8fb6-61609bbd7917, }"}
info: Aspire.Hosting.Dcp.dcpctrl.ServiceReconciler[0]
      service /trailsmory-api is now in state Ready     {"ServiceName": {"name":"trailsmory-api"}, "Reconciliation": 15}
info: Aspire.Hosting.Dcp.dcpctrl.ExecutableReconciler[0]
      Executable run changed    {"Controller": "usvc-dev.developer.microsoft.com/executable-reconciler", "Executable": {"name":"trailsmory-api-mq1hf4g"}, "Reconciliation": 17, "PropertiesChanged": "{pid=(null)->61664, startupTimestamp=Mar  6 22:09:12.000->Mar  6 22:09:12.523, }"}
info: Aspire.Hosting.Dcp.dcpctrl.ExecutableReconciler[0]
      Executable run changed    {"Controller": "usvc-dev.developer.microsoft.com/executable-reconciler", "Executable": {"name":"trailsmory-api-mq1hf4g"}, "Reconciliation": 18, "PropertiesChanged": "{pid=(null)->61664, startupTimestamp=Mar  6 22:09:12.000->Mar  6 22:09:12.523, }"}
info: Aspire.Hosting.Dcp.dcpctrl.ExecutableReconciler[0]
      Executable run changed    {"Controller": "usvc-dev.developer.microsoft.com/executable-reconciler", "Executable": {"name":"trailsmory-webapp-5cmse6g"}, "Reconciliation": 22, "PropertiesChanged": "{pid=(null)->70156, startupTimestamp=Mar  6 22:09:12.000->Mar  6 22:09:12.409, }"}
info: Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler[0]
      container started {"Controller": "usvc-dev.developer.microsoft.com/container-reconciler", "Container": {"name":"trailsmory-postgres-pgadmin"}, "Reconciliation": 5, "ContainerID": "f81ac0d811ee6f4357e5417ac39a2a86fbb99ac03123f058a7f9739591228a58"}
info: Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler[0]
      container started {"Controller": "usvc-dev.developer.microsoft.com/container-reconciler", "Container": {"name":"trailsmory-postgres"}, "Reconciliation": 2, "ContainerID": "21a404e4f6fe632fa6fca7c1f535a56eaeb41b27af90988ef1e78fc280e6c576"}
info: Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler[0]
      container has started successfully        {"Controller": "usvc-dev.developer.microsoft.com/container-reconciler", "Container": {"name":"trailsmory-postgres-pgadmin"}, "Reconciliation": 7, "ContainerID": "f81ac0d811ee6f4357e5417ac39a2a86fbb99ac03123f058a7f9739591228a58"}
info: Aspire.Hosting.Dcp.dcpctrl.ServiceReconciler[0]
      service /trailsmory-postgres-pgadmin is now in state Ready        {"ServiceName": {"name":"trailsmory-postgres-pgadmin"}, "Reconciliation": 17}
info: Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler[0]
      container has started successfully        {"Controller": "usvc-dev.developer.microsoft.com/container-reconciler", "Container": {"name":"trailsmory-postgres"}, "Reconciliation": 8, "ContainerID": "21a404e4f6fe632fa6fca7c1f535a56eaeb41b27af90988ef1e78fc280e6c576"}
info: Aspire.Hosting.Dcp.dcpctrl.ServiceReconciler[0]
      service /trailsmory-postgres is now in state Ready        {"ServiceName": {"name":"trailsmory-postgres"}, "Reconciliation": 19}

AppHost env variables (excluding what I thought wasn't relevant):

ASPNETCORE_ENVIRONMENT  Development
ASPNETCORE_URLS http://localhost:15051
DEBUG_SESSION_PORT  localhost:59981
DEBUG_SESSION_TOKEN 796326216b0e4a1888f7509b432b63b0
DOTNET_DASHBOARD_OTLP_ENDPOINT_URL  http://localhost:16150
DOTNET_DASHBOARD_URL    http://localhost:15051
DOTNET_ENVIRONMENT  Development
DOTNET_LAUNCH_PROFILE   http

Dashboard env variables (excluding what I thought wasn't relevant):

ASPNETCORE_ENVIRONMENT  Development
ASPNETCORE_URLS http://localhost:15051
DEBUG_SESSION_PORT  localhost:56330
DEBUG_SESSION_TOKEN eb1a79ec26034eeea78001a487db9ca7
DOTNET_DASHBOARD_OTLP_ENDPOINT_URL  http://localhost:16150
DOTNET_DASHBOARD_URL    http://localhost:15051
DOTNET_ENVIRONMENT  Development
DOTNET_LAUNCH_PROFILE   http
DOTNET_RESOURCE_SERVICE_ENDPOINT_URL    http://127.0.0.1:57881

I hope it can help a bit :) Thanks for Aspire!

joaojvf commented 8 months ago

Hi guys! Same issue as Riff451. Aspire looks like a game-changing technology!

ElanHasson commented 7 months ago

@karolz-ms I just got bitten by mismatched binaries and it took a while to figure it out. I didn't think to check until I saw your comment above. I thought dependabot had already upgraded the packages for me :)

Do you think it makes sense to detect the mismatch and throw an exception stating as much?
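
Something along these lines, as a minimal Python sketch only. `VersionMismatchError` and `ensure_matching_versions` are hypothetical names, and how the two version strings would actually be obtained is not shown and is an assumption.

```python
class VersionMismatchError(RuntimeError):
    """Raised when the package and workload versions differ (hypothetical)."""


def ensure_matching_versions(package_version: str, workload_version: str) -> None:
    """Fail fast with an actionable message instead of timing out later."""
    if package_version.strip() != workload_version.strip():
        raise VersionMismatchError(
            f"Package version '{package_version}' does not match "
            f"workload version '{workload_version}'. "
            "Update one of them so that both versions match."
        )


# Matching versions pass silently; a mismatch raises immediately.
ensure_matching_versions("8.0.0-preview.3.24105.21", "8.0.0-preview.3.24105.21")
```

An exact string comparison is the strictest possible check; whether some differences should be tolerated is a separate design decision.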

davidfowl commented 7 months ago

These will be much less common once we ship GA. We'll gracefully degrade functionality or throw if it’s not available.

joperezr commented 7 months ago

cc @adityamandaleeka

mitchdenny commented 7 months ago

Just had an interesting one. Walked away from my machine for an hour (desktop, always on), and saw this exception:

info: Aspire.Hosting.DistributedApplication[0]
      Aspire version: 8.0.0-dev
info: Aspire.Hosting.DistributedApplication[0]
      Distributed application starting.
info: Aspire.Hosting.DistributedApplication[0]
      Application host directory is: C:\Code\aspire\playground\OpenAIEndToEnd\OpenAIEndToEnd.AppHost
info: Aspire.Hosting.Azure.AzureProvisioner[0]
      Azure resource connection strings saved to user secrets.
info: Aspire.Hosting.DistributedApplication[0]
      Now listening on: http://localhost:15216
info: Aspire.Hosting.DistributedApplication[0]
      Distributed application started. Press Ctrl+C to shut down.
fail: Aspire.Hosting.Dcp.ApplicationExecutor[0]
      Watch task over kubernetes Container resources terminated
      System.Net.Http.HttpRequestException: Error while copying content to a stream.
       ---> System.IO.EndOfStreamException: Attempted to read past the end of the stream.
         at k8s.LineSeparatedHttpContent.PeekableStreamReader.PeekLineAsync()
         at k8s.LineSeparatedHttpContent.SerializeToStreamAsync(Stream stream, TransportContext context)
         at System.Net.Http.HttpContent.LoadIntoBufferAsyncCore(Task serializeToStreamTask, MemoryStream tempBuffer)
         --- End of inner exception stack trace ---
         at System.Net.Http.HttpContent.LoadIntoBufferAsyncCore(Task serializeToStreamTask, MemoryStream tempBuffer)
         at System.Net.Http.HttpContent.WaitAndReturnAsync[TState,TResult](Task waitTask, TState state, Func`2 returnFunc)
         at k8s.Kubernetes.CreateResultAsync[T](HttpRequestMessage httpRequest, HttpResponseMessage httpResponse, Nullable`1 watch, CancellationToken cancellationToken)
         at k8s.AbstractKubernetes.ICustomObjectsOperations_ListClusterCustomObjectWithHttpMessagesAsync[T](String group, String version, String plural, Nullable`1 allowWatchBookmarks, String continueParameter, String fieldSelector, String labelSelector, Nullable`1 limit, String resourceVersion, String resourceVersionMatch, Nullable`1 timeoutSeconds, Nullable`1 watch, Nullable`1 pretty, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
         at k8s.AbstractKubernetes.k8s.ICustomObjectsOperations.ListClusterCustomObjectWithHttpMessagesAsync(String group, String version, String plural, Nullable`1 allowWatchBookmarks, String continueParameter, String fieldSelector, String labelSelector, Nullable`1 limit, String resourceVersion, String resourceVersionMatch, Nullable`1 timeoutSeconds, Nullable`1 watch, Nullable`1 pretty, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
         at k8s.WatcherExt.<>c__DisplayClass1_0`2.<<MakeStreamReaderCreator>b__0>d.MoveNext()
      --- End of stack trace from previous location ---
         at k8s.Watcher`1.<>c.<CreateWatchEventEnumerator>b__21_1[TR](Task`1 t)
         at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
         at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
      --- End of stack trace from previous location ---
         at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
         at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
      --- End of stack trace from previous location ---
         at k8s.Watcher`1.CreateWatchEventEnumerator(Func`1 streamReaderCreator, Action`1 onError, CancellationToken cancellationToken)+MoveNext()
         at k8s.Watcher`1.CreateWatchEventEnumerator(Func`1 streamReaderCreator, Action`1 onError, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
         at Aspire.Hosting.Dcp.KubernetesService.WatchAsync[T](String namespaceParameter, CancellationToken cancellationToken)+MoveNext() in C:\Code\aspire\src\Aspire.Hosting\Dcp\KubernetesService.cs:line 170
         at Aspire.Hosting.Dcp.KubernetesService.WatchAsync[T](String namespaceParameter, CancellationToken cancellationToken)+MoveNext() in C:\Code\aspire\src\Aspire.Hosting\Dcp\KubernetesService.cs:line 170
         at Aspire.Hosting.Dcp.KubernetesService.WatchAsync[T](String namespaceParameter, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
         at Aspire.Hosting.Dcp.ApplicationExecutor.<>c__DisplayClass25_0.<<WatchResourceChanges>g__WatchKubernetesResource|1>d`1.MoveNext() in C:\Code\aspire\src\Aspire.Hosting\Dcp\ApplicationExecutor.cs:line 161
      --- End of stack trace from previous location ---
         at Aspire.Hosting.Dcp.ApplicationExecutor.<>c__DisplayClass25_0.<<WatchResourceChanges>g__WatchKubernetesResource|1>d`1.MoveNext() in C:\Code\aspire\src\Aspire.Hosting\Dcp\ApplicationExecutor.cs:line 161

mitchdenny commented 7 months ago

Ah ... so this was the same as David's.

mitchdenny commented 7 months ago

Hrm I am wondering whether this is a long poll timing out?

karolz-ms commented 7 months ago

@mitchdenny see https://github.com/dotnet/aspire/issues/2422#issuecomment-1974078501 My recommendation would be not to rely on very long timeouts, but instead retry as described in the comment above.
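A rough, language-neutral sketch (in Python) of the "retry instead of a very long timeout" idea. It is not Aspire's actual implementation: `watch_with_retry`, `open_stream`, and the retry parameters are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def watch_with_retry(
    open_stream: Callable[[], Iterable[T]],
    max_retries: int = 5,          # hypothetical default
    delay_seconds: float = 1.0,    # hypothetical default
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield events from a watch stream, reopening it when the connection drops."""
    failures = 0
    while True:
        try:
            for event in open_stream():
                # A delivered event shows the connection is healthy again,
                # so the consecutive-failure counter starts over.
                failures = 0
                yield event
            return  # the stream ended cleanly; nothing left to watch
        except (ConnectionError, EOFError, TimeoutError):
            failures += 1
            if failures > max_retries:
                raise  # give up and surface the last error to the caller
            sleep(delay_seconds)
```

A real watch would also need to resume from the last event it saw (for example by passing a resource version when reopening) so that events are neither lost nor replayed; the sketch leaves that out.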

davidfowl commented 7 months ago

Just got this:

Hosting failed to start
      System.IO.IOException: The process cannot access the file 'C:\Users\davifowl\AppData\Local\Temp\aspire.0h3hcao5.chk\kubeconfig' because it is being used by another process.
         at Microsoft.Win32.SafeHandles.SafeFileHandle.CreateFile(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options)
         at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize, Nullable`1 unixCreateMode)
         at System.IO.Strategies.OSFileStreamStrategy..ctor(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize, Nullable`1 unixCreateMode)
         at System.IO.Strategies.FileStreamHelpers.ChooseStrategyCore(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize, Nullable`1 unixCreateMode)
         at System.IO.FileInfo.OpenRead()
         at k8s.KubernetesClientConfiguration.LoadKubeConfigAsync(FileInfo kubeconfig, Boolean useRelativePaths)
         at k8s.KubernetesClientConfiguration.BuildConfigFromConfigFileAsync(FileInfo kubeconfig, String currentContext, String masterUrl, Boolean useRelativePaths)
         at k8s.KubernetesClientConfiguration.BuildConfigFromConfigFile(FileInfo kubeconfig, String currentContext, String masterUrl, Boolean useRelativePaths)
         at k8s.KubernetesClientConfiguration.BuildConfigFromConfigFile(String kubeconfigPath, String currentContext, String masterUrl, Boolean useRelativePaths)
         at Aspire.Hosting.Dcp.KubernetesService.EnsureKubernetes() in C:\dev\git\aspire\src\Aspire.Hosting\Dcp\KubernetesService.cs:line 291
         at Aspire.Hosting.Dcp.KubernetesService.ExecuteWithRetry[TResult](DcpApiOperationType operationType, String resourceType, Func`2 operation, CancellationToken cancellationToken) in C:\dev\git\aspire\src\Aspire.Hosting\Dcp\KubernetesService.cs:line 257
         at Aspire.Hosting.Dcp.ApplicationExecutor.CreateResourcesAsync[RT](CancellationToken cancellationToken) in C:\dev\git\aspire\src\Aspire.Hosting\Dcp\ApplicationExecutor.cs:line 1680
         at Aspire.Hosting.Dcp.ApplicationExecutor.CreateServicesAsync(CancellationToken cancellationToken) in C:\dev\git\aspire\src\Aspire.Hosting\Dcp\ApplicationExecutor.cs:line 858
         at Aspire.Hosting.Dcp.ApplicationExecutor.RunApplicationAsync(CancellationToken cancellationToken) in C:\dev\git\aspire\src\Aspire.Hosting\Dcp\ApplicationExecutor.cs:line 128
         at Aspire.Hosting.Dcp.DcpHostService.StartAsync(CancellationToken cancellationToken) in C:\dev\git\aspire\src\Aspire.Hosting\Dcp\DcpHostService.cs:line 70
         at Microsoft.Extensions.Hosting.Internal.Host.<StartAsync>b__15_1(IHostedService service, CancellationToken token)
         at Microsoft.Extensions.Hosting.Internal.Host.ForeachService[T](IEnumerable`1 services, CancellationToken token, Boolean concurrent, Boolean abortOnFirstException, List`1 exceptions, Func`3 operation)
Unhandled exception. System.AggregateException: One or more errors occurred. (The process cannot access the file 'C:\Users\davifowl\AppData\Local\Temp\aspire.0h3hcao5.chk\kubeconfig' because it is being used by another process.)
 ---> System.IO.IOException: The process cannot access the file 'C:\Users\davifowl\AppData\Local\Temp\aspire.0h3hcao5.chk\kubeconfig' because it is being used by another process.
   at Microsoft.Win32.SafeHandles.SafeFileHandle.CreateFile(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options)
   at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize, Nullable`1 unixCreateMode)
   at System.IO.Strategies.OSFileStreamStrategy..ctor(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize, Nullable`1 unixCreateMode)
   at System.IO.Strategies.FileStreamHelpers.ChooseStrategyCore(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize, Nullable`1 unixCreateMode)
   at System.IO.FileInfo.OpenRead()
   at k8s.KubernetesClientConfiguration.LoadKubeConfigAsync(FileInfo kubeconfig, Boolean useRelativePaths)
   at k8s.KubernetesClientConfiguration.BuildConfigFromConfigFileAsync(FileInfo kubeconfig, String currentContext, String masterUrl, Boolean useRelativePaths)
   at k8s.KubernetesClientConfiguration.BuildConfigFromConfigFile(FileInfo kubeconfig, String currentContext, String masterUrl, Boolean useRelativePaths)
   at k8s.KubernetesClientConfiguration.BuildConfigFromConfigFile(String kubeconfigPath, String currentContext, String masterUrl, Boolean useRelativePaths)
   at Aspire.Hosting.Dcp.KubernetesService.EnsureKubernetes() in C:\dev\git\aspire\src\Aspire.Hosting\Dcp\KubernetesService.cs:line 291
   at Aspire.Hosting.Dcp.KubernetesService.ExecuteWithRetry[TResult](DcpApiOperationType operationType, String resourceType, Func`2 operation, CancellationToken cancellationToken) in C:\dev\git\aspire\src\Aspire.Hosting\Dcp\KubernetesService.cs:line 257
   at Aspire.Hosting.Dcp.ApplicationExecutor.CreateResourcesAsync[RT](CancellationToken cancellationToken) in C:\dev\git\aspire\src\Aspire.Hosting\Dcp\ApplicationExecutor.cs:line 1680
   at Aspire.Hosting.Dcp.ApplicationExecutor.CreateServicesAsync(CancellationToken cancellationToken) in C:\dev\git\aspire\src\Aspire.Hosting\Dcp\ApplicationExecutor.cs:line 858
   at Aspire.Hosting.Dcp.ApplicationExecutor.RunApplicationAsync(CancellationToken cancellationToken) in C:\dev\git\aspire\src\Aspire.Hosting\Dcp\ApplicationExecutor.cs:line 128
   at Aspire.Hosting.Dcp.DcpHostService.StartAsync(CancellationToken cancellationToken) in C:\dev\git\aspire\src\Aspire.Hosting\Dcp\DcpHostService.cs:line 70
   at Microsoft.Extensions.Hosting.Internal.Host.<StartAsync>b__15_1(IHostedService service, CancellationToken token)
   at Microsoft.Extensions.Hosting.Internal.Host.ForeachService[T](IEnumerable`1 services, CancellationToken token, Boolean concurrent, Boolean abortOnFirstException, List`1 exceptions, Func`3 operation)
   at Microsoft.Extensions.Hosting.Internal.Host.StartAsync(CancellationToken cancellationToken)
   at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.RunAsync(IHost host, CancellationToken token)
   at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.RunAsync(IHost host, CancellationToken token)
   at Aspire.Hosting.DistributedApplication.RunAsync(CancellationToken cancellationToken) in C:\dev\git\aspire\src\Aspire.Hosting\DistributedApplication.cs:line 102
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at System.Threading.Tasks.Task.Wait()
   at Aspire.Hosting.DistributedApplication.Run() in C:\dev\git\aspire\src\Aspire.Hosting\DistributedApplication.cs:line 110
   at Program.<Main>$(String[] args) in C:\dev\git\aspire\playground\TestShop\AppHost\Program.cs:line 42

mitchdenny commented 7 months ago

Does it reproduce? Reliably? I'm wondering whether we have a race to read the file before DCP has finished writing it.

davidfowl commented 7 months ago

No, it's not reliable.

mitchdenny commented 7 months ago

This reliability issue should be partially addressed by PR #3132.

karolz-ms commented 7 months ago

We are making 3 changes here to improve things in Aspire P5:

  1. The watch retries, courtesy of @mitchdenny
  2. The DCP API server now listens on its port BEFORE the kubeconfig file is written
  3. The kubeconfig file will not appear until it is fully written
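
One common way to get the "does not appear until fully written" behavior is to write to a temporary file in the same directory and then rename it into place. The Python sketch below illustrates that general technique only; whether DCP does it this way is an assumption, and `write_file_atomically` is a hypothetical helper name.

```python
import os
import tempfile


def write_file_atomically(path: str, content: str) -> None:
    """Write content so readers see either no file/the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the final rename
    # stays on the same filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as temp_file:
            temp_file.write(content)
            temp_file.flush()
            os.fsync(temp_file.fileno())  # push the bytes to disk before publishing
        # Publish under the final name in a single step.
        os.replace(temp_path, path)
    except BaseException:
        # Do not leave a half-written temp file behind on failure.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

With this pattern a reader that polls for the final path never opens a partially written file, which is the kind of race suspected earlier in the thread.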

adityamandaleeka commented 7 months ago

Was just chatting with @mitchdenny about this. I'm hitting this constantly:

fail: Microsoft.Extensions.Hosting.Internal.Host[11]
      Hosting failed to start
      Aspire.Hosting.DistributedApplicationException: Application orchestrator dependency check had an unexpected error System.InvalidOperationException: Command C:\Program Files\dotnet\packs\Aspire.Hosting.Orchestration.win-x64\8.0.0-preview.4.24079.3\tools\dcp.exe info returned non-zero exit code 1
         at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 78.
         at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 105
         at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 115
         at Aspire.Hosting.Dcp.DcpHostService.StartAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpHostService.cs:line 67
         at Microsoft.Extensions.Hosting.Internal.Host.<StartAsync>b__15_1(IHostedService service, CancellationToken token)
         at Microsoft.Extensions.Hosting.Internal.Host.ForeachService[T](IEnumerable`1 services, CancellationToken token, Boolean concurrent, Boolean abortOnFirstException, List`1 exceptions, Func`3 operation)
Unhandled exception. System.AggregateException: One or more errors occurred. (Application orchestrator dependency check had an unexpected error System.InvalidOperationException: Command C:\Program Files\dotnet\packs\Aspire.Hosting.Orchestration.win-x64\8.0.0-preview.4.24079.3\tools\dcp.exe info returned non-zero exit code 1
   at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 78.)
 ---> Aspire.Hosting.DistributedApplicationException: Application orchestrator dependency check had an unexpected error System.InvalidOperationException: Command C:\Program Files\dotnet\packs\Aspire.Hosting.Orchestration.win-x64\8.0.0-preview.4.24079.3\tools\dcp.exe info returned non-zero exit code 1
   at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 78.
   at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 105
   at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 115
   at Aspire.Hosting.Dcp.DcpHostService.StartAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpHostService.cs:line 67
   at Microsoft.Extensions.Hosting.Internal.Host.<StartAsync>b__15_1(IHostedService service, CancellationToken token)
   at Microsoft.Extensions.Hosting.Internal.Host.ForeachService[T](IEnumerable`1 services, CancellationToken token, Boolean concurrent, Boolean abortOnFirstException, List`1 exceptions, Func`3 operation)
   at Microsoft.Extensions.Hosting.Internal.Host.StartAsync(CancellationToken cancellationToken)
   at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.RunAsync(IHost host, CancellationToken token)
   at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.RunAsync(IHost host, CancellationToken token)
   at Aspire.Hosting.DistributedApplication.RunAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/DistributedApplication.cs:line 102
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at System.Threading.Tasks.Task.Wait()
   at Aspire.Hosting.DistributedApplication.Run() in /_/src/Aspire.Hosting/DistributedApplication.cs:line 110
   at Program.<Main>$(String[] args) in C:\code\neweshop\eShop\src\eShop.AppHost\Program.cs:line 82

karolz-ms commented 7 months ago

@adityamandaleeka what does it say when you run dcp info from the command line?

karolz-ms commented 7 months ago

Looks like you are still using the Aspire P4 version @adityamandaleeka

davidfowl commented 7 months ago

We should print the output from the health check on failure
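
A minimal sketch (Python, language-neutral in spirit) of what surfacing the check's output could look like: capture stdout and stderr and attach them to the error instead of reporting only the exit code. `run_check`, `DependencyCheckError`, and the 20-second default are hypothetical and are not taken from Aspire's code.

```python
import subprocess
from typing import Sequence


class DependencyCheckError(RuntimeError):
    """Raised when the dependency check command exits with a non-zero code."""


def run_check(command: Sequence[str], timeout_seconds: float = 20.0) -> str:
    """Run a check command and return its stdout, or raise with the full output."""
    result = subprocess.run(
        list(command),
        capture_output=True,  # collect stdout and stderr instead of discarding them
        text=True,
        timeout=timeout_seconds,
    )
    if result.returncode != 0:
        # Include what the tool printed so the user can see why it failed.
        raise DependencyCheckError(
            f"Command {' '.join(command)} returned non-zero exit code "
            f"{result.returncode}\nstdout:\n{result.stdout}\nstderr:\n{result.stderr}"
        )
    return result.stdout
```

Called as `run_check(["dcp", "info"])`, a failure like the one above would then carry the tool's own message alongside the exit code.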

adityamandaleeka commented 7 months ago

Yes, updating fixed the error I was seeing, thanks @karolz-ms.

mitchdenny commented 7 months ago

Made progress on this in P5. More issues will likely still be found, so keeping this alive for P6.

joperezr commented 7 months ago

Given that we are not aware of any remaining work here, we opted to close this. We can open a new issue if new work is discovered or planned.