dotnet / aspire

Tools, templates, and packages to accelerate building observable, production-ready apps
https://learn.microsoft.com/dotnet/aspire
MIT License

[AzureTools][Aspire][Unstable] The state of the container in the Dashboard changes "Running (Not ready)" when stopping the running container and then starting the container in Podman Desktop #5952

Open v-mengwe opened 2 months ago

v-mengwe commented 2 months ago

Clean machine: Win11 x64 23h2 ENU
VS Version: VS 17.12.0 Preview 3.0 [35325.140.main]
Aspire Version: 9.0.0-preview.4.24475.6
Apply NuGet Feeds
Podman Desktop: 1.12.0
Podman CLI: 5.2.3

REPRO STEPS:

  1. Create an Aspire starter project with "Use Redis for caching (requires Docker)" checked.
  2. Update all Aspire package references to version 9.0.0-preview.4.24475.6, then add <Sdk Name="Aspire.AppHost.Sdk" Version="9.0.0-preview.4.24475.6" /> to the AppHost project file.
  3. Press F5, or execute dotnet run in the AppHost project path, and open the dashboard.
  4. In Podman Desktop, stop the running container, then start it again.
  5. Check whether the state of the container in the Dashboard changes to "Running".

Expect: The state of the container in the Dashboard changes to "Running".

Actual: [Unstable] The state of the container in the Dashboard changes to "Running (Not ready)".

More Info:

  1. This issue doesn't reproduce when the container is stopped and then started from the Dashboard.
  2. The issue reproduces about 7 out of 10 times. Repeat step 4 if it doesn't reproduce on the first attempt.

balachir commented 1 month ago

@v-sherryfan can you try out this scenario too using latest VS 17.12 / CLI + VS Code? I'm interested to know if podman scenarios are blocked due to this issue.

cc: @v-elenafeng

v-sherryfan commented 1 month ago

@balachir I also reproduced this issue using SDK 9.0 GA + Aspire 9.0.0, but it currently behaves slightly differently than described in this issue (see the attached screenshot). My verification results are unstable, as mentioned in the issue. It does not block our scenario: we only encounter this when we manually stop and then start the container in Podman Desktop.

cc: @v-elenafeng

davidfowl commented 1 month ago

Does it stay unhealthy forever? The reason it says Running (unhealthy) is that even though the container is running, it might not be ready to receive traffic. The app host runs health checks (for resources that have them configured), and that happens once the resource is running. It should eventually just say running.

v-sherryfan commented 1 month ago

@davidfowl Yes. In my verification, once this problem occurs, the container stays in the unhealthy state.

balachir commented 1 month ago

Some more info from @v-elenafeng: Repro with SDK 9.0 GA + Aspire 9.0.0. Only happens when we stop and start the container in podman desktop. And the unhealthy state persists.

@davidfowl I went ahead and added this to the 9.0 milestone for now. Can you help triage / assign?

davidfowl commented 1 month ago

I can't reproduce this one with podman on my devbox (I only have podman installed though)

davidfowl commented 1 month ago

cc @karolz-ms @dbreshears @mitchdenny

mitchdenny commented 1 month ago

I would expect stopping the resource via the Aspire dashboard to be more responsive. If you stop via Podman Desktop, then you need to wait for DCP to detect that the container is no longer there and then propagate that state back to Aspire.

In the intervening time it is possible for a health check to have attempted to run and push the state into Running (Not ready) ... because the health check would be failing. The behavior you see in the Aspire dashboard will depend on how close you were to the health check monitoring loop timing out and doing another health check when you stop the container.
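The timing dependence described above can be illustrated with a rough sketch. This is not Aspire or DCP code: the names (`Timeline`, `health_check_hits_gap`, `state_shown_before_detection`) and all interval values are hypothetical, chosen only to show why the result depends on where the external stop lands relative to the health check loop.

```python
from dataclasses import dataclass


@dataclass
class Timeline:
    """All values are hypothetical, in milliseconds."""
    stop_at_ms: int          # when the container is stopped outside Aspire
    detect_delay_ms: int     # assumed lag before the orchestrator notices the stop
    check_interval_ms: int   # assumed period of the health check loop
    first_check_at_ms: int = 0


def health_check_hits_gap(t: Timeline) -> bool:
    """Return True if a health check fires after the external stop
    but before the stop has been detected."""
    gap_end = t.stop_at_ms + t.detect_delay_ms
    elapsed = max(0, t.stop_at_ms - t.first_check_at_ms)
    # Integer ceiling division: number of intervals until the first
    # health check at or after the stop.
    intervals = -(-elapsed // t.check_interval_ms)
    next_check = t.first_check_at_ms + intervals * t.check_interval_ms
    return next_check < gap_end


def state_shown_before_detection(t: Timeline) -> str:
    """State label this toy model would show while the stop is still undetected."""
    return "Running (Not ready)" if health_check_hits_gap(t) else "Running"


if __name__ == "__main__":
    # Same assumed intervals, different stop times.
    for stop_at in (5500, 9500):
        t = Timeline(stop_at_ms=stop_at, detect_delay_ms=2000, check_interval_ms=5000)
        print(stop_at, state_shown_before_detection(t))
```

In this model, a stop that lands shortly before the next scheduled health check produces a failing check while the resource is still believed to be running, and a stop that lands just after a check does not. That would be consistent with the issue reproducing only some of the time, though it does not by itself explain why the state persists.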

joperezr commented 1 month ago

Given this doesn't reproduce every time, I don't think this sounds like something that would be blocking for 9.0. Please correct me if you think otherwise.