dotnet / aspire

Tools, templates, and packages to accelerate building observable, production-ready apps
https://learn.microsoft.com/dotnet/aspire
MIT License

Inconsistent state of mount path across Aspire runs #5255

Open galvesribeiro opened 2 months ago

galvesribeiro commented 2 months ago

Is there an existing issue for this?

Describe the bug

I've noticed that if the AppHost creates a local directory and maps it into one or more container resources (databases, for example) to persist data, some state seems to be retained by either Aspire or Docker. If we stop the debug session, delete the directory, and start again, the containers can't see the directory and fail to start, even though it has been recreated. Docker has to be restarted after deleting such directories before it works again.

The reason someone would delete those directories is that we are either testing database migrations or just want a clean slate to work from. However, with this issue, we always have to remember to restart Docker.

Expected Behavior

After deleting the directory and restarting the Aspire process, the directories should be usable again.

Steps To Reproduce

  1. At the very beginning of the AppHost project, create a directory by any means, for example Directory.CreateDirectory(path);
  2. Create any container resource (a SQL database, Postgres, you name it) that uses WithBindMount() or WithDataBindMount(), and map the created directory somewhere into the container;
  3. Run the app. You will see it start correctly;
  4. Stop the Aspire project and delete the directory;
  5. Run Aspire again;
  6. The directory will be created, but the Aspire resources that rely on it will fail to initialize. The error message varies depending on the container, but in a nutshell it is because the container "can't access" the directory that was just created;
  7. Stop Aspire, delete the directory, and restart Docker Desktop;
  8. Start Aspire again;
  9. The project works again.

Exceptions (if any)

No response

.NET Version info

.NET SDK:
 Version:           8.0.302
 Commit:            ef14e02af8
 Workload version:  8.0.300-manifests.e4379c3d
 MSBuild version:   17.10.4+10fbfbf2e

Runtime Environment:
 OS Name:     Mac OS X
 OS Version:  14.5
 OS Platform: Darwin
 RID:         osx-arm64
 Base Path:   /usr/local/share/dotnet/sdk/8.0.302/

.NET workloads installed:
 [aspire]
   Installation Source: SDK 8.0.300
   Manifest Version:    8.1.0/8.0.100
   Manifest Path:       /usr/local/share/dotnet/sdk-manifests/8.0.100/microsoft.net.sdk.aspire/8.1.0/WorkloadManifest.json
   Install Type:        FileBased

Host:
  Version:      8.0.6
  Architecture: arm64
  Commit:       3b8b000a0e

.NET SDKs installed:
  6.0.417 [/usr/local/share/dotnet/sdk]
  7.0.404 [/usr/local/share/dotnet/sdk]
  8.0.100 [/usr/local/share/dotnet/sdk]
  8.0.101 [/usr/local/share/dotnet/sdk]
  8.0.201 [/usr/local/share/dotnet/sdk]
  8.0.302 [/usr/local/share/dotnet/sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 6.0.25 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 7.0.14 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 8.0.0 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 8.0.1 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 8.0.2 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 8.0.6 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 6.0.25 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 7.0.14 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 8.0.0 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 8.0.1 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 8.0.2 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 8.0.6 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]

Anything else?

The latest versions of Aspire and ASP.NET were used.

davidfowl commented 2 months ago

cc @karolz-ms @danegsta

karolz-ms commented 2 months ago

Hmm. @galvesribeiro, can you reproduce the problem without Aspire? I mean something like this:

  1. mkdir thedir
  2. docker run --mount type=bind,src=thedir,dst=/dst yourimage
  3. docker stop containerid;docker rm containerid
  4. rm -rf thedir
  5. Repeat steps 1 and 2.

Aspire does not really do anything more fancy than this for bind mounts.
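
The steps above can be wrapped in a small script so the two runs are easy to repeat. This is only a sketch: the container name bind-mount-repro, the function names, and the DOCKER override are made up for illustration, and the image is whatever you pass in. The directory is given as an absolute path built from the current working directory.

```shell
#!/usr/bin/env bash
# Sketch of the Docker-only repro. Set DOCKER=echo for a dry run that only prints the commands.
DOCKER="${DOCKER:-docker}"

# One create/stop/remove cycle with ./thedir bind-mounted at /dst.
# "bind-mount-repro" is an arbitrary container name chosen for this sketch.
run_cycle() {
    local image="$1" dir="$PWD/thedir" name="bind-mount-repro"
    mkdir -p "$dir" || return 1
    "$DOCKER" run -d --name "$name" --mount "type=bind,src=$dir,dst=/dst" "$image" || return 1
    "$DOCKER" stop "$name"
    "$DOCKER" rm "$name"
}

# Two cycles with the host directory deleted in between (steps 1-5).
repro() {
    local image="$1"
    run_cycle "$image" || { echo "first run failed" >&2; return 1; }
    rm -rf "$PWD/thedir"
    run_cycle "$image" || { echo "second run failed" >&2; return 1; }
    echo "both runs succeeded"
}
```

After sourcing the script, call it as `repro yourimage`. If the second cycle fails with the same "bind source path does not exist" message, the problem is reproducible without Aspire.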

galvesribeiro commented 2 months ago

Hello!

Found the actual exceptions on the AppHost:

fail: Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler[0]
      could not create the container    {"Container": {"name":"localstack-wahkpvus"}, "Reconciliation": 4, "error": "docker command 'CreateContainer' returned with non-zero exit code 1: command output: Stdout: '' Stderr: 'Error response from daemon: invalid mount config for type \"bind\": bind source path does not exist: /<redacted>/.data/localstack\n'"}
fail: Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler[0]
      could not create the container    {"Container": {"name":"postgres-ngkrhpap"}, "Reconciliation": 9, "error": "docker command 'CreateContainer' returned with non-zero exit code 1: command output: Stdout: '' Stderr: 'Error response from daemon: invalid mount config for type \"bind\": bind source path does not exist: /<redacted>/.data/postgres\n'"}
fail: Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler[0]
      could not create the container    {"Container": {"name":"minio-vydntvdp"}, "Reconciliation": 10, "error": "docker command 'CreateContainer' returned with non-zero exit code 1: command output: Stdout: '' Stderr: 'Error response from daemon: invalid mount config for type \"bind\": bind source path does not exist: /<redacted>/.data/minio\n'"}

Then if I run this:

docker run --name some-postgres -e POSTGRES_PASSWORD=mysecretpassword \
-v /<redacted>/.data/postgres:/var/lib/postgresql/data postgres

It works just fine.

If I mix the manual docker run with the Aspire run at the same time, Aspire always fails while docker run always works.

In other words, that only happens when I'm running the AppHost.

Something seems off in how DCP interacts with the Docker API and the mounts. With -v it works 100% of the time. I'm not sure about --mount and its parameters.
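
For comparison, the same container can be started with --mount instead of -v. The sketch below assumes the --mount form suggested earlier in the thread (type=bind with src and dst); the function name and the DOCKER override are illustrative only, and the data directory is passed in as an absolute path rather than hard-coded.

```shell
#!/usr/bin/env bash
# Set DOCKER=echo to print the command instead of running it.
DOCKER="${DOCKER:-docker}"

# Same container as the -v command above, expressed with --mount.
# $1 is the absolute host path of the postgres data directory.
run_postgres_with_mount() {
    local data_dir="$1"
    "$DOCKER" run --name some-postgres \
        -e POSTGRES_PASSWORD=mysecretpassword \
        --mount "type=bind,src=$data_dir,dst=/var/lib/postgresql/data" \
        postgres
}
```

Running this against the directory that Aspire fails on would show whether the error follows the --mount syntax or only appears under the AppHost.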

galvesribeiro commented 2 months ago

Just for reference, this directory is created before the resources are even built:

var builder = DistributedApplication.CreateBuilder(args);

var usernameParameter = builder.AddParameter("Postgres-User");
var passwordParameter = builder.AddParameter("Postgres-Password");
var localDataVolumePath = builder.AddParameter("Data-Path");
var localPostgresVolumePath = builder.AddParameter("Postgres-Path");
var staticResourcePath = builder.AddParameter("StaticResource-Path");

var minioUsernameParameter = builder.AddParameter("Minio-User");
var minioPasswordParameter = builder.AddParameter("Minio-Password");
var localMinioVolumePath = builder.AddParameter("Minio-Path");

var localLocalStackVolumePath = builder.AddParameter("LocalStack-Path");

DataDirectoryFactory.CreateDataDirectoryIfNotExists(localDataVolumePath.Resource.Value);

var localStack = builder.AddLocalStack("localstack", port: 4566)
    .WithDataBindMount(Path.Combine(localDataVolumePath.Resource.Value, localLocalStackVolumePath.Resource.Value));

var postgresCluster = builder
    .AddPostgres("postgres", usernameParameter, passwordParameter, port: 5432)
    .WithHealthCheck()
    .WithBindMount(
        Path.Combine(localDataVolumePath.Resource.Value, localPostgresVolumePath.Resource.Value),
        "/var/lib/postgresql/data"
    );

// Rest of the resources here

And the directory is created as simply as this:

public static class DataDirectoryFactory
{
    public static void CreateDataDirectoryIfNotExists(string dataDirectory)
    {
        Directory.CreateDirectory(dataDirectory);

        var subdirectories = new[] { "postgres", "minio", "static-resources", "localstack" };

        foreach (var subdirectory in subdirectories)
        {
            var path = Path.Combine(dataDirectory, subdirectory);

            Directory.CreateDirectory(path);
        }
    }
}

Also, it is not just one specific container. Every container that has a mount fails until we restart Docker.

danegsta commented 2 months ago

@galvesribeiro Based on the behavior you're describing, I'm suspicious that CreateDataDirectoryIfNotExists isn't always actually recreating the expected bind mount source folder. The main difference between -v and --mount is that, if given a bind mount source folder that doesn't exist, -v will create the missing folder on the host system but --mount will throw the error you're seeing in the AppHost logs. Can you add some logging and/or breakpoints to your AppHost to check what folder paths are being created (and that they exist on disk before the AppHost creates the container resources)?

The latest version of the Aspire workload includes updated logic to automatically create missing bind mount folders the same way -v would have done. Can you run dotnet workload list to see what workload version you're on?
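
The difference between the two flags can be checked directly with Docker. A rough sketch, assuming an image that ships a `true` binary (alpine, for example) and a temporary directory that Docker is allowed to share; the function name, the via-v and via-mount folder names, and the DOCKER override are made up for this example.

```shell
#!/usr/bin/env bash
# Set DOCKER=echo for a dry run that only prints the commands.
DOCKER="${DOCKER:-docker}"

# Compare how -v and --mount handle a bind source folder that does not exist yet.
compare_flags() {
    local image="$1" base
    base="$(mktemp -d)" || return 1

    # -v: a missing source folder is expected to be created on the host.
    "$DOCKER" run --rm -v "$base/via-v:/dst" "$image" true
    echo "exit code with -v: $?"

    # --mount: a missing source folder is expected to be rejected with
    # "bind source path does not exist".
    "$DOCKER" run --rm --mount "type=bind,src=$base/via-mount,dst=/dst" "$image" true
    echo "exit code with --mount: $?"
}
```

Call it as `compare_flags alpine`. A non-zero exit code for the --mount run alone would match the behavior described above.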

danegsta commented 2 months ago

Also, as another thought, it looks like Docker Desktop on macOS supports multiple bind mount implementations; VirtioFS, gRPC FUSE, and osxfs (Legacy) are listed as options. If you check the general settings in Docker Desktop, which implementation are you using?

galvesribeiro commented 1 month ago

> @galvesribeiro Based on the behavior you're describing, I'm suspicious that CreateDataDirectoryIfNotExists isn't always actually recreating the expected bind mount source folder. The main difference between -v and --mount is that, if given a bind mount source folder that doesn't exist, -v will create the missing folder on the host system but --mount will throw the error you're seeing in the AppHost logs. Can you add some logging and/or breakpoints to your AppHost to check what folder paths are being created (and that they exist on disk before the AppHost creates the container resources)?
>
> The latest version of the Aspire workload includes updated logic to automatically create missing bind mount folders the same way -v would have done. Can you run dotnet workload list to see what workload version you're on?

Sorry for the delay.


dotnet workload list

Installed Workload Id      Manifest Version      Installation Source
--------------------------------------------------------------------
aspire                     8.2.0/8.0.100         SDK 8.0.300        

I checked with a breakpoint, and the code as you see it does recreate the folders every time. It is also called very early, so the container resources that use it are not even defined yet:

[image]

@danegsta regarding the file sharing implementation, I've never changed it and I'm using the default:

[image]

The thing is, that behavior doesn't occur when I use -v outside Aspire, which is odd given that you say DCP isn't doing anything unusual. I tried multiple ways to reproduce it with Docker alone, without Aspire, but couldn't. It always works.

dbreshears commented 1 month ago

@galvesribeiro, do you have a repro project you can share? We haven't been able to reproduce this and haven't seen it reported elsewhere.

galvesribeiro commented 1 month ago

@dbreshears sure. Bear with me and I'll set one up and share it here.