dotnet / aspire

An opinionated, cloud ready stack for building observable, production ready, distributed applications in .NET
https://learn.microsoft.com/dotnet/aspire
MIT License
3.14k stars 301 forks source link

Make it easier to use external volumes for database containers #837

Closed Kralizek closed 2 months ago

Kralizek commented 6 months ago

I found my way around creating a database container with a volume mount

var database = builder.AddPostgresContainer("database")
    .WithVolumeMount("../../data/database/data", "/var/lib/postgresql/data");

It would be cool to have .WithVolumeMountForData("../../data/database/data") offered as an extension method to IResourceBuilder<PostgresContainerResource>

It would be even cooler if the libraries would push us towards the pit of success by assuming that we want to use a volume mount for the database data.

stbau04 commented 5 months ago

It would be even cooler if the libraries would push us towards the pit of success by assuming that we want to use a volume mount for the database data.

How could the library push you towards using the mount? Do you mean that the mount should be required when adding a postgres container and it would need to be specifically deactivated if not wanted?

Which of the optional parameters should be available for the extension method? Setting the VolumeMountType might make sense, but are there any cases where the data directory should be mounted readonly?

josephaw1022 commented 5 months ago

I like this idea. I think a holistic approach to this and having some sort of approach to do this for every supported component would be necessary for consistency (postgres, sql server, rabbitmq, etc..., and any other upcoming components). While some users may prefer not to use volumes by default, providing it by default could enhance usability for those who do.

If adding volumes by default is not feasible or not the approach the maintainers want to take, then introducing a more streamlined extension method would be a decent alternative. The current challenge I find when working with volumes is determining the appropriate volume path. An ideal solution would allow simply naming the volume without having to specify the path, significantly simplifying the process. Since aspire is opinionated with its container version for each component, it's not like the path is unknown or varying in any way.

Here's a comparison of the current method versus the proposed streamlined approach in code:

// Current method
var queue = builder.AddRabbitMQContainer("queue")
                   .WithVolumeMount("VolumeQueue", "/var/lib/rabbitmq", VolumeMountType.Named);

var accountDb = builder.AddPostgresContainer("accountDatabase")
                       .WithVolumeMount("VolumeMount.accountdb.data", "/var/lib/postgresql/data", VolumeMountType.Named)
                       .AddDatabase("account");

// Proposed streamlined method
var queue = builder.AddRabbitMQContainer("queue")
                   .WithNamedVolume("VolumeQueue");

var accountDb = builder.AddPostgresContainer("accountDatabase")
                       .WithNamedVolume("VolumeMount.accountdb.data")
                       .AddDatabase("account");

The proposed method would make the development experience more efficient and user-friendly, especially for those who frequently work with volumes.

stbau04 commented 5 months ago

If adding volumes by default is not feasible or not the approach the maintainers want to take, then introducing a more streamlined extension method would be a decent alternative. The current challenge I find when working with volumes is determining the appropriate volume path. An ideal solution would allow simply naming the volume without having to specify the path, significantly simplifying the process. Since aspire is opinionated with its container version for each component, it's not like the path is unknown or varying in any way.

I think adding the volumes by default is feasable by mounting the volume in the extension method for the IDistributedApplicationBuilder. To keep the possability of not mounting the default volumes we could add a optional parameter to the extensions which is either a bool addDefaultVolumeMounts: true or a enum (something like ContainerConfigurationLevel) providing the following values: None, VolumesMounted. Default would be VolumesMounted.

If ContainerConfigurationLevel.VolumesMounted is passed to the extension method the volumes are mounted as named with a auto-generated name.

The drawback of that approach would be that you are not forced to think of whether mounts might be necessary during implementing it (as it would be with an abstraction/inheritance approach, but that would probably be a lot more complicated to implement and use).

// Proposed streamlined method var queue = builder.AddRabbitMQContainer("queue") .WithNamedVolume("VolumeQueue");

var accountDb = builder.AddPostgresContainer("accountDatabase") .WithNamedVolume("VolumeMount.accountdb.data") .AddDatabase("account");



The proposed method would make the development experience more efficient and user-friendly, especially for those who frequently work with volumes.

This specific approach would probably not work very well as not all containers are going to have exactly one mount. There might be containers with no mounts or with more than one default volume required. In those cases an exception or some other soltuion would be required.

@davidfowl @danmoseley Do you think the default volume mounts are the right way to go? Or should we stick to the WithVolumeMountForData-approach @Kralizek mentioned?

DamianEdwards commented 5 months ago

I don't think we should mount volumes by default. I'd prefer we consider methods that expose the underlying container's supported paths and environment variables for the various built-in behaviors, including the data directories. There would need to be specific overloads of these for each container resource type of course, but they'd be the same name and shape for consistency, e.g.:

var accountDb = builder.AddPostgresContainer("accountDatabase")
    .WithDataVolume() // Automatically named volume mapped to Postgres-specific path for data, accepts optional volume name
    .WithInitDbBindMount("./data/accountDatabase") // Path to dir on host to mount to /docker-entrypoint-initdb.d
    .AddDatabase("account");
stbau04 commented 5 months ago

There are the following container resources (they inherit from ContainerResource):

Are PostgresServerResource and SqlServerDatabaseResource actually container resources? PostgresServerResource uses the type "postgres.server.v0" in the manifest while there is a method WriteContainer(ContainerResource container) in the ManifestPublishingContext which uses the type "container.v0". Maybe I'm just not getting it, but it seems a little bit off to me? For the SqlServerDatabaseResource it is pretty much the same (further e.g. the MongoDBDatabaseResource is not a container resource). Did I miss something? Why is SqlServerDatabaseResource a container resource while the mostly equal MongoDBDatabaseResource isnt?

Apart from that I would suggest the following extensions methods (i assume that the signature of different volume mounts may vary as they have completely different requirements but the signature should be consitant between the same method for different containers):

...
.WithDataVolumeMount() // Volume name is auto-generated, VolumeMountType is Named and it is not readonly
.WithDataVolumeMount(string source, VolumeMountType mountType =VolumeMountType .Named , bool isReadonly = false) // Configure the details yourself
.WithLogVolumeMount() // Volume name is auto-generated, VolumeMountType is Named and it is not readonly
.WithLogVolumeMount(string source, VolumeMountType mountType =VolumeMountType .Named) // Configure the details yourself, readonly setting is not available as it does not make any sense to make the log folder readonly
.WithInitDbBindVolumeMount(string source, VolumeMountType mountType =VolumeMountType .Bind, bool isReadonly = true) // No version without the source as it probably doesn't have any common use cases
...

I would implement these four function wherever applicable. Are there any further directories that could be commonly used? Did i miss any container resources?

Edit: Use With...VolumeMount instead of With...Volume/With...Mount to align with the existing WithVolumeMount

DamianEdwards commented 5 months ago

@stbau04 I think the existing WithVolumeMount is conflating different container concepts (bind-mount vs. volumes) so we should feel free to suggest changing it too.

DamianEdwards commented 5 months ago

For the SqlServerDatabaseResource it is pretty much the same (further e.g. the MongoDBDatabaseResource is not a container resource). Did I miss something? Why is SqlServerDatabaseResource a container resource while the mostly equal MongoDBDatabaseResource isnt?

This is a bit of a quirk of the latest changes to those resources and I expect this will change again. The non-container named resources now run as containers locally by default, but publish to the manifest as non-container resources.

stbau04 commented 5 months ago

@DamianEdwards

@stbau04 I think the existing WithVolumeMount is conflating different container concepts (bind-mount vs. volumes) so we should feel free to suggest changing it too.

Like two methods WithBindMount/WithVolume instead of the VolumeMountType enum? That would all in all add three additional methods if we want to support all possabilites i suggested in my prior comment

DamianEdwards commented 5 months ago

I don't think we'd need volume-flavored overloads for initdb support, that really only makes sense as a bind-mount if I understand correctly. For data and logs I'd assumed volumes were the primary scenario, but I can imagine some folks might want to bind to existing host directories for data, but seems a bit unlikely for logs?

All that said, having first-class methods for both volumes and bind-mounts for data, logs, and initdb would be fine I think.

stbau04 commented 5 months ago

This is a bit of a quirk of the latest changes to those resources and I expect this will change again. The non-container named resources now run as containers locally by default, but publish to the manifest as non-container resources.

Ok, so it probably does not make any sense to implement the extension methods for them as they are not deployed as container resources (would it even work?)

DamianEdwards commented 5 months ago

@stbau04 we're going to revisit it again as part of preview.3/preview.4 anyway, that's just how they're going to be in preview.2

stbau04 commented 5 months ago

Should i open a seperate issue for the WithBindMount/WithVolume stuff?

mitchdenny commented 5 months ago

Are PostgresServerResource and SqlServerDatabaseResource actually container resources? PostgresServerResource uses the type "postgres.server.v0" in the manifest while there is a method WriteContainer(ContainerResource container) in the ManifestPublishingContext which uses the type "container.v0". Maybe I'm just not getting it, but it seems a little bit off to me? For the SqlServerDatabaseResource it is pretty much the same (further e.g. the MongoDBDatabaseResource is not a container resource). Did I miss something? Why is SqlServerDatabaseResource a container resource while the mostly equal MongoDBDatabaseResource isnt?

Thanks for reporting these issues. Working on a fix now. PostgresServerResource should not derive from ContainerResource, and neither should SqlServerDatabaseResource.

mitchdenny commented 5 months ago

Just to expand a little bit on the differences between methods like AddPostgres(...) and AddPostgresContainer(...). The reason we have two methods is that when it comes to deployment, sometimes people will want to just defer to the deployment tool to create a resource using that target environments ideal implementation of that resource. For example in Azure, Postgres is probably best handled by the Azure Postgres Flexible Server resource type vs spinning up your own Postgres container and dealing with all the operational issues around doing that well.

Sometimes however, a developer wants to explicitly say that I want Postgres, and I want it implemented as a container that sits alongside the rest of my application code in ACA (continuing with the Azure example). You always need to think through whether that container is appropriate for production workloads though.

Kralizek commented 5 months ago

Just to expand a little bit on the differences between methods like AddPostgres(...) and AddPostgresContainer(...). The reason we have two methods is that when it comes to deployment, sometimes people will want to just defer to the deployment tool to create a resource using that target environments ideal implementation of that resource. For example in Azure, Postgres is probably best handled by the Azure Postgres Flexible Server resource type vs spinning up your own Postgres container and dealing with all the operational issues around doing that well.

Sometimes however, a developer wants to explicitly say that I want Postgres, and I want it implemented as a container that sits alongside the rest of my application code in ACA (continuing with the Azure example). You always need to think through whether that container is appropriate for production workloads though.

Maybe a bit off topic here, but Is this the right approach?

It feels like development and deployment concerns are clashing here: If I wear the developer hat, I care little how it will be deployed.

In my humble opinion, rather than two conflicting resource types or methods, we should have one resource (a development concept) that can be specialized to be deployed in one way or the other.

To me it feels like the Deployment strategy should be an (optional) metadata of the resource. It definitely should not be a decision point on which identical resource I use at dev time.

stbau04 commented 5 months ago

But there are some functions you can only use on container resources (e.g. WithVolumeMount). How would you handle them? Either they can be applied always and just fail if it is not possible or they can only be applied after setting the metadata e.g. through an .AsContainer extension method, which might result in a pretty similar structure as it is now

Further as @mitchdenny mentioned, AddPostgres and AddPostgresContainer have some differences (e.g. you need to think whether the container is fine to be deployed to production). ). If this is dependent on the metadata of the resource, it would mean that just a small method call might change the effective type of the resource completely.

mitchdenny commented 5 months ago

@Kralizek we used to just have AddPostgresContainer (as an example). And when it was written to the manifest it would be with a resource type of postgres.server.v0 - with very little other information because we assume that the deployment tool is going to provision it. The change we've made here is to allow people to exert a little bit of control over the deployment tool to say actually I do want this to be a container.

The alternative that we considered was not having AddPostgresContainer(...) at all, and just having AddPostgres(...) and if someone wanted a Postgres container that they could manipulate with volume mounts etc, then they would just use AddContainer(...) and build it up themselves.

The complication with that is that Aspire does some magic around connection string handling when you pass in the type via WithReference(...) to extract the connection string. A handle rolled container using AddContainer(...) wouldn't be able to participate in that mechanic because it doesn't have the right interfaces (because not all containers provide connection strings).

The API surface that we have isn't perfect, but we've tried it a few different ways and this is the one that felt the most ergonomic once all factors were considered. It is all still subject to change however and your feedback is noted.

stbau04 commented 5 months ago

I created a seperate issure for splitting the current WithVolumeMount into two methods (one for bind mounts and one for named volumes): https://github.com/dotnet/aspire/issues/1437

stbau04 commented 5 months ago

What should the name for automatically generated volumes be? In my first approach i used a guid (as @josephaw1022 did in his PR; there are two PRs now?!). But the random guid would change each time the method is executed (at least while developing this means a new volume on each start, i am not sure what happens when the project is deployed).

We could either persist the guids somewhere if a container way already created, but i don't think that this is actually the way to go? Another approach could be to generate the volume dependent on some sort of assembly id and the resource name, but then we could only run the application once per host (otherwise they would share volumes). I don't actually like any of these approaches, but i couldn't figure out anything else.

How could we solve this one?

josephaw1022 commented 5 months ago

Wrote a long-winded explanation explaining giving my thought process on this https://github.com/dotnet/aspire/pull/1447#issuecomment-1859367388

TLDR of comment

In my opinion, the simplest approach to this would have the developers write a name for the volumes like this

var db1 = builder.AddPostgresContainer("db1")
                       .WithNamedVolume("db1.volume");

var db2 = builder.AddPostgresContainer("db2")
                       .WithNamedVolume("db2.volume");

The developers would need to name the volume, but they wouldn't need to memorize or look up any volume paths with this approach.

However, the case where two volumes have the same name needs to be considered and thought through for this to work.

Currently if you were to run this


var volumeName = "VolumeMount.sqlserver.data";

var database = builder.AddSqlServerContainer("sqlserver", sqlpassword)
    .WithVolumeMount(volumeName, "/var/opt/mssql", VolumeMountType.Named)
    .AddDatabase("appdb");

var anotherDatabase = builder.AddSqlServerContainer("sqlserver2", sqlpassword)
    .WithVolumeMount(volumeName, "/var/opt/mssql", VolumeMountType.Named)
    .AddDatabase("anotherdb");

builder.Build().Run();

You should see this error

image

This error seems to be more of a symptom exception rather than an exception being thrown at the root cause (duplicate volume names).

So, I think having some custom error being thrown for two container resources using the same volume name would need to be implemented so that if someone were to write


var db1 = builder.AddSqlServerContainer("sqlserver", sqlpassword)
    .WithNamedVolume("newway")
    .AddDatabase("appdb");

var db2 = builder.AddSqlServerContainer("sqlserver2", sqlpassword)
    .WithNamedVolume("newway")
    .AddDatabase("anotherdb");

// or 

var db3 = builder.AddSqlServerContainer("sqlserver3", sqlpassword)
    .WithVolumeMount("oldway", "/var/opt/mssql", VolumeMountType.Named)
    .AddDatabase("appdb");

var db4 = builder.AddSqlServerContainer("sqlserver4", sqlpassword)
    .WithVolumeMount("oldway", "/var/opt/mssql", VolumeMountType.Named)
    .AddDatabase("anotherdb");

builder.Build().Run();

they would then immediately know they have a duplicate volume name.

And now that I think about it, adding more specific errors for duplicate volume names doesn't necessarily need to be done for this new syntactic sugar to be added as it's still an issue without the syntactic sugar. But, I am curious on if we wanted just the syntactic sugar done now and do something later for the duplicate volume names exception issue or do we want that solved in the same pr or done before the syntactic sugar pr?

stbau04 commented 5 months ago

@josephaw1022 In my opinion we should probably create a seperate issue/PR for that as this is a somewhat different topic. Further it is not approved by a team member that this error is something we want

stbau04 commented 4 months ago

Probably we should wait for https://github.com/dotnet/aspire/pull/1521 and then see how it affects this issue (especially the volume/bind mount stuff)

davidfowl commented 4 months ago

cc @JamesNK

davidfowl commented 3 months ago

Moving to next preview.

davidfowl commented 2 months ago

@eerhardt We're going to go with @stbau04's PR on this.