dotnet / aspire

An opinionated, cloud ready stack for building observable, production ready, distributed applications in .NET
https://learn.microsoft.com/dotnet/aspire
MIT License

Control startup order / readiness check for services #921

Closed aL3891 closed 1 month ago

aL3891 commented 10 months ago

Firstly, it's so awesome to see this project. I was really sad to see Project Tye get fewer and fewer updates because I found it really useful and perhaps took a larger dependency on it than you should for a preview project :)

For Tye and our current, rather hacky local development orchestration, we have an issue where some services need to be running for other services to start correctly. It would be great if Aspire started the projects in the order they were defined and also had the option to wait for health checks to pass before starting the next service.

Perhaps this is already the case; if so, my apologies, but I didn't find anything in the docs about that. I guess it should already have the data to work like this, since the services are defined in sequence in code and health checks and dependencies can also be defined. I guess some services won't have health checks in the same sense, like a database container for example, so maybe they need a custom method supplied to check whether they are ready.

davidfowl commented 10 months ago

Controlling startup order of services is something we generally do not want to offer because that notion does not exist in reality when you deploy. There are a few tasks and jobs that you do want to run in some order (migrations for example), but we’re not convinced that adding startup order generally is a positive.

aL3891 commented 10 months ago

I agree that that is not the typical case in production, but when running locally I don't think it's uncommon for the database to not be running when you start, whereas in production that would be very unusual. Likewise, starting your entire service solution from scratch is less common in production, where you'd typically have the previous version already running, so services would still start.

I admit I'm a bit selfish here though, trying to solve an issue that I personally have, specifically with Orleans, where clients crash if there is no silo available. Ideally apps should be resilient to things like that, but there are a lot of non-ideal apps out there. Since the information about dependencies is already there, it seems friction could be reduced by using it :)

I guess the issue could be mitigated by having restart policies for services. That typically does exist in production but is not something that Aspire does at the moment, as far as I understand.

Having the services restart could be a bit disruptive when debugging locally though, whereas in production it probably doesn't matter as much.

3GDXC commented 9 months ago

> Controlling startup order of services is something we generally do not want to offer because that notion does not exist in reality when you deploy. There are a few tasks and jobs that you do want to run in some order (migrations for example), but we’re not convinced that adding startup order generally is a positive.

@davidfowl how would you suggest developers control the start-up/time-out/delay where a project has a dependency on a container and this is specified in the AppHost project? For example, two projects use a RabbitMQContainer, one publisher and one consumer, and both require the RabbitMQContainer to be running before they start. At present a hack of await Task.Delay(xx) works, but as I'm sure you'll agree, that is a nasty hack ;)
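For illustration only, here is a minimal, language-agnostic sketch (in Python rather than C#) of the difference between a fixed delay and polling for readiness. It is not an Aspire API; `wait_for_port` and its parameters are hypothetical names, and an open TCP port is only a rough proxy for "the broker is ready".

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds.

    Returns True as soon as the port accepts a connection and False if the
    timeout elapses first. Unlike a fixed delay, this waits only as long as needed.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait a little and try again.
            time.sleep(interval)
    return False
```

A dependent process could call something like `wait_for_port("localhost", 5672)` before connecting, assuming the broker listens on 5672, and fail with a clear message if it returns False.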

davidfowl commented 9 months ago

In the next release, these startup errors will go away for the most part because we have a proxy between services that we can make work without your code having to retry or delay.

That said, Aspire components are resilient to connection failures by default and retry connections for transient failures (up to some limit). In a way, it makes sure you are building with an understanding that the network can fail at any time and that you should recover from transient errors.
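As a rough sketch of what "retry transient failures up to some limit" means in general, the snippet below retries an operation with exponential backoff. It is written in Python for brevity and does not reflect how Aspire components implement retries; `retry_transient`, its defaults, and the choice of exception types are assumptions made for the example.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_transient(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.2,
    transient: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Run operation, retrying on transient errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except transient:
            # Out of attempts: surface the last error to the caller.
            if attempt == attempts - 1:
                raise
            # Back off: base_delay, 2x, 4x, ... between attempts.
            time.sleep(base_delay * (2 ** attempt))
```

Errors that are not listed as transient propagate immediately, so a genuine configuration mistake is not hidden behind retries.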

jsheetzmt commented 8 months ago

It would be nice if we can toggle containers created in AppHost as a required dependency before projects are started. For us, a SQL container is only applicable for local dev. When deployed, our apps are targeting Azure SQL and don't have the same issue. Migrations run in our CI/CD pipeline, and we will retry the pipeline if it fails.

diegomgarcia commented 8 months ago

I haven't looked deeply into the internals yet, but I believe it might be feasible to implement a wait mechanism for services added with the WithReference method. Essentially, this would ensure these services are fully ready before starting the dependent service they're connected to. Alternatively, it could be viable to create an extension like DependsOn(service) that is handled in this specific way, to avoid slowing down startup by waiting on every referenced service.

davidfowl commented 5 months ago

Here's a spike for this feature based on the latest preview (6 at the time of writing) https://github.com/davidfowl/WaitForDependenciesAspire

cisionmarkwalls commented 3 months ago

> Here's a spike for this feature based on the latest preview (6 at the time of writing) https://github.com/davidfowl/WaitForDependenciesAspire

I've tested it in our applications (used it to wait for database setup/seeding across Postgres, Redis and Kafka topics before services start up so they are all in a known state) for local development and it worked really well. Have you considered spinning that out into a NuGet package as an optional Aspire feature for local development?

SteveSandersonMS commented 3 months ago

Adding to this just to point out how helpful it would be for #4177. Local LLM hosting may involve downloading many-gigabyte models before startup if not already cached locally, which can easily take 10+ minutes depending on connection speed. If developers don't realise this is going on, the only indication they might get is errors from dependent projects that are trying to call the local LLM service. They will likely start debugging and rechecking configuration, and may shut down the AppHost many times in the process which stops the download, when all they need to do is just wait.

Similarly, waiting for readiness could be considered a prerequisite for a data seeding mechanism. For example, seeding a Qdrant vector DB can take a minute or two even with just 10k records or so. If you don't realise what's going on and just see it as errors, you'll waste a lot of time debugging.

jhancock-taxa commented 3 months ago

This happens all of the time with integration tests. We need something that causes blocking on database spin up/readiness etc. for development which can be ignored in production: .WaitForReadyDev() or something so it's really obvious what it's doing.

feO2x commented 3 months ago

I'd like to second that. WaitFor, as shown in the WaitForDependenciesAspire example, would be extremely helpful. Docker Compose has a similar mechanism with depends_on and, in my opinion, not having this in Aspire could be a deal breaker for people coming from Docker Compose or similar technologies.

gingters commented 2 months ago

Yes, this is required here too.

When I use docker compose and depends_on, the container startup will wait until the dependency is up and running. When a container has a configured healthcheck, then the dependent containers will only start up when the healthcheck of the dependency returns fine.

That way, for example, a migrations container can wait until the database is up and running, the API can wait for the migrations, the container that has the data seed script (which uses the API) can wait until the API reports back healthy, and all other services that need the data can wait for the seed container to be started up.

So, this "wait for a certain service to report back healthy before this one is started" behavior, just like in Docker Compose, would be really great. There are a lot of scenarios that would be enabled by that functionality.
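The chain described above (database, then migrations, then API, then seed) can be sketched as a dependency graph that is started in topological order, with a health gate before each start. This is a simplified Python illustration, not how Docker Compose or Aspire work internally; `start_in_dependency_order`, `start`, and `is_healthy` are hypothetical names, and a real orchestrator would poll the health check with a timeout instead of failing on the first unhealthy result.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def start_in_dependency_order(
    depends_on: Dict[str, Set[str]],
    start: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Start each service only after all of its dependencies report healthy.

    depends_on maps a service name to the set of services it depends on.
    Returns the names in the order they were started.
    """
    started: List[str] = []
    # static_order() yields every dependency before the services that need it.
    for name in TopologicalSorter(depends_on).static_order():
        for dep in depends_on.get(name, ()):
            if not is_healthy(dep):
                raise RuntimeError(f"{dep} is not healthy; refusing to start {name}")
        start(name)
        started.append(name)
    return started


# The chain from the comment above, expressed as "service -> its dependencies".
graph = {"migrations": {"db"}, "api": {"migrations"}, "seed": {"api"}}
```

With this graph, the services start in the order db, migrations, api, seed, provided each one reports healthy before the next is started.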

JeroMiya commented 2 months ago

We also possibly have a need for this. For local development, we have to incorporate a custom identity server container whose implementation is not entirely under our control. It does not participate directly with aspire service discovery other than injecting connection strings to the database. This identity server implementation reads configuration data from the database on startup (outside of our control), so it must be able to connect to the database on startup. The database container does have a health check defined, and we use a docker compose dependency to ensure the database container is up, running, and healthy before starting the identity server container.

We could probably get away with using a longer retry/timeout setting for the connection string (although this would be more fragile in practice than waiting for a health check), but the default Aspire behavior when adding a SqlServer resource reference is to generate a connection string without any retry/timeout settings. So, we'd have to customize the connection string, and that's a bit more complicated than it needs to be just to support (unavoidable) resource dependencies for local development.

rosieks commented 2 months ago

> Controlling startup order of services is something we generally do not want to offer because that notion does not exist in reality when you deploy. There are a few tasks and jobs that you do want to run in some order (migrations for example), but we’re not convinced that adding startup order generally is a positive.

@davidfowl So what's the solution for running migrations right now? I tried the sample from the playground but I got stuck:

Npgsql.PostgresException (0x80004005): 57P03: the database system is starting up

onionhammer commented 2 months ago

> Controlling startup order of services is something we generally do not want to offer because that notion does not exist in reality when you deploy.

Unless you're using init containers, and EF migrations are a perfect example of when you might use them. Init containers are also a thing in Azure Container Apps.

jhancock-taxa commented 2 months ago

> > Controlling startup order of services is something we generally do not want to offer because that notion does not exist in reality when you deploy.
>
> Unless you're using init containers, and EF migrations are a perfect example of when you might use them. Init containers are also a thing in Azure Container Apps.

They're also a thing in Kubernetes and every other cloud platform that does orchestration. Aspire needs to be able to model these.

mitchdenny commented 1 month ago

Closing this in favor of: https://github.com/dotnet/aspire/issues/5275