Closed karolz-ms closed 1 week ago
Have you considered having dcp return specific error codes for certain situations? For example, return exit code X if Docker isn't healthy and then the host can check for known error codes and print a friendly error message to the host console. It would look better than an error type + message + stack trace.
The fallback for unknown exit codes would be what you suggested and include dcp output in the host console.
Agreed, I do not think we want the stack trace here.
We could consider having specific error codes for common issues, but ultimately DCP should be able to tell the user what part of dependency check failed, and what the user might want to try to get themselves unblocked.
Some hosting tests fail randomly on CI with Application orchestrator dependency check had an unexpected error System.TimeoutException: The operation has timed out.
but no other information in the log. This issue - https://github.com/dotnet/aspire/issues/4640 - would also benefit from getting the details in the log.
Wait is this a dupe of https://github.com/dotnet/aspire/issues/4089?
cc @danegsta
It was thought to be fixed. But I got an unhelpful error message when testing locally with latest main source code.
FYI, this is the result from dcp info:
{"version":"0.5.4","commitHash":"a2453505df8d60a288a1fdb30004e11ca8854b22","buildTimestamp":"2024-06-20T22:49:08Z","containers":{"runtime":"docker","hostName":"host.docker.internal","installed":true,"running":false,"error":"Docker CLI timed out while checking status. Ensure Docker CLI is functioning correctly and try again."}}
Could it have been fixed in dcp but dcp wasn't yet updated on dotnet/aspire main?
@JamesNK was your Docker stuck in resource saver mode when you originally hit the exit code scenario? I’ve been trying to reproduce with various states of Docker installation/running states, but haven’t been able to get an exit code response yet.
Yes, it was in resource saver mode. I had to restart my machine to make it healthy.
I just did that!
Okay, my money is on us not explicitly handling a cancellation error somewhere and letting it bubble all the way up.
This looks to be a simple mismatch between timeouts; DCP has a 30 second timeout configured for Docker commands, while Aspire has a 25 second timeout on the info command. We're returning a non-zero exit code because Aspire kills the DCP info check before it completes.
I've got a DCP PR in progress to reduce our default timeout on the info command + add a config flag that Aspire can use to override the timeout. Implementing that flag in Aspire would have to wait for a future release though, since it won't be supported in the 8.0 DCP release.
An updated DCP build has been merged with info timeout changes that should fix the cause of this particular non-zero exit code, but it's also worth improving the timeout logic in Aspire to provide a better error message (we probably shouldn't be throwing an exception reporting that a process exited unexpectedly when we're the ones killing the process).
If the dependency check that uses
dcp info
command fails, withdcp
returning non-zero code, an exception is thrown. The way we process this exception here https://github.com/dotnet/aspire/blob/main/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs#L105 cause all output of thedcp
command to be discarded.The result is the user only sees a message that says "dcp info returned non-zero exit code", which is not helpful:
We should instead include all the output from
dcp
in the log, so that the user is informed about why the dependency check failed.