Closed: aaronlippold closed this issue 4 years ago.
The tricky part is that the ECS service update can get stuck for many reasons.
The human debugging steps are usually:
Kicking the tires here. One approach to improve the error reporting is to have ufo do what a person would normally do.
If the CloudFormation stack is stuck at the ECS service update step for more than, say, 10 minutes, ufo can print a message with the ECS service events. These will likely contain the error or the reason CloudFormation is stuck. Ufo would just be doing the debugging steps via the API instead of a person manually clicking through the console.
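A minimal sketch of the stuck check, written in Python for illustration (ufo itself is Ruby). It assumes the stack events have the shape of the `StackEvents` list returned by CloudFormation's DescribeStackEvents API, and that the ECS client behaves like a boto3 ECS client (`describe_services`). The function names and the 10-minute default are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional

# CloudFormation statuses meaning the ECS service resource is still in progress.
IN_PROGRESS_STATUSES = {"CREATE_IN_PROGRESS", "UPDATE_IN_PROGRESS"}


def find_stuck_ecs_service(
    stack_events: List[Dict[str, Any]],
    now: datetime,
    threshold: timedelta = timedelta(minutes=10),  # hypothetical default
) -> Optional[Dict[str, Any]]:
    """Return the latest event of an AWS::ECS::Service resource that has been
    in progress for at least `threshold`, or None if nothing looks stuck.

    `stack_events` is assumed to look like the "StackEvents" list returned by
    CloudFormation's DescribeStackEvents API.
    """
    latest: Dict[str, Dict[str, Any]] = {}
    for event in stack_events:
        if event.get("ResourceType") != "AWS::ECS::Service":
            continue
        logical_id = event["LogicalResourceId"]
        # Keep only the newest event per resource, whatever the input order.
        current = latest.get(logical_id)
        if current is None or event["Timestamp"] > current["Timestamp"]:
            latest[logical_id] = event

    for event in latest.values():
        in_progress = event["ResourceStatus"] in IN_PROGRESS_STATUSES
        if in_progress and now - event["Timestamp"] >= threshold:
            return event
    return None


def fetch_service_events(ecs_client: Any, cluster: str, service: str, limit: int = 5) -> List[str]:
    """Return up to `limit` ECS service event messages.

    `ecs_client` is assumed to behave like a boto3 ECS client.
    """
    response = ecs_client.describe_services(cluster=cluster, services=[service])
    services = response.get("services", [])
    if not services:
        return []
    # Assumes ECS returns service events newest first.
    return [event["message"] for event in services[0].get("events", [])[:limit]]
```

When the check fires, the stuck event's `PhysicalResourceId` (when present, it should be the service ARN) could be passed as `service`. Since boto3 returns timezone-aware timestamps, `now` would need to be aware too, e.g. `datetime.now(timezone.utc)`.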
I think the message should also link to some ufoships.com docs with a more detailed explanation, helpful debugging tips, and some images. I have done something similar for Jets: upon a failed jets deploy, a message prints out with this link: https://rubyonjets.com/docs/debugging/cloudformation/ It has reduced the number of issues and community forum posts folks open about things they can self-service. They just need some friendly directions.
During this time, the main process is busy with the CloudFormation stack update. So, to implement this, I think the reporting logic could run in a Thread. The thread checks whether the CloudFormation stack is at the ECS service update step and stuck. It would print to the same stdout, since that is what the user is watching at the time.
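A minimal sketch of the background-thread idea, again in Python. `StuckReporter` and its `check`/`report` callables are hypothetical names: `check` stands in for whatever logic decides the stack is stuck and builds the message (for example from the ECS service events), and the 30-second poll interval is an arbitrary assumption.

```python
import threading
from typing import Callable, Optional


class StuckReporter(threading.Thread):
    """Polls `check` in the background while the main thread waits on the
    CloudFormation stack update. The first message `check` returns is passed
    to `report` once, and then the thread exits."""

    def __init__(
        self,
        check: Callable[[], Optional[str]],
        report: Callable[[str], None] = print,  # default: same stdout the user watches
        interval: float = 30.0,  # hypothetical poll interval in seconds
    ):
        super().__init__(daemon=True)  # never keeps the process alive on exit
        self.check = check
        self.report = report
        self.interval = interval
        self._stop_event = threading.Event()

    def run(self) -> None:
        while not self._stop_event.is_set():
            message = self.check()
            if message:
                self.report(message)
                return
            # Sleep, but wake up immediately if stop() is called.
            self._stop_event.wait(self.interval)

    def stop(self) -> None:
        """Called by the main thread once the stack update has finished."""
        self._stop_event.set()
```

The main thread would start the reporter, run and wait on the stack update as it does today, and call `stop()` once the update finishes. Sleeping with `threading.Event.wait` lets `stop()` take effect right away instead of after a full interval.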
It would also be nice to generate the ECS console URL for the service tab and print it out for the user's convenience.
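A sketch of generating that console link. The URL layout is an assumption modeled on the current (v2) ECS console, not a documented API, so it should be checked against the browser's address bar; the function name is hypothetical.

```python
from urllib.parse import quote


def ecs_service_events_url(region: str, cluster: str, service: str) -> str:
    """Build a link to the Events tab of an ECS service in the AWS console.

    ASSUMPTION: the path layout mirrors the v2 ECS console; AWS may change it.
    Cluster and service names are percent-encoded so they are safe in a path.
    """
    return (
        f"https://{region}.console.aws.amazon.com/ecs/v2/clusters/"
        f"{quote(cluster, safe='')}/services/{quote(service, safe='')}/events"
        f"?region={quote(region, safe='')}"
    )
```

For example, with the hypothetical region `us-west-2`, cluster `prod`, and service `demo-web`, this gives `https://us-west-2.console.aws.amazon.com/ecs/v2/clusters/prod/services/demo-web/events?region=us-west-2`.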
Possible complexity
I think one quick fix is to print a message at the beginning, before the CloudFormation stack update, telling the user to also check the Events tab of the service in the ECS console. Maybe break this into 2 PRs so there's a quick win.
Closing. If someone ends up handling this, just post a comment.
When using either the EC2 or the Fargate method, we should try to provide more detailed feedback on success and failure. If ECS is having issues, rather than just appearing to halt, we should report them more clearly so the user can self-service.
https://github.com/tongueroo/ufo/issues/57 is a great example of where, I hope, we could have provided more guidance via the CLI.
Suggestions or thoughts on how we could improve the reporting?