att-comdev / promenade

This project has moved to OpenStack.
https://www.airshipit.org/
Apache License 2.0

Add promenade-api container error reporting for missing client certificates #50

Closed craiganderson closed 6 years ago

craiganderson commented 6 years ago

The genesis.sh script completed successfully (it reported no errors), but one pod failed to start:

ucp promenade-api-6696769cd-qwpzf 0/1 ImagePullBackOff 0 10h

Trying to get more details:

kubectl logs promenade-api-6696769cd-qwpzf --namespace=ucp

Error from server (BadRequest): container "promenade-api" in pod "promenade-api-6696769cd-qwpzf" is waiting to start: trying and failing to pull image

More verbose error reporting would be useful, in particular the error code that would explain why the image couldn't be pulled (DNS resolution error, timeout, or similar). In this case, the host system did not have the certificate needed for a secure download of the artifact from an internal HTTPS mirror.

mark-burnett commented 6 years ago

Hi @craiganderson,

I'm not sure there's a good path to getting more specific errors here. We might be able to look into some upstream error message improvements (not sure where the invalid cert message is being lost).

I guess the alternative would be to monitor docker/kubelet logs and scrape out such messages, though I would expect the normal fluentd -> elasticsearch pipeline to handle that sort of thing.

I'm going to close this one, since I don't see a very clean path forward, but if you think of something, please re-open :)

craiganderson commented 6 years ago

OK, yes, this error is actually at the Kubernetes level: https://github.com/kubernetes/kubernetes/issues/30148

It looks like kubectl describe pod would have given me a more detailed failure reason than kubectl logs did. I'll add this to the troubleshooting docs.
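
For anyone landing here with the same symptom, the check described above can be sketched roughly as follows. This is a minimal sketch, assuming kubectl is already configured for the cluster; the pod name and namespace are the ones from this report, and not_ready_pods is a hypothetical helper name, not something Promenade or genesis.sh provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of Promenade): reads the output of
#   kubectl get pods --all-namespaces --no-headers
# on stdin (columns: NAMESPACE NAME READY STATUS RESTARTS AGE) and prints
# the namespace, pod name, and status of every pod whose READY count is
# incomplete, for example 0/1.
not_ready_pods() {
    awk '{ split($3, ready, "/"); if (ready[1] != ready[2]) print $1, $2, $4 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get pods --all-namespaces --no-headers | not_ready_pods
#
# For a pod listed by the helper, the Events section at the end of the
# describe output is where image pull failures are reported, since
# kubectl logs has nothing to show for a container that never started:
#   kubectl describe pod promenade-api-6696769cd-qwpzf --namespace=ucp
```

Note that the helper also lists pods from finished jobs (READY 0/1 with status Completed), so the status column still needs a quick look. Running something like this after genesis.sh would surface a pod stuck in ImagePullBackOff even when the script itself reports no errors.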