drewpca opened this issue 3 years ago
I think if you run skaffold with the `--cleanup=false` flag then it won't clean up its deployments.
@drewpca, was using `--cleanup=false` able to help you with this issue? Do you think we should special-case this situation and add helpful tips to our output?
I already knew the fix, so I wasn't trying to diagnose my setup further. I'd rather focus on how skaffold can help me follow the chain of errors until I get to the right logs.
Here's a separate mistake I made the other day: my Dockerfile had `CMD ["/opt/entrypoint.sh"]`, but entrypoint.sh was not executable (I had forgotten the `chmod +x`). `skaffold -v debug` showed me this:
```
DEBU[0011] Unknown waiting reason for container "nginx2": {&ContainerStateWaiting{Reason:RunContainerError,Message:failed to create containerd task: OCI runtime create failed: container_linux.go:349: starting container process caused "exec: \"/opt/entrypoint.sh\": permission denied": unknown,} nil nil}
```
but without `-vdebug`, the problem was again mysterious. Is it straightforward to make that one log line visible when it's a "starting container process caused..." type of error? I would think that's always going to be important for debugging.
I'd like a general policy that if there's an error at startup (as there was in both of these cases), skaffold either 1) shows that error, or 2) exits with advice on where to look next, which might be "rerun with `-vdebug`" or "rerun with `--cleanup=false`, then look at `kubectl logs`".
Thanks for filing this issue. Sorry the status check did not surface it.
My question for @tejal29: we currently only treat `RunContainerError` as an error when the message indicates a Docker failure:
https://github.com/GoogleContainerTools/skaffold/blob/aea70c4ca76e77f9c32929514e0685e33912877d/pkg/diag/validator/pod.go#L382-L386
https://github.com/GoogleContainerTools/skaffold/blob/aea70c4ca76e77f9c32929514e0685e33912877d/pkg/diag/validator/pod.go#L60
https://github.com/GoogleContainerTools/skaffold/blob/aea70c4ca76e77f9c32929514e0685e33912877d/pkg/diag/validator/pod.go#L43
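For context, here's a minimal standalone sketch of that gating (simplified, with approximate names; the linked pod.go is authoritative): the waiting message is only surfaced when it matches the Docker-daemon-oriented `errorPrefix` pattern, so containerd-style messages like the one in the `-vdebug` log above fall through silently.

```go
package main

import (
	"fmt"
	"regexp"
)

// Simplified sketch of the current gating in pkg/diag/validator/pod.go
// (names approximate): a RunContainerError is only surfaced when the
// waiting message matches the Docker-daemon-oriented pattern.
var errorPrefix = regexp.MustCompile(
	`(?P<Prefix>.*)(?P<DaemonLog>Error response from daemon\:)(?P<Error>.*)`)

// surfaceRunContainerError is a hypothetical stand-in for the real
// extraction logic: it returns the user-facing error, or nil when the
// message doesn't match and the failure stays invisible.
func surfaceRunContainerError(container, message string) error {
	if match := errorPrefix.FindStringSubmatch(message); match != nil {
		return fmt.Errorf("container %s in error: %s", container, match[3])
	}
	return nil // falls through: this is the "mysterious" case above
}

func main() {
	// The containerd-style message from the -vdebug log never matches.
	msg := `failed to create containerd task: OCI runtime create failed: container_linux.go:349: starting container process caused "exec: \"/opt/entrypoint.sh\": permission denied": unknown`
	fmt.Println(surfaceRunContainerError("nginx2", msg))
}
```

Running this prints `<nil>`: the permission-denied failure never reaches the user.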
Why not just flag all RunContainerErrors as errors?
```go
case runContainerError:
	return proto.StatusCode_STATUSCHECK_RUN_CONTAINER_ERR, nil, fmt.Errorf("container %s in error: %s", c.Name, c.State.Waiting.Message)
}
```
Or perhaps we can add more to the `errorPrefix` regexp:
```go
errorPrefix = `(?P<Prefix>.*)(?P<DaemonLog>Error response from daemon\:|starting container process caused )(?P<Error>.*)`
```
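As a quick sanity check (a standalone sketch, not the actual validator code), the extended pattern matches the containerd-style message from the earlier `-vdebug` log while still matching the daemon-style messages it was originally written for:

```go
package main

import (
	"fmt"
	"regexp"
)

// Extended pattern with the "starting container process caused "
// alternative added alongside the original daemon prefix.
var errorPrefix = regexp.MustCompile(
	`(?P<Prefix>.*)(?P<DaemonLog>Error response from daemon\:|starting container process caused )(?P<Error>.*)`)

func main() {
	for _, msg := range []string{
		// Docker-daemon style (illustrative), already matched today.
		`Error response from daemon: OCI runtime create failed: ...`,
		// containerd style from the report above, matched only with the extension.
		`failed to create containerd task: OCI runtime create failed: container_linux.go:349: starting container process caused "exec: \"/opt/entrypoint.sh\": permission denied": unknown`,
	} {
		if match := errorPrefix.FindStringSubmatch(msg); match != nil {
			fmt.Printf("surfaced: %s\n", match[3]) // the Error capture group
		} else {
			fmt.Println("not surfaced")
		}
	}
}
```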
Sounds good. When this RE was added it was specific to Cloud Run, so I have no issues changing it to be more generic.
This was a mysterious exit from skaffold dev:
I don't know a way to read the k8s logs after skaffold has deleted the deployment, so I tried with `-vdebug`:
This is still mysterious.
The issue was that I had no ENTRYPOINT/CMD line in the Dockerfile, nor any command set in deploy.yaml. Is it in scope for skaffold to detect that? If not, is it reasonable for skaffold to leave the bad deployment up for me to inspect, with a message like "this didn't start up once, and the problem will be logged on the k8s side. Run `kubectl logs ____` to see the error."?