Closed jameslamb closed 3 months ago
I think that these changes are working as intended, but I'm struggling to test them.
I tested that the configuration value is getting all the way down to the place where it needs to be by adding a print statement, like this:
return fmt.Errorf("-- timeoutSeconds: %d", timeoutSeconds)
if time.Since(startTime) > (time.Second * time.Duration(timeoutSeconds)) {
...
And then doing this:
make clean testprep build
bin/canary validate \
--file ./examples/kubeflow.yaml \
--startup-timeout 5 \
container-canary/kubeflow:shouldpass
bin/canary validate \
--file ./examples/kubeflow.yaml \
--startup-timeout 9 \
container-canary/kubeflow:shouldpass
Error: -- timeoutSeconds: 5
Error: -- timeoutSeconds: 5
Error: -- timeoutSeconds: 9
Error: -- timeoutSeconds: 9
So this is at least as correct as the existing code.
But I'm not sure how to actually test the enforcement of the timeout. This startup timeout applies to the time between when docker run -d
returns, here:
and when the container is first reported as "running", here:
I'm not sure how to easily influence the amount of time between those two states. That does not include any of these:
docker run -d
completes)ENTRYPOINT
/ CMD
(happens after docker run -d
returns)The only things I can think of that cause this startup timeout to be exceeded are like this:
@jacobtomlinson @KyleFromNVIDIA can you think of a good way to reliably test this? Or should we just merge this as-is without testing changes?
This can be either a Go interface or a shim command that gets substituted in place of docker, or something else.
Thanks for these ideas, they're good ones I hadn't considered. A shim for docker
, like you did over in rapids-build-backend
for nvcc
(code link), sounds like it could be a powerful way to test more of how this tool interacts with the container runtime it's shelling out to.
I think that's way beyond my Go abilities though 😬
I agree with you about this being small and self-explanatory, and I'm also confident it could just be merged as-is. Let's see what @jacobtomlinson thinks.
Alright sounds good, thanks!
Fixes #61
Contributes to https://github.com/rapidsai/docker/issues/667
container-canary
runs a container and then executes commands inside it. When it starts up that container, if it doesn't see the container come up within 10 seconds, it raises an exception.This proposes making that timeout configurable, via a CLI argument
--startup-timeout
. This will make it easier to usecontainer-canary
with images / environments where container starting is slower (e.g. where something expensive is done at startup or where local disks are slow).