@sboeuf - do you want to take a look at this one?
@jodh-intel I have some high priority work to do on the agent, but as soon as I am available, I can take a look. But you should try to take a look in the meantime.
@jodh-intel I have just given this a try and I found where the problem is. My latest virtcontainers changes rely on the latest agent. Unfortunately, those agent changes have not landed in a new clear-containers.img yet... If you build the agent yourself from the master branch of the agent repo and replace the agent binary inside your clear-containers.img, everything works perfectly. Please give this a try and let me know how it goes.
cc @mcastelino @sameo
After quite a bit of debugging (I was initially confused because a custom agent built with osbuilder behaved differently, thanks to its newer code :)), I think the minimal fix is the following, but I'd like your thoughts on it:
diff --git a/container.go b/container.go
index 3c12f23..981b3cb 100644
--- a/container.go
+++ b/container.go
@@ -677,6 +677,11 @@ func (c *Container) kill(signal syscall.Signal, all bool) error {
 		return c.setContainerState(StateStopped)
 	}
 
+	if state.State == StateStopped {
+		// already shutdown
+		return nil
+	}
+
 	if state.State != StateRunning {
 		return fmt.Errorf("Container not running, impossible to signal the container")
 	}
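For context, here is a rough, self-contained sketch of how the state checks in kill() read with the patch applied. Only the two checks shown in the diff are taken from the code; the types and the state lookup are simplified stand-ins, not the real virtcontainers definitions:

```go
package main

import (
	"fmt"
	"syscall"
)

// Simplified stand-ins for the virtcontainers types used in the diff.
type stateString string

const (
	StateRunning stateString = "running"
	StateStopped stateString = "stopped"
)

type State struct {
	State stateString
}

type Container struct {
	state State
}

func (c *Container) kill(signal syscall.Signal, all bool) error {
	state := c.state // placeholder for the real on-disk state lookup

	// New guard from the patch: killing an already-stopped container is a
	// no-op. It has to come before the StateRunning check below, otherwise
	// a second kill would always hit the "not running" error.
	if state.State == StateStopped {
		// already shutdown
		return nil
	}

	if state.State != StateRunning {
		return fmt.Errorf("Container not running, impossible to signal the container")
	}

	// ... signal the container via the agent ...
	return nil
}

func main() {
	c := &Container{state: State{State: StateStopped}}
	fmt.Println(c.kill(syscall.SIGKILL, false)) // prints <nil>: a no-op, not an error
}
```

The point of putting the StateStopped check first is that repeated kills on an already-stopped container become a no-op instead of an error.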
It also looks like I'm going to have to rework #483 to change the way DeletePod() works due to the recent vc + agent changes. I can no longer wait for the (pod-specific) proxy instance to exit because of https://github.com/clearcontainers/agent/commit/25e604732f8ef2b3d8d207681eff06a3b57d4b54, so I will either need to tell the proxy to shut down from vc, or just kill it without waiting: vc/runtime won't be the proxy's parent, so it cannot wait for it, and the non-parent wait options won't work reliably here. A rough sketch of the second option follows.
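Here is a minimal sketch of the "kill it, don't wait" option, assuming the runtime only knows the proxy's PID; stopProxy and the timeout are hypothetical names, not existing virtcontainers APIs. Because the runtime is not the proxy's parent, it cannot wait(2) on it, so the sketch signals the process and then polls with kill(pid, 0) until the PID disappears:

```go
package main

import (
	"fmt"
	"syscall"
	"time"
)

// stopProxy asks the (non-child) proxy process to terminate and polls until
// it is gone. Illustrative only; not a virtcontainers API.
func stopProxy(pid int, timeout time.Duration) error {
	// Ask the proxy to shut down cleanly first.
	if err := syscall.Kill(pid, syscall.SIGTERM); err != nil {
		if err == syscall.ESRCH {
			// Already gone.
			return nil
		}
		return err
	}

	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		// Signal 0 performs error checking only: ESRCH means the
		// process no longer exists.
		if err := syscall.Kill(pid, 0); err == syscall.ESRCH {
			return nil
		}
		time.Sleep(50 * time.Millisecond)
	}

	// Last resort: force it to exit, still without waiting.
	return syscall.Kill(pid, syscall.SIGKILL)
}

func main() {
	// Hypothetical proxy PID for illustration.
	fmt.Println(stopProxy(12345, 2*time.Second))
}
```

Telling the proxy to shut down from vc would clearly be the cleaner option, since polling a PID can race with PID reuse.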
It looks like the above patch is somehow miraculously no longer needed. Re-testing with the latest virtcontainers changes re-vendored into the latest runtime, with my vc changes on top, and using the latest agent built via osbuilder, it now all works! I can only assume there was some cruft left over somewhere when I was testing the changes at the runtime level. Or fairy dust, cosmic rays, pixies or trolls. Or aliens.
Closing now that https://github.com/containers/virtcontainers/pull/483 is behaving.
:alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien::alien:
@jodh-intel do we still need to rebuild the image?
As part of landing #483, I had to create a test runtime branch which included:
Testing locally showed that if the user runs docker run, the command will hang even though the shim and hypervisor shut down. The relevant lines from the journal:
The problem appears to be commit 711783f547cb679c698c2081a5a110648282cda6 from #482.