Open jcvenegas opened 7 years ago
Can you add some more details here? For the record, docker/containerd always call delete
twice (unclear why).
@jodh-intel I found this by testing this PR https://github.com/containers/virtcontainers/pull/360:
docker run -ti busybox blabla-this-binary-not-exist
start
(expected) then delete will be called twice ( we already know that) , but see is being called without wait the next being finished.
my point is because we allow to concurrently operations over the same container: it could happen that both operations interfere one each other, In this case we have two deletes working at the same time. The first one is ahead doing its job and the second fails in the middle of the execution because the first one did something that interfered the second one.
//The file was removed by first delete
2017-09-12 13:27:58.506992996 -0500 CDT:17678:cc-runtime:error:open /var/lib/virtcontainers/pods/f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108/config.json: no such file or directory
But we cannot ensure that for all the container operations interfere each other and not be able to do any of the concurrent operations. So I suggest to add a lock at runtime level to lock the container operations.
Complete log:
2017-09-12 13:27:57.46337096 -0500 CDT:17614:cc-runtime:info:cc-runtime (version 3.0.0-beta.2, commit f5e72cb4f0b7110ef2fa2a37cf50b910a7cd18e2) called as: [create --bundle /var/run/docker/libcontainerd/f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108 --console-socket /tmp/pty190167264/pty.sock --pid-file /run/docker/libcontainerd/containerd/f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108/init/pid f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108]
2017-09-12 13:27:57.47730704 -0500 CDT:17614:cc-runtime:info:Using configuration file "/etc/clear-containers/configuration.toml"
2017-09-12 13:27:57.492606096 -0500 CDT:17614:cc-runtime:info:No sockets from configuration
2017-09-12 13:27:57.496561604 -0500 CDT:17614:cc-runtime:info:Device details for container f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108: Major:0, Minor:79, MountPoint:/var/lib/docker/overlay2/361d17d60c7d2620c933016efbd80367e82c0b2af1f69dd8ea78f084dc782897/merged
2017-09-12 13:27:57.620956651 -0500 CDT:17614:cc-runtime:info:launching qemu with: [-name pod-f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108 -uuid 66303230-3936-3263-3863-303737363164 -machine pc,accel=kvm,kernel_irqchip,nvdimm -cpu host -qmp unix:/run/virtcontainers/pods/f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108/monitor.sock,server,nowait -qmp unix:/run/virtcontainers/pods/f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108/ctrl.sock,server,nowait -m 2048M,slots=2,maxmem=16947M -smp 8,cores=8,threads=1,sockets=1 -device virtio-9p-pci,fsdev=ctr-9p-0,mount_tag=ctr-rootfs-0 -fsdev local,id=ctr-9p-0,path=/var/lib/docker/overlay2/361d17d60c7d2620c933016efbd80367e82c0b2af1f69dd8ea78f084dc782897/merged,security_model=none -device virtio-serial-pci,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/virtcontainers/pods/f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108/console.sock,server,nowait -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/usr/share/clear-containers/clear-containers.img,size=235929600 -device virtserialport,chardev=charch0,id=channel0,name=sh.hyper.channel.0 -chardev socket,id=charch0,path=/run/virtcontainers/pods/f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108/hyper.sock,server,nowait -device virtserialport,chardev=charch1,id=channel1,name=sh.hyper.channel.1 -chardev socket,id=charch1,path=/run/virtcontainers/pods/f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108/tty.sock,server,nowait -device virtio-9p-pci,fsdev=extra-9p-hyperShared,mount_tag=hyperShared -fsdev local,id=extra-9p-hyperShared,path=/tmp/hyper/shared/pods/f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108,security_model=none -device driver=virtio-net-pci,netdev=network-0,mac=02:42:ac:11:00:02 -netdev tap,id=network-0,ifname=tap0,downscript=no,script=no,vhost=on -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize -kernel /usr/share/clear-containers/vmlinuz.container -append root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k panic=1 console=hvc0 console=hvc1 initcall_debug iommu=off cryptomgr.notests net.ifnames=0 quiet systemd.show_status=false init=/usr/lib/systemd/systemd systemd.unit=clear-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket ip=::::::f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108::off::]
2017-09-12 13:27:57.659056288 -0500 CDT:17614:cc-runtime:info:{"QMP": {"version": {"qemu": {"micro": 0, "minor": 7, "major": 2}, "package": ""}, "capabilities": []}}
2017-09-12 13:27:57.662759506 -0500 CDT:17614:cc-runtime:info:QMP version 2.7.0
2017-09-12 13:27:57.672895998 -0500 CDT:17614:cc-runtime:info:QMP capabilities []
2017-09-12 13:27:57.682631948 -0500 CDT:17614:cc-runtime:info:{"execute":"qmp_capabilities"}
2017-09-12 13:27:57.693414689 -0500 CDT:17614:cc-runtime:info:{"return": {}}
2017-09-12 13:27:57.696808897 -0500 CDT:17614:cc-runtime:info:VM started
2017-09-12 13:27:58.011295778 -0500 CDT:17614:cc-runtime:info:Shim(s) started
2017-09-12 13:27:58.029708974 -0500 CDT:17614:cc-runtime:info:cgroup path /sys/fs/cgroup/memory mounted
2017-09-12 13:27:58.033173081 -0500 CDT:17614:cc-runtime:info:cgroup path /sys/fs/cgroup/cpu mounted
2017-09-12 13:27:58.037073363 -0500 CDT:17614:cc-runtime:info:cgroup path /sys/fs/cgroup/pids mounted
2017-09-12 13:27:58.040472202 -0500 CDT:17614:cc-runtime:info:cgroup path /sys/fs/cgroup/blkio mounted
2017-09-12 13:27:58.05879619 -0500 CDT:17662:cc-runtime:info:cc-runtime (version 3.0.0-beta.2, commit f5e72cb4f0b7110ef2fa2a37cf50b910a7cd18e2) called as: [start f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108]
2017-09-12 13:27:58.062213002 -0500 CDT:17662:cc-runtime:info:Using configuration file "/etc/clear-containers/configuration.toml"
2017-09-12 13:27:58.249425831 -0500 CDT:17662:cc-runtime:error:Failed to start container: ERROR received from Hyperstart
2017-09-12 13:27:58.254840149 -0500 CDT:17662:cc-runtime:warning:failed to stop container: Container not running, impossible to stop
2017-09-12 13:27:58.258247264 -0500 CDT:17662:cc-runtime:error:ERROR received from Hyperstart
2017-09-12 13:27:58.266086762 -0500 CDT:17667:cc-runtime:info:cc-runtime (version 3.0.0-beta.2, commit f5e72cb4f0b7110ef2fa2a37cf50b910a7cd18e2) called as: [delete f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108]
2017-09-12 13:27:58.271840878 -0500 CDT:17667:cc-runtime:info:Using configuration file "/etc/clear-containers/configuration.toml"
2017-09-12 13:27:58.308140883 -0500 CDT:17667:cc-runtime:info:Stopping shim PID 17658
2017-09-12 13:27:58.312071838 -0500 CDT:17667:cc-runtime:info:Shim(s) stopped
2017-09-12 13:27:58.315143626 -0500 CDT:17673:cc-runtime:info:cc-runtime (version 3.0.0-beta.2, commit f5e72cb4f0b7110ef2fa2a37cf50b910a7cd18e2) called as: [kill --all f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108 SIGKILL]
2017-09-12 13:27:58.315800073 -0500 CDT:17667:cc-runtime:info:{"QMP": {"version": {"qemu": {"micro": 0, "minor": 7, "major": 2}, "package": ""}, "capabilities": []}}
2017-09-12 13:27:58.318748872 -0500 CDT:17673:cc-runtime:info:Using configuration file "/etc/clear-containers/configuration.toml"
2017-09-12 13:27:58.322177676 -0500 CDT:17667:cc-runtime:info:{"execute":"qmp_capabilities"}
2017-09-12 13:27:58.328759555 -0500 CDT:17667:cc-runtime:info:{"return": {}}
2017-09-12 13:27:58.332272603 -0500 CDT:17667:cc-runtime:info:{"execute":"quit"}
2017-09-12 13:27:58.334535304 -0500 CDT:17673:cc-runtime:error:Container f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108 not ready or running, cannot send a signal
2017-09-12 13:27:58.342213728 -0500 CDT:17667:cc-runtime:info:{"return": {}}
2017-09-12 13:27:58.347704587 -0500 CDT:17678:cc-runtime:info:cc-runtime (version 3.0.0-beta.2, commit f5e72cb4f0b7110ef2fa2a37cf50b910a7cd18e2) called as: [delete f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108]
2017-09-12 13:27:58.354619106 -0500 CDT:17667:cc-runtime:info:{"timestamp": {"seconds": 1505240878, "microseconds": 342190}, "event": "SHUTDOWN"}
2017-09-12 13:27:58.365515308 -0500 CDT:17678:cc-runtime:info:Using configuration file "/etc/clear-containers/configuration.toml"
2017-09-12 13:27:58.47854585 -0500 CDT:17667:cc-runtime:warning:Could not get veth interface: %sLinkByName() failed for veth name eth0: Link not found
2017-09-12 13:27:58.50687716 -0500 CDT:17667:cc-runtime:info:cgroup path /sys/fs/cgroup/memory mounted
2017-09-12 13:27:58.506992996 -0500 CDT:17678:cc-runtime:error:open /var/lib/virtcontainers/pods/f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108/config.json: no such file or directory
2017-09-12 13:27:58.512799974 -0500 CDT:17667:cc-runtime:info:cgroup path /sys/fs/cgroup/cpu mounted
2017-09-12 13:27:58.516380919 -0500 CDT:17667:cc-runtime:info:cgroup path /sys/fs/cgroup/pids mounted
2017-09-12 13:27:58.531651131 -0500 CDT:17667:cc-runtime:info:cgroup path /sys/fs/cgroup/blkio mounted
Log filtrated by pid: First delete:
2017-09-12 13:27:58.266086762 -0500 CDT:17667:cc-runtime:info:cc-runtime (version 3.0.0-beta.2, commit f5e72cb4f0b7110ef2fa2a37cf50b910a7cd18e2) called as: [delete f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108]
2017-09-12 13:27:58.271840878 -0500 CDT:17667:cc-runtime:info:Using configuration file "/etc/clear-containers/configuration.toml"
2017-09-12 13:27:58.308140883 -0500 CDT:17667:cc-runtime:info:Stopping shim PID 17658
2017-09-12 13:27:58.312071838 -0500 CDT:17667:cc-runtime:info:Shim(s) stopped
2017-09-12 13:27:58.315800073 -0500 CDT:17667:cc-runtime:info:{"QMP": {"version": {"qemu": {"micro": 0, "minor": 7, "major": 2}, "package": ""}, "capabilities": []}}
2017-09-12 13:27:58.322177676 -0500 CDT:17667:cc-runtime:info:{"execute":"qmp_capabilities"}
2017-09-12 13:27:58.328759555 -0500 CDT:17667:cc-runtime:info:{"return": {}}
2017-09-12 13:27:58.332272603 -0500 CDT:17667:cc-runtime:info:{"execute":"quit"}
2017-09-12 13:27:58.342213728 -0500 CDT:17667:cc-runtime:info:{"return": {}}
2017-09-12 13:27:58.354619106 -0500 CDT:17667:cc-runtime:info:{"timestamp": {"seconds": 1505240878, "microseconds": 342190}, "event": "SHUTDOWN"}
2017-09-12 13:27:58.47854585 -0500 CDT:17667:cc-runtime:warning:Could not get veth interface: %sLinkByName() failed for veth name eth0: Link not found
2017-09-12 13:27:58.50687716 -0500 CDT:17667:cc-runtime:info:cgroup path /sys/fs/cgroup/memory mounted
2017-09-12 13:27:58.512799974 -0500 CDT:17667:cc-runtime:info:cgroup path /sys/fs/cgroup/cpu mounted
2017-09-12 13:27:58.516380919 -0500 CDT:17667:cc-runtime:info:cgroup path /sys/fs/cgroup/pids mounted
2017-09-12 13:27:58.531651131 -0500 CDT:17667:cc-runtime:info:cgroup path /sys/fs/cgroup/blkio mounted
Second delete
2017-09-12 13:27:58.347704587 -0500 CDT:17678:cc-runtime:info:cc-runtime (version 3.0.0-beta.2, commit f5e72cb4f0b7110ef2fa2a37cf50b910a7cd18e2) called as: [delete f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108]
2017-09-12 13:27:58.365515308 -0500 CDT:17678:cc-runtime:info:Using configuration file "/etc/clear-containers/configuration.toml"
2017-09-12 13:27:58.506992996 -0500 CDT:17678:cc-runtime:error:open /var/lib/virtcontainers/pods/f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108/config.json: no such file or directory
I will check if runc does any kind of locking to avoid multiples runtime operations.
@jcvenegas I don't get why you run into this kind of issue since we have a mechanism locking the POD itself at the api.go level of virtcontainers. If you look into https://github.com/containers/virtcontainers/blob/master/api.go#L100-L104 or https://github.com/containers/virtcontainers/blob/master/api.go#L408-L412, you can see that we use a FLOCK to make sure we are not doing several operations on the same pod.
The runtime allow to be called concurrently to do an operation over a container. I 've seen this debuing a failed delete call. If starts fails docker calls delete twice. The runtime should lock until it finish to perform an operation over a container.