clearcontainers / runtime

OCI (Open Containers Initiative) compatible runtime using Virtual Machines
Apache License 2.0
589 stars 70 forks source link

Runtime does not lock when doing operation in a container #541

Open jcvenegas opened 7 years ago

jcvenegas commented 7 years ago

The runtime allow to be called concurrently to do an operation over a container. I 've seen this debuing a failed delete call. If starts fails docker calls delete twice. The runtime should lock until it finish to perform an operation over a container.

jodh-intel commented 7 years ago

Can you add some more details here? For the record, docker/containerd always call delete twice (unclear why).

jcvenegas commented 7 years ago

@jodh-intel I found this by testing this PR https://github.com/containers/virtcontainers/pull/360:

docker run -ti busybox blabla-this-binary-not-exist

  1. runtime will fail at start (expected)
  2. then delete will be called twice ( we already know that) , but see is being called without wait the next being finished.

    my point is because we allow to concurrently operations over the same container: it could happen that both operations interfere one each other, In this case we have two deletes working at the same time. The first one is ahead doing its job and the second fails in the middle of the execution because the first one did something that interfered the second one.

    //The file was removed by first delete
    2017-09-12 13:27:58.506992996 -0500 CDT:17678:cc-runtime:error:open /var/lib/virtcontainers/pods/f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108/config.json: no such file or directory

    But we cannot ensure that for all the container operations interfere each other and not be able to do any of the concurrent operations. So I suggest to add a lock at runtime level to lock the container operations.

Complete log:

2017-09-12 13:27:57.46337096 -0500 CDT:17614:cc-runtime:info:cc-runtime (version 3.0.0-beta.2, commit f5e72cb4f0b7110ef2fa2a37cf50b910a7cd18e2) called as: [create --bundle /var/run/docker/libcontainerd/f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108 --console-socket /tmp/pty190167264/pty.sock --pid-file /run/docker/libcontainerd/containerd/f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108/init/pid f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108]
2017-09-12 13:27:57.47730704 -0500 CDT:17614:cc-runtime:info:Using configuration file "/etc/clear-containers/configuration.toml"
2017-09-12 13:27:57.492606096 -0500 CDT:17614:cc-runtime:info:No sockets from configuration
2017-09-12 13:27:57.496561604 -0500 CDT:17614:cc-runtime:info:Device details for container f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108: Major:0, Minor:79, MountPoint:/var/lib/docker/overlay2/361d17d60c7d2620c933016efbd80367e82c0b2af1f69dd8ea78f084dc782897/merged
2017-09-12 13:27:57.620956651 -0500 CDT:17614:cc-runtime:info:launching qemu with: [-name pod-f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108 -uuid 66303230-3936-3263-3863-303737363164 -machine pc,accel=kvm,kernel_irqchip,nvdimm -cpu host -qmp unix:/run/virtcontainers/pods/f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108/monitor.sock,server,nowait -qmp unix:/run/virtcontainers/pods/f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108/ctrl.sock,server,nowait -m 2048M,slots=2,maxmem=16947M -smp 8,cores=8,threads=1,sockets=1 -device virtio-9p-pci,fsdev=ctr-9p-0,mount_tag=ctr-rootfs-0 -fsdev local,id=ctr-9p-0,path=/var/lib/docker/overlay2/361d17d60c7d2620c933016efbd80367e82c0b2af1f69dd8ea78f084dc782897/merged,security_model=none -device virtio-serial-pci,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/virtcontainers/pods/f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108/console.sock,server,nowait -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/usr/share/clear-containers/clear-containers.img,size=235929600 -device virtserialport,chardev=charch0,id=channel0,name=sh.hyper.channel.0 -chardev socket,id=charch0,path=/run/virtcontainers/pods/f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108/hyper.sock,server,nowait -device virtserialport,chardev=charch1,id=channel1,name=sh.hyper.channel.1 -chardev socket,id=charch1,path=/run/virtcontainers/pods/f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108/tty.sock,server,nowait -device virtio-9p-pci,fsdev=extra-9p-hyperShared,mount_tag=hyperShared -fsdev local,id=extra-9p-hyperShared,path=/tmp/hyper/shared/pods/f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108,security_model=none -device driver=virtio-net-pci,netdev=network-0,mac=02:42:ac:11:00:02 -netdev tap,id=network-0,ifname=tap0,downscript=no,script=no,vhost=on -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize -kernel /usr/share/clear-containers/vmlinuz.container -append root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k panic=1 console=hvc0 console=hvc1 initcall_debug iommu=off cryptomgr.notests net.ifnames=0 quiet systemd.show_status=false init=/usr/lib/systemd/systemd systemd.unit=clear-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket ip=::::::f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108::off::]
2017-09-12 13:27:57.659056288 -0500 CDT:17614:cc-runtime:info:{"QMP": {"version": {"qemu": {"micro": 0, "minor": 7, "major": 2}, "package": ""}, "capabilities": []}}
2017-09-12 13:27:57.662759506 -0500 CDT:17614:cc-runtime:info:QMP version 2.7.0
2017-09-12 13:27:57.672895998 -0500 CDT:17614:cc-runtime:info:QMP capabilities []
2017-09-12 13:27:57.682631948 -0500 CDT:17614:cc-runtime:info:{"execute":"qmp_capabilities"}
2017-09-12 13:27:57.693414689 -0500 CDT:17614:cc-runtime:info:{"return": {}}
2017-09-12 13:27:57.696808897 -0500 CDT:17614:cc-runtime:info:VM started
2017-09-12 13:27:58.011295778 -0500 CDT:17614:cc-runtime:info:Shim(s) started
2017-09-12 13:27:58.029708974 -0500 CDT:17614:cc-runtime:info:cgroup path /sys/fs/cgroup/memory mounted
2017-09-12 13:27:58.033173081 -0500 CDT:17614:cc-runtime:info:cgroup path /sys/fs/cgroup/cpu mounted
2017-09-12 13:27:58.037073363 -0500 CDT:17614:cc-runtime:info:cgroup path /sys/fs/cgroup/pids mounted
2017-09-12 13:27:58.040472202 -0500 CDT:17614:cc-runtime:info:cgroup path /sys/fs/cgroup/blkio mounted
2017-09-12 13:27:58.05879619 -0500 CDT:17662:cc-runtime:info:cc-runtime (version 3.0.0-beta.2, commit f5e72cb4f0b7110ef2fa2a37cf50b910a7cd18e2) called as: [start f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108]
2017-09-12 13:27:58.062213002 -0500 CDT:17662:cc-runtime:info:Using configuration file "/etc/clear-containers/configuration.toml"
2017-09-12 13:27:58.249425831 -0500 CDT:17662:cc-runtime:error:Failed to start container: ERROR received from Hyperstart
2017-09-12 13:27:58.254840149 -0500 CDT:17662:cc-runtime:warning:failed to stop container: Container not running, impossible to stop
2017-09-12 13:27:58.258247264 -0500 CDT:17662:cc-runtime:error:ERROR received from Hyperstart
2017-09-12 13:27:58.266086762 -0500 CDT:17667:cc-runtime:info:cc-runtime (version 3.0.0-beta.2, commit f5e72cb4f0b7110ef2fa2a37cf50b910a7cd18e2) called as: [delete f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108]
2017-09-12 13:27:58.271840878 -0500 CDT:17667:cc-runtime:info:Using configuration file "/etc/clear-containers/configuration.toml"
2017-09-12 13:27:58.308140883 -0500 CDT:17667:cc-runtime:info:Stopping shim PID 17658
2017-09-12 13:27:58.312071838 -0500 CDT:17667:cc-runtime:info:Shim(s) stopped
2017-09-12 13:27:58.315143626 -0500 CDT:17673:cc-runtime:info:cc-runtime (version 3.0.0-beta.2, commit f5e72cb4f0b7110ef2fa2a37cf50b910a7cd18e2) called as: [kill --all f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108 SIGKILL]
2017-09-12 13:27:58.315800073 -0500 CDT:17667:cc-runtime:info:{"QMP": {"version": {"qemu": {"micro": 0, "minor": 7, "major": 2}, "package": ""}, "capabilities": []}}
2017-09-12 13:27:58.318748872 -0500 CDT:17673:cc-runtime:info:Using configuration file "/etc/clear-containers/configuration.toml"
2017-09-12 13:27:58.322177676 -0500 CDT:17667:cc-runtime:info:{"execute":"qmp_capabilities"}
2017-09-12 13:27:58.328759555 -0500 CDT:17667:cc-runtime:info:{"return": {}}
2017-09-12 13:27:58.332272603 -0500 CDT:17667:cc-runtime:info:{"execute":"quit"}
2017-09-12 13:27:58.334535304 -0500 CDT:17673:cc-runtime:error:Container f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108 not ready or running, cannot send a signal
2017-09-12 13:27:58.342213728 -0500 CDT:17667:cc-runtime:info:{"return": {}}
2017-09-12 13:27:58.347704587 -0500 CDT:17678:cc-runtime:info:cc-runtime (version 3.0.0-beta.2, commit f5e72cb4f0b7110ef2fa2a37cf50b910a7cd18e2) called as: [delete f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108]
2017-09-12 13:27:58.354619106 -0500 CDT:17667:cc-runtime:info:{"timestamp": {"seconds": 1505240878, "microseconds": 342190}, "event": "SHUTDOWN"}
2017-09-12 13:27:58.365515308 -0500 CDT:17678:cc-runtime:info:Using configuration file "/etc/clear-containers/configuration.toml"
2017-09-12 13:27:58.47854585 -0500 CDT:17667:cc-runtime:warning:Could not get veth interface: %sLinkByName() failed for veth name eth0: Link not found
2017-09-12 13:27:58.50687716 -0500 CDT:17667:cc-runtime:info:cgroup path /sys/fs/cgroup/memory mounted
2017-09-12 13:27:58.506992996 -0500 CDT:17678:cc-runtime:error:open /var/lib/virtcontainers/pods/f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108/config.json: no such file or directory
2017-09-12 13:27:58.512799974 -0500 CDT:17667:cc-runtime:info:cgroup path /sys/fs/cgroup/cpu mounted
2017-09-12 13:27:58.516380919 -0500 CDT:17667:cc-runtime:info:cgroup path /sys/fs/cgroup/pids mounted
2017-09-12 13:27:58.531651131 -0500 CDT:17667:cc-runtime:info:cgroup path /sys/fs/cgroup/blkio mounted

Log filtrated by pid: First delete:

2017-09-12 13:27:58.266086762 -0500 CDT:17667:cc-runtime:info:cc-runtime (version 3.0.0-beta.2, commit f5e72cb4f0b7110ef2fa2a37cf50b910a7cd18e2) called as: [delete f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108]
2017-09-12 13:27:58.271840878 -0500 CDT:17667:cc-runtime:info:Using configuration file "/etc/clear-containers/configuration.toml"
2017-09-12 13:27:58.308140883 -0500 CDT:17667:cc-runtime:info:Stopping shim PID 17658
2017-09-12 13:27:58.312071838 -0500 CDT:17667:cc-runtime:info:Shim(s) stopped
2017-09-12 13:27:58.315800073 -0500 CDT:17667:cc-runtime:info:{"QMP": {"version": {"qemu": {"micro": 0, "minor": 7, "major": 2}, "package": ""}, "capabilities": []}}
2017-09-12 13:27:58.322177676 -0500 CDT:17667:cc-runtime:info:{"execute":"qmp_capabilities"}
2017-09-12 13:27:58.328759555 -0500 CDT:17667:cc-runtime:info:{"return": {}}
2017-09-12 13:27:58.332272603 -0500 CDT:17667:cc-runtime:info:{"execute":"quit"}
2017-09-12 13:27:58.342213728 -0500 CDT:17667:cc-runtime:info:{"return": {}}
2017-09-12 13:27:58.354619106 -0500 CDT:17667:cc-runtime:info:{"timestamp": {"seconds": 1505240878, "microseconds": 342190}, "event": "SHUTDOWN"}
2017-09-12 13:27:58.47854585 -0500 CDT:17667:cc-runtime:warning:Could not get veth interface: %sLinkByName() failed for veth name eth0: Link not found
2017-09-12 13:27:58.50687716 -0500 CDT:17667:cc-runtime:info:cgroup path /sys/fs/cgroup/memory mounted
2017-09-12 13:27:58.512799974 -0500 CDT:17667:cc-runtime:info:cgroup path /sys/fs/cgroup/cpu mounted
2017-09-12 13:27:58.516380919 -0500 CDT:17667:cc-runtime:info:cgroup path /sys/fs/cgroup/pids mounted
2017-09-12 13:27:58.531651131 -0500 CDT:17667:cc-runtime:info:cgroup path /sys/fs/cgroup/blkio mounted

Second delete

2017-09-12 13:27:58.347704587 -0500 CDT:17678:cc-runtime:info:cc-runtime (version 3.0.0-beta.2, commit f5e72cb4f0b7110ef2fa2a37cf50b910a7cd18e2) called as: [delete f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108]
2017-09-12 13:27:58.365515308 -0500 CDT:17678:cc-runtime:info:Using configuration file "/etc/clear-containers/configuration.toml"
2017-09-12 13:27:58.506992996 -0500 CDT:17678:cc-runtime:error:open /var/lib/virtcontainers/pods/f020962c8c07761de954365cc7221026a478f3cbb6b1581474b5a0603d6e7108/config.json: no such file or directory
jcvenegas commented 7 years ago

I will check if runc does any kind of locking to avoid multiples runtime operations.

sboeuf commented 7 years ago

@jcvenegas I don't get why you run into this kind of issue since we have a mechanism locking the POD itself at the api.go level of virtcontainers. If you look into https://github.com/containers/virtcontainers/blob/master/api.go#L100-L104 or https://github.com/containers/virtcontainers/blob/master/api.go#L408-L412, you can see that we use a FLOCK to make sure we are not doing several operations on the same pod.