Open jcvenegas opened 6 years ago
I tested the image locally using these tests and they pass. By the way, the test itself seems wrong with both the new agent changes and the older agent: we never pass the device, but the exec command from CRI passes because it does not return a non-zero exit code, and the test then tries to match /dev/mynull. The match succeeds, but the command actually fails.
Stdout:
Stderr:
ls: /dev/mynull: No such file or directory
Exit code: 1
Note: It fails for any agent/image because we never send that device to be created in the container.
ok 12 ctr execsync
not ok 13 ctr device add
# (in test file crio.bats, line 530)
# `[[ "$output" =~ "/dev/mynull" ]]' failed
# 0
# time="2018-01-04 21:24:12.206637296Z" level=debug msg="[graphdriver] trying provided driver "devicemapper""
# time="2018-01-04 21:24:12.207340593Z" level=debug msg="devicemapper: kernel dm driver version is 4.35.0"
# time="2018-01-04 21:24:12.207451992Z" level=debug msg="devmapper: Generated prefix: container-8:1-1032959"
# time="2018-01-04 21:24:12.207471692Z" level=debug msg="devmapper: Checking for existence of the pool container-8:1-1032959-pool"
# time="2018-01-04 21:24:12.207642791Z" level=debug msg="devmapper: Pool doesn't exist. Creating it."
# time="2018-01-04 21:24:12.296651734Z" level=debug msg="devmapper: loadDeviceFilesOnStart()"
# time="2018-01-04 21:24:12.296739434Z" level=debug msg="devmapper: Loading data for file /tmp/tmp.qZxLyuTFvZ/crio/devicemapper/metadata/0492ba3dc6085a828f8fb8caa81aa63d228652512cd31db515447f94153cf9a2"
# time="2018-01-04 21:24:12.296863233Z" level=debug msg="devmapper: Loading data for file /tmp/tmp.qZxLyuTFvZ/crio/devicemapper/metadata/0ef4f15a6e70fde8a60bd744ba793ca475e5c0db8f9e4c40ab61b4fd7d19a03e"
# time="2018-01-04 21:24:12.296891233Z" level=debug msg="devmapper: Loading data for file /tmp/tmp.qZxLyuTFvZ/crio/devicemapper/metadata/1fac3116efcc18304d292ceeb01d14a8be87f5ed62b6fe807ad3674edca66212"
# time="2018-01-04 21:24:12.296913133Z" level=debug msg="devmapper: Loading data for file /tmp/tmp.qZxLyuTFvZ/crio/devicemapper/metadata/25a11eda63ca7bf7a51fd215b93b64ef1aade849d20a8ae5817c8d2cb092b3d7"
# time="2018-01-04 21:24:12.296934133Z" level=debug msg="devmapper: Loading data for file /tmp/tmp.qZxLyuTFvZ/crio/devicemapper/metadata/5471cf527711c5f3206bdfdd8de43336917a1202786bfed22cad6fc834e021c3"
# time="2018-01-04 21:24:12.296954132Z" level=debug msg="devmapper: Loading data for file /tmp/tmp.qZxLyuTFvZ/crio/devicemapper/metadata/618033da5ea1cb3ee1e52365f9ae4bf5319f464744c7706cf8d243974abd82c1"
# time="2018-01-04 21:24:12.296975432Z" level=debug msg="devmapper: Loading data for file /tmp/tmp.qZxLyuTFvZ/crio/devicemapper/metadata/72810732caf19844a8658b3bc2497023c9be857bdd880672db3b8f03b51d1018"
# time="2018-01-04 21:24:12.296995432Z" level=debug msg="devmapper: Loading data for file /tmp/tmp.qZxLyuTFvZ/crio/devicemapper/metadata/74f49e9c31bbd1145c97a0e4ed420f137b5434e33a24ceb32a6a30c98a280754"
A new PR was sent to the agent repository: https://github.com/clearcontainers/agent/pull/188. But it seems that the crio tests are failing randomly:
not ok 11 ctr execsync conflicting with conmon flags parsing
# (in test file crio.bats, line 476)
# `[ "$status" -eq 0 ]' failed
# 0
# time="2018-01-05 01:23:30.945101006Z" level=debug msg="backingFs=extfs, projectQuotaSupported=false"
# time="2018-01-05 01:23:30.945152706Z" level=info msg="[graphdriver] using prior storage driver: overlay"
# time="2018-01-05 01:23:30.946779979Z" level=info msg="CNI network crionet (type=bridge) is used from /tmp/tmp.PEhMeoU8La/cni/net.d/10-crio.conf"
# time="2018-01-05 01:23:30.947451969Z" level=info msg="CNI network crionet (type=bridge) is used from /tmp/tmp.PEhMeoU8La/cni/net.d/10-crio.conf"
# time="2018-01-05 01:23:30.956324627Z" level=debug msg="seccomp status: true"
# time="2018-01-05 01:23:30.956828219Z" level=debug msg="Golang's threads limit set to 115020"
# time="2018-01-05 01:23:30.957669905Z" level=debug msg="sandboxes: []"
I have tried to reproduce this (many attempts) with both image 19790 and image 20120, but I don't see any CRI-O issue. Maybe that's because I am testing inside a VM running on Azure?
About the PR clearcontainers/agent#188, it's actually an expected failure since we don't fall back to a minimal set of capabilities when the runtime doesn't provide them, and we don't have the ability to provide them yet. That's why I have asked @amshinde to handle the case where nothing is provided by applying a minimal set of caps; otherwise this PR will break Clear Containers.
@sboeuf thanks for clarifying about clearcontainers/agent#188.
I also tested with image 20110 but saw no issues. I am testing locally in a ccloudvm instance running Ubuntu 16.04 and cannot reproduce it. Actually, the only place the issue is "reproducible" is the PR that changes the image: https://github.com/clearcontainers/runtime/pull/895
@jcvenegas could you investigate by enabling devicemapper? It might be a way to reproduce it more easily.
@sboeuf yes, I will try it
No, I cannot reproduce it locally with devicemapper.
@jcvenegas Going to take a look at this. Have you been able to reproduce this locally at all?
I have been trying to reproduce this issue, but haven't been able to do so. My initial investigation:
This test passes a device "/dev/null" to the container, specifying the path inside the container for the device as "/dev/mynull". The test then checks for the exit code of running "ls /dev/mynull" using crioctl ctr execsync and if "/dev/mynull" exists in the output string.
There are 2 issues here. First, this test should never have passed in the first place. "/dev/null" is a character device, something that we don't handle with Clear Containers (only block and VFIO devices are handled). This means that the exit code from crioctl ctr execsync is not correct. I am not sure whether crioctl fails to propagate the error correctly, since a quick test with docker using the same device-passing scenario gives a non-zero exit code.
If we ignore the above, it looks like the test is failing while matching the string "/dev/mynull" in the output. We should always get "/dev/mynull': No such file or directory" in the output, but going by the logs above, this is occasionally not received. The shim may have received the exit code without receiving the above message on the tty channel. @sboeuf I wonder if your changes for subreaping processes caused any side effects. Will look further.
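To illustrate the first issue, here is a minimal sketch of why both test checks can pass even though `ls` failed. `fake_execsync` is a hypothetical stand-in for the real `crioctl ctr execsync` call; it replays the report quoted earlier in this thread and assumes the CLI itself exits 0 while only reporting the inner exit code as text.

```bash
#!/usr/bin/env bash

# Hypothetical stand-in for `crioctl ctr execsync <id> ls /dev/mynull`.
# It prints the report quoted earlier in this thread and, by assumption,
# exits 0 itself even though the inner command exited 1.
fake_execsync() {
    printf '%s\n' \
        "Stdout:" \
        "Stderr:" \
        "ls: /dev/mynull: No such file or directory" \
        "Exit code: 1"
    return 0
}

output=$(fake_execsync)
status=$?

# The two checks described above: both succeed although ls failed,
# because the error message itself contains the device path.
if [ "$status" -eq 0 ]; then
    echo "status check passed"
fi
if [[ "$output" =~ "/dev/mynull" ]]; then
    echo "string match passed"
fi

# A stricter check reads the exit code reported in the output.
inner_exit=$(printf '%s\n' "$output" | sed -n 's/^Exit code: //p')
if [ "$inner_exit" -ne 0 ]; then
    echo "inner command failed with exit code $inner_exit"
fi
```

Under these assumptions, a test that only inspects the outer status and greps for the path cannot tell a present device from a missing one; checking the reported inner exit code would.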
Image 20120 added a new agent version. The agent changed from 4d844b2 to d9790: https://download.clearlinux.org/releases/20120/clear/RELEASENOTES
Here is the list of changes merged:
ad7c8a5 privileges: Add NoNewPrivileges in the api
4c2b752 devices: Bind mount devices only when /dev is expected to be bindmounted
d38fb19 dev: Listen to udev events and bind mount these to container's /dev
bd925f1 agent: Make the agent the subreaper of all processes
cf05dda vendor: Update libcontainer vendoring
4dac904 dev: Mount /dev from tmpfs before mount namespace is created
Using this new agent makes the cri-o tests fail:
not ok 13 ctr device add
It fails when checking for a device at /dev/mynull: https://github.com/clearcontainers/tests/blob/7cbbda93b7f8399b6e76fdd541d664ddb45ea3bd/integration/cri-o/crio.bats#L530
The file is created based on this config file: https://github.com/kubernetes-incubator/cri-o/blob/6b91df3da7ea592db3160ad3bb5fdae8c5b3e23e/test/testdata/container_redis_device.json#L38
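Since the discussion hinges on whether the host device is a block or a character device, the following sketch shows one way to check this from a shell. The helper name `device_kind` is hypothetical and not part of the test suite; it relies only on the standard `-b`, `-c`, and `-e` file tests.

```bash
#!/usr/bin/env bash

# Hypothetical helper: classify a path by device type using the
# standard shell file tests.
device_kind() {
    local path="$1"
    if [ -b "$path" ]; then
        echo "block"
    elif [ -c "$path" ]; then
        echo "character"
    elif [ -e "$path" ]; then
        echo "other"
    else
        echo "missing"
    fi
}

# On a typical Linux host, /dev/null is a character device.
device_kind /dev/null
```

A check like this could be run on the host path from the config file before deciding whether a device test is meaningful for a given runtime.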