clearcontainers / runtime

OCI (Open Containers Initiative) compatible runtime using Virtual Machines
Apache License 2.0

crio test failing with image 20120 (agent d9790) #896

Open jcvenegas opened 6 years ago

jcvenegas commented 6 years ago

Image 20120 added a new agent version. The agent changed from 4d844b2 to d9790: https://download.clearlinux.org/releases/20120/clear/RELEASENOTES

Here is the list of changes merged:

ad7c8a5 privileges: Add NoNewPrivileges in the api
4c2b752 devices: Bind mount devices only when /dev is expected to be bindmounted
d38fb19 dev: Listen to udev events and bind mount these to container's /dev
bd925f1 agent: Make the agent the subreaper of all processes
cf05dda vendor: Update libcontainer vendoring
4dac904 dev: Mount /dev from tmpfs before mount namespace is created

Using this new agent makes the cri-o tests fail:

not ok 13 ctr device add

It fails when checking for a device at /dev/mynull: https://github.com/clearcontainers/tests/blob/7cbbda93b7f8399b6e76fdd541d664ddb45ea3bd/integration/cri-o/crio.bats#L530

The file is created based on this config file: https://github.com/kubernetes-incubator/cri-o/blob/6b91df3da7ea592db3160ad3bb5fdae8c5b3e23e/test/testdata/container_redis_device.json#L38

jcvenegas commented 6 years ago

I tested the image locally using this test and the test passes. By the way, the test itself seems to be wrong: with both the new agent and the older one we never pass the device, but the exec command from cri passes because it does not return a non-zero exit code, and the test then tries to match /dev/mynull. The match succeeds, but the command actually fails:

Stdout:

Stderr:
ls: /dev/mynull: No such file or directory

Exit code: 1

Note: It fails for any agent/image because we never send that device to be created in the container.
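
To illustrate why the string check can pass even though the device is missing, here is a minimal bash sketch. It assumes the test's `$output` contains the stderr text shown above; the variable values are stand-ins, not captured from a real run.

```shell
# Stand-in for the execsync output captured by the test (assumption: stderr
# text is part of "$output").
output="ls: /dev/mynull: No such file or directory"

# Same literal match the bats test performs. The quoted right-hand side is
# treated as a plain string, so the error message itself satisfies it.
if [[ "$output" =~ "/dev/mynull" ]]; then
  matched=yes
else
  matched=no
fi

echo "matched=$matched"
```

The match succeeds because the path appears inside the error message, so this check alone cannot distinguish a listed device from a failed `ls`.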

jcvenegas commented 6 years ago
ok 12 ctr execsync
not ok 13 ctr device add
# (in test file crio.bats, line 530)
#   `[[ "$output" =~ "/dev/mynull" ]]' failed
# 0
# time="2018-01-04 21:24:12.206637296Z" level=debug msg="[graphdriver] trying provided driver "devicemapper"" 
# time="2018-01-04 21:24:12.207340593Z" level=debug msg="devicemapper: kernel dm driver version is 4.35.0" 
# time="2018-01-04 21:24:12.207451992Z" level=debug msg="devmapper: Generated prefix: container-8:1-1032959" 
# time="2018-01-04 21:24:12.207471692Z" level=debug msg="devmapper: Checking for existence of the pool container-8:1-1032959-pool" 
# time="2018-01-04 21:24:12.207642791Z" level=debug msg="devmapper: Pool doesn't exist. Creating it." 
# time="2018-01-04 21:24:12.296651734Z" level=debug msg="devmapper: loadDeviceFilesOnStart()" 
# time="2018-01-04 21:24:12.296739434Z" level=debug msg="devmapper: Loading data for file /tmp/tmp.qZxLyuTFvZ/crio/devicemapper/metadata/0492ba3dc6085a828f8fb8caa81aa63d228652512cd31db515447f94153cf9a2" 
# time="2018-01-04 21:24:12.296863233Z" level=debug msg="devmapper: Loading data for file /tmp/tmp.qZxLyuTFvZ/crio/devicemapper/metadata/0ef4f15a6e70fde8a60bd744ba793ca475e5c0db8f9e4c40ab61b4fd7d19a03e" 
# time="2018-01-04 21:24:12.296891233Z" level=debug msg="devmapper: Loading data for file /tmp/tmp.qZxLyuTFvZ/crio/devicemapper/metadata/1fac3116efcc18304d292ceeb01d14a8be87f5ed62b6fe807ad3674edca66212" 
# time="2018-01-04 21:24:12.296913133Z" level=debug msg="devmapper: Loading data for file /tmp/tmp.qZxLyuTFvZ/crio/devicemapper/metadata/25a11eda63ca7bf7a51fd215b93b64ef1aade849d20a8ae5817c8d2cb092b3d7" 
# time="2018-01-04 21:24:12.296934133Z" level=debug msg="devmapper: Loading data for file /tmp/tmp.qZxLyuTFvZ/crio/devicemapper/metadata/5471cf527711c5f3206bdfdd8de43336917a1202786bfed22cad6fc834e021c3" 
# time="2018-01-04 21:24:12.296954132Z" level=debug msg="devmapper: Loading data for file /tmp/tmp.qZxLyuTFvZ/crio/devicemapper/metadata/618033da5ea1cb3ee1e52365f9ae4bf5319f464744c7706cf8d243974abd82c1" 
# time="2018-01-04 21:24:12.296975432Z" level=debug msg="devmapper: Loading data for file /tmp/tmp.qZxLyuTFvZ/crio/devicemapper/metadata/72810732caf19844a8658b3bc2497023c9be857bdd880672db3b8f03b51d1018" 
# time="2018-01-04 21:24:12.296995432Z" level=debug msg="devmapper: Loading data for file /tmp/tmp.qZxLyuTFvZ/crio/devicemapper/metadata/74f49e9c31bbd1145c97a0e4ed420f137b5434e33a24ceb32a6a30c98a280754" 
jcvenegas commented 6 years ago

A new PR was sent to the agent repository: https://github.com/clearcontainers/agent/pull/188. But it seems that the crio tests are failing randomly:

not ok 11 ctr execsync conflicting with conmon flags parsing
# (in test file crio.bats, line 476)
#   `[ "$status" -eq 0 ]' failed
# 0
# time="2018-01-05 01:23:30.945101006Z" level=debug msg="backingFs=extfs,  projectQuotaSupported=false" 
# time="2018-01-05 01:23:30.945152706Z" level=info msg="[graphdriver] using prior storage driver: overlay" 
# time="2018-01-05 01:23:30.946779979Z" level=info msg="CNI network crionet (type=bridge) is used from /tmp/tmp.PEhMeoU8La/cni/net.d/10-crio.conf" 
# time="2018-01-05 01:23:30.947451969Z" level=info msg="CNI network crionet (type=bridge) is used from /tmp/tmp.PEhMeoU8La/cni/net.d/10-crio.conf" 
# time="2018-01-05 01:23:30.956324627Z" level=debug msg="seccomp status: true" 
# time="2018-01-05 01:23:30.956828219Z" level=debug msg="Golang's threads limit set to 115020" 
# time="2018-01-05 01:23:30.957669905Z" level=debug msg="sandboxes: []" 
sboeuf commented 6 years ago

I have tried to reproduce this (many attempts), but with both image 19790 and 20120 I don't see any CRI-O issue. Maybe that is because I am trying inside a VM running on Azure?

About the PR clearcontainers/agent#188, it's actually an expected failure, since we don't fall back to a minimal set of capabilities when the runtime does not provide those caps. And we don't have the ability to provide them yet. That's why I have asked @amshinde to handle the case where nothing is provided by applying a minimal set of caps; otherwise this PR will break Clear Containers.

jcvenegas commented 6 years ago

@sboeuf thanks for clarifying about clearcontainers/agent#188.

I also tested with image 20110 and saw no issues. I am testing locally in a ccloudvm instance running Ubuntu 16.04, but I cannot reproduce it. Actually, the only place the issue is "reproducible" is the PR that changes the image: https://github.com/clearcontainers/runtime/pull/895

sboeuf commented 6 years ago

@jcvenegas could you investigate by enabling devicemapper? It might be a way to reproduce this more easily.

jcvenegas commented 6 years ago

@sboeuf yes, I will try it

jcvenegas commented 6 years ago

No, I cannot reproduce it locally with devicemapper.

amshinde commented 6 years ago

@jcvenegas Going to take a look at this. Have you been able to reproduce this locally at all?

amshinde commented 6 years ago

I have been trying to reproduce this issue, but haven't been able to do so. My initial investigation:

This test passes a device "/dev/null" to the container, specifying the path inside the container for the device as "/dev/mynull". The test then checks for the exit code of running "ls /dev/mynull" using crioctl ctr execsync and if "/dev/mynull" exists in the output string.

There are two issues here. First, this test should never have passed in the first place. "/dev/null" is a character device, something that we don't handle with Clear Containers (only block and vfio devices are handled). This means that the exit code from crioctl ctr execsync is not correct. I am not sure whether crioctl fails to propagate the error correctly, since a quick test with docker using the same device-passing scenario gives a non-zero exit code.
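
As a quick way to confirm the device type mentioned above, the following sketch classifies a device node with the standard `-c` (character) and `-b` (block) file tests. It assumes a Linux host where /dev/null exists; the `dev` and `kind` variable names are illustrative only.

```shell
# Classify a device node using the shell's built-in file tests.
dev=/dev/null

if [ -c "$dev" ]; then
  kind="character"
elif [ -b "$dev" ]; then
  kind="block"
else
  kind="other"
fi

echo "$dev is a $kind device"
```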

If we ignore the above, it looks like the test is failing while matching the string "/dev/mynull" in the output. We should always get "/dev/mynull': No such file or directory" in the output, but going by the logs above, this message is occasionally not received. The shim may have received the exit code without receiving the above message on the tty channel. @sboeuf I wonder if your changes for subreaping processes caused any side effects. Will look further.
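
One possible way to make the check stricter is sketched below. This is a hypothetical helper, not code from crio.bats: `device_listed` and its arguments are invented for illustration. It treats the device as present only when the command exited with 0, the output contains the path, and the output carries no "No such file or directory" error.

```shell
# Hypothetical helper (not part of crio.bats).
# Arguments: exit status, captured output, expected device path.
device_listed() {
  rc="$1"
  out="$2"
  path="$3"

  # A non-zero exit status means the listing failed.
  [ "$rc" -eq 0 ] || return 1

  # Reject output that is only an error message mentioning the path.
  case "$out" in
    *"No such file or directory"*) return 1 ;;
  esac

  # Finally require the path itself to appear in the output.
  case "$out" in
    *"$path"*) return 0 ;;
  esac
  return 1
}

device_listed 0 "/dev/mynull" /dev/mynull && echo "device present"
device_listed 0 "ls: /dev/mynull: No such file or directory" /dev/mynull || echo "device missing"
```

With a check like this, an empty output (the intermittent case described above) and an error-only output would both be reported as a missing device rather than depending on which text happened to arrive.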