Open chavafg opened 6 years ago
This is mainly why we have the Openshift test failing on our Jenkins CI.
Openshift seems to work well, but when checking whether a CC was launched with Openshift using `cc-runtime list`, we got this failure.
Hi @chavafg - this is pretty weird. In the default scenario, `list` calls `stat` on the rootfs provided by the `config.json` config file, and the error you are seeing shows that the directory doesn't exist. So either the directory hasn't yet been created by the time `list` is called, or else it's already been deleted. It seems totally wrong for the runtime to be given a config with an invalid rootfs, so I'm assuming it's the latter scenario, even though that too seems odd.
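A minimal sketch of the check described above, useful as a debugging aid rather than as a copy of what the runtime does. It assumes an OCI bundle directory containing `config.json` with the standard `root.path` field; the function name `rootfs_state` is hypothetical. It only reports whether the rootfs exists at the moment of the call, so telling "not yet created" apart from "already deleted" would need it to be run at several points in the test.

```python
import json
import os


def rootfs_state(bundle_dir):
    """Return (rootfs_path, exists) for the OCI bundle at bundle_dir."""
    with open(os.path.join(bundle_dir, "config.json")) as f:
        config = json.load(f)
    # In the OCI runtime spec, root.path is absolute or relative to the bundle.
    rootfs = config["root"]["path"]
    if not os.path.isabs(rootfs):
        rootfs = os.path.join(bundle_dir, rootfs)
    try:
        os.stat(rootfs)
    except FileNotFoundError:
        # Same condition as the "no such file or directory" error in the log.
        return rootfs, False
    return rootfs, True
```

Calling `rootfs_state(bundle)` just before and just after the failing `list` call would show which of the two scenarios applies.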
Could you get any further insight into which scenario we might be hitting if you add some debug to `hello_world.bats`?
From what I can see, we are hitting the scenario where the directory has already been deleted (by the swarm execution): if Openshift is executed alone (no swarm tests executed prior to Openshift), this error does not appear.
Thanks @chavafg. It does sound like your error handling for `list` is a little aggressive. I've raised #906 to mellow it a bit! Running the CI on the test PR I'm about to raise should give us a useful datapoint...
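To illustrate what "mellowing" the error handling could look like, here is a hedged sketch, not the actual change in #906. It assumes a hypothetical mapping of container ID to rootfs path and a hypothetical function `list_containers`; a missing rootfs is recorded as a warning instead of aborting the whole listing.

```python
import os


def list_containers(rootfs_by_id):
    """Return (entries, warnings) without aborting on a missing rootfs."""
    entries, warnings = [], []
    for cid, rootfs in sorted(rootfs_by_id.items()):
        try:
            os.stat(rootfs)
            status = "ok"
        except FileNotFoundError as err:
            # Keep listing the remaining containers; just note the problem.
            status = "rootfs-missing"
            warnings.append(f"{cid}: {err}")
        entries.append((cid, status))
    return entries, warnings
```

The design choice here is that one stale container should not hide the state of every other container from the caller.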
I also noticed that we're not doing exactly what `runc` is doing wrt user handling for `list`, so I also raised #905.
`list` isn't able to query all details.
@chavafg can you confirm whether this happened with all distros or just on Fedora? I tried to reproduce it on Ubuntu 16.04, and `cc-runtime list` is still working and no qemu instances are left.
@jcvenegas I confirm that this only happens when running with Fedora
@chavafg, great, that narrows the search space. So, looking at:
time="2018-01-08T15:49:53Z" level=error msg="Container not running, impossible to signal the container" source=runtime
time="2018-01-08T15:49:53Z" level=error msg="Container 699f5cc51b188a422cb385d0264f736cc2c34d36572631f97a9ee0b744804fb8 not ready or running, cannot send a signal" source=runtime
time="2018-01-08T15:49:56Z" level=error msg="Container not running, impossible to signal the container" source=runtime
time="2018-01-08T15:49:56Z" level=error msg="Container 49c70e05bfc112d29198472ef1d0ff4d5cc11257015e74e2dd86c47e82ed3aa8 not ready or running, cannot send a signal" source=runtime
time="2018-01-08T15:49:58Z" level=error msg="Container not running, impossible to signal the container" source=runtime
time="2018-01-08T15:49:58Z" level=error msg="Container 4a6cdd022a982a7e07bf61cc199f2fff37a31ba639918671fc4b64739dd4f88f not ready or running, cannot send a signal" source=runtime
time="2018-01-08T15:49:59Z" level=error msg="Container not running, impossible to signal the container" source=runtime
time="2018-01-08T15:49:59Z" level=error msg="Container dbefcd9c91e90e2b16bfa1238ea6a427ddfcc2440d2121ff9874bfb7c871d6d9 not ready or running, cannot send a signal" source=runtime
time="2018-01-08T15:50:07Z" level=error msg="stat /var/lib/docker/overlay2/892032c0d059fbb32ba07497cfe5c74d6a16506955400d8532e3c150f8f80723/merged: no such file or directory" source=runtime
We have some issues trying to stop a container (and probably delete it). Let's try to get a complete runtime log from a swarm test suite run to know what is going on.
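Once a complete runtime log is available, the errors can be grouped to see which ones dominate. The following is a rough sketch that assumes the `key=value` / `key="quoted value"` line format shown in the excerpt above; the helper names `parse_line` and `summarize_errors` are hypothetical, and backslash escapes inside quoted values are left as-is.

```python
import re
from collections import Counter

# Matches key=value pairs where the value is either "quoted" or a bare token.
FIELD = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line):
    """Split one runtime log line into a dict of its fields."""
    fields = {}
    for key, value in FIELD.findall(line):
        if value.startswith('"'):
            value = value[1:-1]  # drop the surrounding quotes
        fields[key] = value
    return fields


def summarize_errors(lines):
    """Count error messages, masking 64-hex IDs so similar messages group."""
    counts = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields.get("level") == "error":
            msg = re.sub(r"[0-9a-f]{64}", "<id>", fields.get("msg", ""))
            counts[msg] += 1
    return counts
```

Feeding it the journal output line by line, for example `summarize_errors(open("runtime.log"))` with a hypothetical `runtime.log` file, gives a count per distinct error message.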
Hi @chavafg - is this still a problem?
Description of problem
After swarm tests are executed in the CI, the `cc-runtime list` command does not work and there are `qemu`, `cc-proxy` and `cc-shim` processes left behind.
Expected result
`cc-runtime list` should work and no processes should be left behind.
Actual result
Swarm tests executed:
cc-runtime list failure:
`docker ps -a`:
processes left:
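A check for leftover processes of the kind described in this report could be sketched as follows. It is an illustration only: it assumes a Linux `/proc` filesystem, reads the (possibly truncated) process name from `/proc/<pid>/comm`, and matches by the name prefixes `qemu`, `cc-proxy` and `cc-shim`; the helper names are hypothetical.

```python
import os

# Process name prefixes taken from the problem description.
LEFTOVER_PREFIXES = ("qemu", "cc-proxy", "cc-shim")


def match_leftovers(names, prefixes=LEFTOVER_PREFIXES):
    """Return the sorted process names that start with any given prefix."""
    return sorted(n for n in names if n.startswith(prefixes))


def running_process_names(proc_dir="/proc"):
    """Collect process names by reading <proc_dir>/<pid>/comm."""
    names = []
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_dir, entry, "comm")) as f:
                names.append(f.read().strip())
        except OSError:
            continue  # process exited while we were scanning
    return names
```

After the swarm tests finish, `match_leftovers(running_process_names())` should return an empty list if nothing was left behind.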
[fuentess@ci-failure tests]$ sudo -E PATH=$PATH cc-collect-data.sh
Meta details
Running `cc-collect-data.sh` version `3.0.12 (commit be7d1cebb5e8afbb2a9329264d4163adc70b0362)` at `2018-01-08.16:00:02.709520618+0000`.
Runtime is `/usr/local/bin/cc-runtime`.
cc-env
Output of "`/usr/local/bin/cc-runtime cc-env`":
Runtime config files
Runtime default config files
Runtime config file contents
Config file `/etc/clear-containers/configuration.toml` not found
Output of "`cat "/usr/share/defaults/clear-containers/configuration.toml"`":
Logfiles
Runtime logs
Recent runtime problems found in system journal:
Proxy logs
No recent proxy problems found in system journal.
Shim logs
Recent shim problems found in system journal:
Container manager details
Have `docker`
Docker
Output of "`docker version`":
Output of "`docker info`":
Output of "`systemctl show docker`":
No `kubectl`
Packages
No `dpkg`
Have `rpm`
Output of "`rpm -qa|egrep "(cc-proxy|cc-runtime|cc-shim|clear-containers-image|linux-container|qemu-lite|qemu-system-x86|cc-oci-runtime)"`":