Closed miabbott closed 5 years ago
Notably if you kola spawn
into a qemu-unpriv
RHCOS machine both /proc/net/route
& /proc/net/ipv6_route
do exist and restarting crio
causes it to enter an active (running)
state. I'd argue that's probably a race in RHCOS where crio
is activating before the routes are populated.
Good find @arithx, thanks!
A little more context in the logs shows that routing might not be fully setup in this secnario:
May 22 17:45:10 localhost.localdomain NetworkManager[1008]: <info> [1558547110.6223] manager: startup complete
May 22 17:45:10 localhost.localdomain systemd[1]: Started Network Manager Wait Online.
May 22 17:45:10 localhost.localdomain systemd[1]: Reached target Network is Online.
May 22 17:45:10 localhost.localdomain systemd[1]: Starting Open Container Initiative Daemon...
May 22 17:45:10 localhost.localdomain systemd[1]: Starting Generate /run/issue.d/console-login-helper-messages.issue...
May 22 17:45:10 localhost.localdomain nm-dispatcher[1088]: req:4 'connectivity-change': new request (3 scripts)
May 22 17:45:10 localhost.localdomain dhclient[1119]: bound to 10.0.2.15 -- renewal in 37152 seconds.
May 22 17:45:10 localhost.localdomain sshd[1133]: Server listening on 0.0.0.0 port 22.
May 22 17:45:10 localhost.localdomain sshd[1133]: Server listening on :: port 22.
May 22 17:45:10 localhost.localdomain systemd[1]: Started OpenSSH server daemon.
May 22 17:45:10 localhost.localdomain systemd[1]: iscsi.service: Unit cannot be reloaded because it is inactive.
May 22 17:45:10 localhost.localdomain nm-dispatcher[1088]: req:4 'connectivity-change': start running ordered scripts...
May 22 17:45:10 localhost.localdomain systemd[1]: Started Generate /run/issue.d/console-login-helper-messages.issue.
May 22 17:45:10 localhost.localdomain systemd[1]: Starting Permit User Sessions...
May 22 17:45:10 localhost.localdomain systemd[1]: Started Permit User Sessions.
May 22 17:45:10 localhost.localdomain systemd[1]: Started Serial Getty on ttyS0.
May 22 17:45:10 localhost.localdomain systemd[1]: Started Getty on tty1.
May 22 17:45:10 localhost.localdomain systemd[1]: Reached target Login Prompts.
May 22 17:45:11 localhost.localdomain crio[1139]: time="2019-05-22 17:45:11.008776919Z" level=fatal msg="no default routes found in "/proc/net/route" or "/proc/net/ipv6_route""
May 22 17:45:11 localhost.localdomain systemd[1]: crio.service: Main process exited, code=exited, status=1/FAILURE
May 22 17:45:11 localhost.localdomain systemd[1]: crio.service: Failed with result 'exit-code'.
May 22 17:45:11 localhost.localdomain systemd[1]: Failed to start Open Container Initiative Daemon.
May 22 17:45:11 localhost.localdomain systemd[1]: Reached target Multi-User System.
May 22 17:45:11 localhost.localdomain systemd[1]: Starting Update UTMP about System Runlevel Changes...
May 22 17:45:11 localhost.localdomain systemd[1]: Started Update UTMP about System Runlevel Changes.
May 22 17:45:11 localhost.localdomain systemd[1]: Startup finished in 1.100s (kernel) + 5.512s (initrd) + 4.018s (userspace) = 10.631s.
May 22 17:45:12 localhost.localdomain NetworkManager[1008]: <info> [1558547112.4924] manager: NetworkManager state is now CONNECTED_SITE
May 22 17:45:12 localhost.localdomain NetworkManager[1008]: <info> [1558547112.4935] policy: set 'Wired connection 1' (ens4) as default for IPv6 routing and DNS
Which is unfortunate since cri-o
is waiting for network-online.target
:
$ systemctl cat cri-o
# /usr/lib/systemd/system/crio.service
[Unit]
Description=Open Container Initiative Daemon
Documentation=https://github.com/kubernetes-sigs/cri-o
After=network-online.target
[Service]
Type=notify
EnvironmentFile=-/etc/sysconfig/crio
EnvironmentFile=-/etc/sysconfig/crio-metrics
EnvironmentFile=-/etc/sysconfig/crio-network
EnvironmentFile=-/etc/sysconfig/crio-storage
Environment=GOTRACEBACK=crash
ExecStart=/usr/bin/crio \
$CRIO_STORAGE_OPTIONS \
$CRIO_NETWORK_OPTIONS \
$CRIO_METRICS_OPTIONS
ExecReload=/bin/kill -s HUP $MAINPID
TasksMax=infinity
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
OOMScoreAdjust=-999
TimeoutStartSec=0
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
I'm going to close this and follow up with either the cri-o
or RHCOS teams (or both).
The network-online.target
spins indefinitely pre-systemd 241 (#999).
When running
kola
against RHCOS usingqemu_unpriv
is causing all the tests to fail because thecrio.service
is failing to start:Inspecting
journal.txt
shows what looks like the problem:Not sure why
crio
wants a default route to start, but it would be great if we could fake out those entries to get past this.cc: @ashcrow