coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/
262 stars 59 forks source link

rawhide: 37.20220808.91.0: miniso-install-nm testiso scenario failing #1275

Closed dustymabe closed 2 years ago

dustymabe commented 2 years ago

The miniso-install-nm testiso scenario is failing in latest rawhide:

 + cosa kola testiso -SP --qemu-native-4k --qemu-multipath --output-dir tmp/kola-testiso-metal4k
 kola -p qemu-unpriv --output-dir tmp/kola-testiso-metal4k testiso -SP --qemu-native-4k --qemu-multipath
 Testing scenarios: [iso-offline-install iso-live-login iso-as-disk miniso-install miniso-install-nm]
 Ignoring verification of signature on metal image
 PASS: iso-offline-install (uefi + metal4k + multipath) (1m6.374s)
 PASS: iso-live-login (uefi + metal4k + multipath) (19.864s)
 PASS: iso-as-disk (uefi + metal4k + multipath) (16.854s)
 PASS: miniso-install (uefi + metal4k + multipath) (1m19.593s)
 FAIL: miniso-install-nm (uefi + metal4k + multipath + nm-keyfile) (10m0.002s)
     timed out after 10m0s
 Error: timed out after 10m0s

I don't see anything immediately obvious in the logs (but I could be missing something):

The diff here looks something like:

 Upgraded:
   WALinuxAgent-udev 2.7.1.0-2.fc37 -> 2.7.3.0-1.fc37
   avahi-libs 0.8-16.fc37 -> 0.8-17.fc37
   bind-libs 32:9.16.30-1.fc37 -> 32:9.18.5-1.fc37
   bind-license 32:9.16.30-1.fc37 -> 32:9.18.5-1.fc37
   bind-utils 32:9.16.30-1.fc37 -> 32:9.18.5-1.fc37
   bootupd 0.2.6-5.fc37 -> 0.2.7-2.fc37
   clevis 18-9.fc37 -> 18-12.fc37
   clevis-dracut 18-9.fc37 -> 18-12.fc37
   clevis-luks 18-9.fc37 -> 18-12.fc37
   clevis-systemd 18-9.fc37 -> 18-12.fc37
   containerd 1.6.6-4.fc37 -> 1.6.6-6.fc37
   coreos-installer 0.15.0-2.fc37 -> 0.15.0-4.fc37
   coreos-installer-bootinfra 0.15.0-2.fc37 -> 0.15.0-4.fc37
   coreutils 9.1-3.fc37 -> 9.1-4.fc37
   coreutils-common 9.1-3.fc37 -> 9.1-4.fc37
   crypto-policies 20220428-2.gitdfb10ea.fc37 -> 20220802-1.gita99dfd2.fc37
   cyrus-sasl-gssapi 2.1.28-5.fc37 -> 2.1.28-7.fc37
   cyrus-sasl-lib 2.1.28-5.fc37 -> 2.1.28-7.fc37
   dbus-broker 31-2.fc37 -> 32-1.fc37
   fedora-release-common 37-0.8 -> 37-0.9
   fedora-release-coreos 37-0.8 -> 37-0.9
   fedora-release-identity-coreos 37-0.8 -> 37-0.9
   glibc 2.35.9000-31.fc37 -> 2.35.9000-32.fc37
   glibc-common 2.35.9000-31.fc37 -> 2.35.9000-32.fc37
   glibc-minimal-langpack 2.35.9000-31.fc37 -> 2.35.9000-32.fc37
   gnupg2 2.3.7-2.fc37 -> 2.3.7-3.fc37
   gnutls 3.7.6-5.fc37 -> 3.7.7-1.fc37
   irqbalance 2:1.8.0-4.fc37 -> 2:1.9.0-1.fc37
   kernel 5.19.0-0.rc8.20220727git39c3c396f813.60.fc37 -> 5.19.0-65.fc37
   kernel-core 5.19.0-0.rc8.20220727git39c3c396f813.60.fc37 -> 5.19.0-65.fc37
   kernel-modules 5.19.0-0.rc8.20220727git39c3c396f813.60.fc37 -> 5.19.0-65.fc37
   kexec-tools 2.0.24-4.fc37 -> 2.0.25-1.fc37
   less 590-4.fc37 -> 590-5.fc37
   libargon2 20171227-10.fc37 -> 20190702-1.fc37
   libblkid 2.38-5.fc37 -> 2.38.1-1.fc37
   libfdisk 2.38-5.fc37 -> 2.38.1-1.fc37
   libgcab1 1.4-7.fc37 -> 1.5-1.fc37
   libibverbs 39.0-3.fc37 -> 41.0-1.fc37
   libicu 69.1-7.fc37 -> 71.1-1.fc37
   libldb 2.5.1-3.fc37 -> 2.5.2-2.fc37
   libmount 2.38-5.fc37 -> 2.38.1-1.fc37
   libnvme 1.1~rc0-2.fc37 -> 1.1-1.fc37
   libsmartcols 2.38-5.fc37 -> 2.38.1-1.fc37
   libsmbclient 2:4.16.3-2.fc37 -> 2:4.16.4-1.fc37
   libtirpc 1.3.2-1.rc4.fc37 -> 1.3.2-1.rc5.fc37
   libuuid 2.38-5.fc37 -> 2.38.1-1.fc37
   libwbclient 2:4.16.3-2.fc37 -> 2:4.16.4-1.fc37
   moby-engine 20.10.17-4.fc37 -> 20.10.17-7.fc37
   nano 6.3-2.fc37 -> 6.4-1.fc37
   nano-default-editor 6.3-2.fc37 -> 6.4-1.fc37
   net-tools 2.0-0.62.20160912git.fc37 -> 2.0-0.63.20160912git.fc37
   nvme-cli 2.1~rc0-2.fc37 -> 2.1.1-1.fc37
   os-prober 1.77-10.fc37 -> 1.81-1.fc37
   podman 3:4.1.1-2.fc37 -> 4:4.2.0~rc3-2.fc37
   podman-plugins 3:4.1.1-2.fc37 -> 4:4.2.0~rc3-2.fc37
   polkit 121-2.fc37 -> 121-3.fc37
   polkit-libs 121-2.fc37 -> 121-3.fc37
   rpm 4.18.0-0.beta1.3.fc37 -> 4.18.0-0.beta1.4.fc37
   rpm-libs 4.18.0-0.beta1.3.fc37 -> 4.18.0-0.beta1.4.fc37
   rpm-plugin-selinux 4.18.0-0.beta1.3.fc37 -> 4.18.0-0.beta1.4.fc37
   samba-client-libs 2:4.16.3-2.fc37 -> 2:4.16.4-1.fc37
   samba-common 2:4.16.3-2.fc37 -> 2:4.16.4-1.fc37
   samba-common-libs 2:4.16.3-2.fc37 -> 2:4.16.4-1.fc37
   selinux-policy 37.7-2.fc37 -> 37.8-1.fc37
   selinux-policy-targeted 37.7-2.fc37 -> 37.8-1.fc37
   shadow-utils 2:4.11.1-3.fc37 -> 2:4.11.1-4.fc37
   shadow-utils-subid 2:4.11.1-3.fc37 -> 2:4.11.1-4.fc37
   skopeo 1:1.9.1-1.fc37 -> 1:1.9.2-1.fc37
   sqlite-libs 3.39.1-2.fc37 -> 3.39.2-2.fc37
   util-linux 2.38-5.fc37 -> 2.38.1-1.fc37
   util-linux-core 2.38-5.fc37 -> 2.38.1-1.fc37
   vim-data 2:9.0.049-2.fc37 -> 2:9.0.137-1.fc37
   vim-minimal 2:9.0.049-2.fc37 -> 2:9.0.137-1.fc37
   which 2.21-34.fc37 -> 2.21-35.fc37
   xfsprogs 5.18.0-2.fc37 -> 5.18.0-3.fc37
 Removed:
   duktape-2.6.0-3.fc37.x86_64
 Added:
   jemalloc-5.3.0-2.fc37.x86_64
   mozjs91-91.12.0-2.fc37.x86_64

I believe I've narrowed it down to the kernel update.

bgilbert commented 2 years ago

From the console log, it appears that coreos-test-nm-keyfile.service failed.

dustymabe commented 2 years ago

For this issue we pinned the kernel for a period of time to 5.19.0-0.rc8.20220727git39c3c396f813.60.fc37. Later it was tested and found to be working so it was unpinned in https://github.com/coreos/fedora-coreos-config/commit/8563fe68e881e6128afe8124d2494dabc898231b.

dustymabe commented 2 years ago

This is still an issue. It's a race condition. I was able to get some debug output after doing a local build with:

$ git diff
diff --git a/image-base.yaml b/image-base.yaml
index 98c324df..d9b5f32b 100644
--- a/image-base.yaml
+++ b/image-base.yaml
@@ -9,6 +9,10 @@ size: 10
 extra-kargs:
     # Disable SMT on systems vulnerable to MDS or any similar future issue.
     - mitigations=auto,nosmt
+    - systemd.log_level=debug
+    - systemd.log_color=no
+    - systemd.log_target=console
+    - debug

 # Disable networking by default on firstboot. We can drop this once cosa stops
 # defaulting to `ip=auto rd.neednet=1` when it doesn't see this key.

It turns out this is a legitimate failure. The coreos-test-nm-keyfile.service runs three commands:

ExecStart=/usr/bin/journalctl -u nm-initrd --no-pager --grep "policy: set '%[1]s' (.*) as default .* routing and DNS"
ExecStart=/usr/bin/journalctl -u NetworkManager --no-pager --grep "policy: set '%[1]s' (.*) as default .* routing and DNS"
ExecStart=/usr/bin/grep "%[1]s" /etc/NetworkManager/system-connections/%[2]s

This service also runs After=network-online.target but for some reason in this case NetworkManager-wait-online.service runs and exits before enp0s5 appears to be there. So NetworkManager-wait-online.service runs and exits and then coreos-test-nm-keyfile.service runs and fails. All of this happens directly before some messages about enp0s5 in the log:

Received SIGCHLD from PID 2342 (nm-online).^M                                                                                                                                                                                                                                                                                                                                                                                     
Child 2342 (nm-online) died (code=exited, status=0/SUCCESS)^M                                                                                                                                                                                                                                                                                                                                                                     
NetworkManager-wait-online.service: Child 2342 belongs to NetworkManager-wait-online.service.^M                                                                                                                                                                                                                                                                                                                                   
NetworkManager-wait-online.service: Main process exited, code=exited, status=0/SUCCESS (success)^M                                                                                                                                                                                                                                                                                                                                
NetworkManager-wait-online.service: bpf-lsm: Failed to delete cgroup entry from LSM BPF map: No such file or directory^M                                                                                                                                                                                                                                                                                                          
NetworkManager-wait-online.service: Changed start -> exited^M                                                                                                                                                                                                                                                                                                                                                                     
NetworkManager-wait-online.service: Job 617 NetworkManager-wait-online.service/start finished, result=done^M
[  OK  ] Finished NetworkManager-wait-onlineâ<80>¦ice - Network Manager Wait Online.^M                                                                                                                                                                                                                                                                                                                                            
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/NetworkManager_2dwait_2donline_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=517 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                               
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/NetworkManager_2dwait_2donline_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=518 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                               
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=519 reply_cookie=0 signature=uoss error-name=n/a error-message=n/a^M                                                                                                                                                                                                       
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/NetworkManager_2dwait_2donline_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=520 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                               
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/NetworkManager_2dwait_2donline_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=521 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                               
systemd-journald.service: Received EPOLLHUP on stored fd 86 (stored), closing.^M                                                                                                                                                                                                                                                                                                                                                  
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/network_2donline_2etarget interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=522 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                                              
network-online.target changed dead -> active^M                                                                                                                                                                                                                                                                                                                                                                                    
network-online.target: Job 616 network-online.target/start finished, result=done^M                                                                                                                                                                                                                                                                                                                                                
[  OK  ] Reached target network-online.target - Network is Online.^M                                                                                                                                                                                                                                                                                                                                                              
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/network_2donline_2etarget interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=523 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                                              
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=524 reply_cookie=0 signature=uoss error-name=n/a error-message=n/a^M                                                                                                                                                                                                       
coreos-test-nm-keyfile.service: Will spawn child (service_enter_start): /usr/bin/journalctl^M                                                                                                                                                                                                                                                                                                                                     
[   27.728662] kauditd_printk_skb: 121 callbacks suppressed^M                                                                                                                                                                                                                                                                                                                                                                     
[   27.728664] audit: type=1130 audit(1661821762.270:297): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=NetworkManager-wait-online comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'^M                                                                                                                                                             
Failed to read pids.max attribute of root cgroup, ignoring: No data available^M                                                                                                                                                                                                                                                                                                                                                   
coreos-test-nm-keyfile.service: Passing 0 fds to service^M                                                                                                                                                                                                                                                                                                                                                                        
coreos-test-nm-keyfile.service: About to execute /usr/bin/journalctl -u nm-initrd --no-pager --grep "policy: set 'CoreOS DHCP' (.*) as default .* routing and DNS"^M                                                                                                                                                                                                                                                              
coreos-test-nm-keyfile.service: Forked /usr/bin/journalctl as 2438^M                                                                                                                                                                                                                                                                                                                                                              
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/coreos_2dtest_2dnm_2dkeyfile_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=525 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                                 
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/coreos_2dtest_2dnm_2dkeyfile_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=526 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                                 
coreos-test-nm-keyfile.service: Changed dead -> start^M                                                                                                                                                                                                                                                                                                                                                                           
         Starting coreos-test-nm-keyfile.serâ<80>¦O Verify NM Keyfile Propagation...^M                                                                                                                                                                                                                                                                                                                                            
iscsi.service: ConditionDirectoryNotEmpty=/var/lib/iscsi/nodes failed.^M                                                                                                                                                                                                                                                                                                                                                          
iscsi.service: Starting requested but condition failed. Not starting unit.^M                                                                                                                                                                                                                                                                                                                                                      
iscsi.service: Job 627 iscsi.service/start finished, result=done^M                                                                                                                                                                                                                                                                                                                                                                
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/iscsi_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=527 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                                                        
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/iscsi_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=528 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                                                        
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=529 reply_cookie=0 signature=uoss error-name=n/a error-message=n/a^M                                                                                                                                                                                                       
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/remote_2dfs_2dpre_2etarget interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=530 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                                             
remote-fs-pre.target changed dead -> active^M                                                                                                                                                                                                                                                                                                                                                                                     
remote-fs-pre.target: Job 628 remote-fs-pre.target/start finished, result=done^M                                                                                                                                                                                                                                                                                                                                                  
[  OK  ] Reached target remote-fs-pre.targetâ<80>¦eparation for Remote File Systems.^M                                                                                                                                                                                                                                                                                                                                            
coreos-test-nm-keyfile.service: Executing: /usr/bin/journalctl -u nm-initrd --no-pager --grep "policy: set 'CoreOS DHCP' (.*) as default .* routing and DNS"^M                                                                                                                                                                                                                                                                    
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/remote_2dfs_2dpre_2etarget interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=531 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                                             
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=532 reply_cookie=0 signature=uoss error-name=n/a error-message=n/a^M                                                                                                                                                                                                       
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/remote_2dcryptsetup_2etarget interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=533 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                                           
remote-cryptsetup.target changed dead -> active^M                                                                                                                                                                                                                                                                                                                                                                                 
remote-cryptsetup.target: Job 630 remote-cryptsetup.target/start finished, result=done^M                                                                                                                                                                                                                                                                                                                                          
[  OK  ] Reached target remote-cryptsetup.target - Remote Encrypted Volumes.^M                                                                                                                                                                                                                                                                                                                                                    
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/remote_2dcryptsetup_2etarget interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=534 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                                           
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=535 reply_cookie=0 signature=uoss error-name=n/a error-message=n/a^M                                                                                                                                                                                                       
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/remote_2dfs_2etarget interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=536 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                                                   
remote-fs.target changed dead -> active^M                                                                                                                                                                                                                                                                                                                                                                                         
remote-fs.target: Job 626 remote-fs.target/start finished, result=done^M                                                                                                                                                                                                                                                                                                                                                          
[  OK  ] Reached target remote-fs.target - Remote File Systems.^M                                                                                                                                                                                                                                                                                                                                                                 
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/remote_2dfs_2etarget interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=537 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                                                   
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=538 reply_cookie=0 signature=uoss error-name=n/a error-message=n/a^M                                                                                                                                                                                                       
systemd-user-sessions.service: starting held back, waiting for: console-login-helper-messages-gensnippet-ssh-keys.service^M                                                                                                                                                                                                                                                                                                       
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/remote_2dfs_2etarget interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=539 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                                                   
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/remote_2dcryptsetup_2etarget interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=540 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                                           
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/remote_2dfs_2dpre_2etarget interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=541 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                                             
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/iscsi_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=542 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                                                        
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/iscsi_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=543 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                                                        
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/coreos_2dtest_2dnm_2dkeyfile_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=544 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                                 
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/coreos_2dtest_2dnm_2dkeyfile_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=545 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                                 
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/network_2donline_2etarget interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=546 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                                              
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/job/615 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=547 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                                                                     
systemd-journald.service: Got notification message from PID 2183 (FDSTORE=1)^M                                                                                                                                                                                                                                                                                                                                                    
systemd-journald.service: Added fd 83 (n/a) to fd store.^M                                                                                                                                                                                                                                                                                                                                                                        
Received SIGCHLD from PID 2438 (journalctl).^M                                                                                                                                                                                                                                                                                                                                                                                    
Child 2438 (journalctl) died (code=exited, status=0/SUCCESS)^M                                                                                                                                                                                                                                                                                                                                                                    
coreos-test-nm-keyfile.service: Child 2438 belongs to coreos-test-nm-keyfile.service.^M                                                                                                                                                                                                                                                                                                                                           
coreos-test-nm-keyfile.service: Main process exited, code=exited, status=0/SUCCESS (success)^M                                                                                                                                                                                                                                                                                                                                    
coreos-test-nm-keyfile.service: Running next main command for state start.^M                                                                                                                                                                                                                                                                                                                                                      
coreos-test-nm-keyfile.service: Will spawn child (service_run_next_main): /usr/bin/journalctl^M                                                                                                                                                                                                                                                                                                                                   
coreos-test-nm-keyfile.service: Passing 0 fds to service^M                                                                                                                                                                                                                                                                                                                                                                        
coreos-test-nm-keyfile.service: About to execute /usr/bin/journalctl -u NetworkManager --no-pager --grep "policy: set 'CoreOS DHCP' (.*) as default .* routing and DNS"^M                                                                                                                                                                                                                                                         
coreos-test-nm-keyfile.service: Forked /usr/bin/journalctl as 2443^M                                                                                                                                                                                                                                                                                                                                                              
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/coreos_2dtest_2dnm_2dkeyfile_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=548 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                                 
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/coreos_2dtest_2dnm_2dkeyfile_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=549 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                                 
enp0s5: Processing udev action (SEQNUM=2133, ACTION=add)^M                                                                                                                                                                                                                                                                                                                                                                        
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/sys_2dsubsystem_2dnet_2ddevices_2denp0s5_2edevice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=550 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                      
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/sys_2dsubsystem_2dnet_2ddevices_2denp0s5_2edevice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=551 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                      
sys-subsystem-net-devices-enp0s5.device: Changed tentative -> plugged^M                                                                                                                                                                                                                                                                                                                                                           
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/sys_2ddevices_2dpci0000_3a00_2d0000_3a00_3a05_2e0_2dvirtio4_2dnet_2denp0s5_2edevice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=552 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                    
coreos-test-nm-keyfile.service: Executing: /usr/bin/journalctl -u NetworkManager --no-pager --grep "policy: set 'CoreOS DHCP' (.*) as default .* routing and DNS"^M                                                                                                                                                                                                                                                               
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/sys_2ddevices_2dpci0000_3a00_2d0000_3a00_3a05_2e0_2dvirtio4_2dnet_2denp0s5_2edevice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=553 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                    
sys-devices-pci0000:00-0000:00:05.0-virtio4-net-enp0s5.device: Changed tentative -> plugged^M                                                                                                                                                                                                                                                                                                                                     
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/sys_2ddevices_2dpci0000_3a00_2d0000_3a00_3a05_2e0_2dvirtio4_2dnet_2denp0s5_2edevice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=554 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                    
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/sys_2ddevices_2dpci0000_3a00_2d0000_3a00_3a05_2e0_2dvirtio4_2dnet_2denp0s5_2edevice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=555 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                    
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/sys_2dsubsystem_2dnet_2ddevices_2denp0s5_2edevice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=556 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                      
Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/sys_2dsubsystem_2dnet_2ddevices_2denp0s5_2edevice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=557 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a^M                                                                                                                                      
systemd-journald.service: Got notification message from PID 2183 (FDSTORE=1)^M                                                                                                                                                                                                                                                                                                                                                    
systemd-journald.service: Added fd 86 (n/a) to fd store.^M                                                                                                                                                                                                                                                                                                                                                                        
Received SIGCHLD from PID 2443 (journalctl).^M                                                                                                                                                                                                                                                                                                                                                                                    
Child 2443 (journalctl) died (code=exited, status=1/FAILURE)^M                                                                                                                                                                                                                                                                                                                                                                    
coreos-test-nm-keyfile.service: Child 2443 belongs to coreos-test-nm-keyfile.service.^M                                                                                                                                                                                                                                                                                                                                           
coreos-test-nm-keyfile.service: Main process exited, code=exited, status=1/FAILURE^M                                                                                                                                                                                                                                                                                                                                              
coreos-test-nm-keyfile.service: Failed with result 'exit-code'.^M                                                                                                                                                                                                                                                                                                                                                                 
coreos-test-nm-keyfile.service: Service will not restart (restart setting)^M                                                                                                                                                                                                                                                                                                                                                      
coreos-test-nm-keyfile.service: Changed start -> failed

You can see that coreos-test-nm-keyfile.service is executing journalctl -u NetworkManager right before enp0s5: Processing udev action (SEQNUM=2133, ACTION=add).

Full log: console.txt

thom311 commented 2 years ago

btw, the NetworkManager-wait-online manual ([1] explains how this is supposed to work).

In the attached log, you always see startup complete line after device (enp0s5): state change: secondaries -> activated.

The logs are confusing to me, but I would not assume that NetworkManager-wait-online.service completes before startup complete line. Do you think that is the problem?

Otherwise, the assumption would be that NM thinks enp0s5 is fully activated, when in fact it is not.

runs and exits before enp0s5 appears to be there.

I don't see that. Why do you get to that conclusion?

dustymabe commented 2 years ago

btw, the NetworkManager-wait-online manual ([1] explains how this is supposed to work).

Thanks for the link.

In the attached log, you always see startup complete line after device (enp0s5): state change: secondaries -> activated.

The logs are confusing to me, but I would not assume that NetworkManager-wait-online.service completes before startup complete line. Do you think that is the problem?

I think maybe it's confusing because this test tests a full install, so there are actually two boots in there: the install boot and the Ignition boot. Can you look at just the second boot?

Otherwise, the assumption would be that NM thinks enp0s5 is fully activated, when in fact it is not.

Or it's not seeing it at all because it's not activated yet. I need to investigate further but the page you linked to mentions:

The property connection.wait-device-timeout of the connection profiles waits until
the waited devices appear. This is useful if the driver takes a longer time to
detect the networking interfaces. Similar with the connection.gateway-ping-timeout property.

runs and exits before enp0s5 appears to be there.

I don't see that. Why do you get to that conclusion?

See the logs I pasted in my previous comment where you see [ OK ] Finished NetworkManager-wait-online before network-online.target: Job 616 network-online.target/start finished before enp0s5: Processing udev action (SEQNUM=2133, ACTION=add).

dustymabe commented 2 years ago

ok using wait-device-timeout seems to have helped. Before we had:

[connection]
id=<id>
type=ethernet

[ipv4]
method=auto

Now with this it's working better:

[connection]
id=<id>
type=ethernet
wait-device-timeout=20000

[ipv4]
method=auto

I guess everything appears to be working as expected for NetworkManager-wait-online.service. We did see a change in behavior in Rawhide/F37 with respect to this but I'm not sure whose fault it is (it's a race condition after all).

What I dont understand is why NetworkManager-wait-online.service would exit early (again this is by design) if there aren't any connections that have been established. I guess I can take that discussion to the NM issue tracker or to IRC.

thom311 commented 2 years ago

See the logs I pasted in my previous comment where you see [ OK ] Finished NetworkManager-wait-online before network-online.target: Job 616 network-online.target/start finished before enp0s5: Processing udev action (SEQNUM=2133, ACTION=add).

Looking at the second boot, in the console log from there, we see also

enp0s5: Processing udev action (SEQNUM=1768, ACTION=add)

in line 22928, before NetworkManager is starting. Also, NetworkManager logs about seeing that interface, for example in line 26958:

[    7.517981] NetworkManager[1383]: <info>  [1661821741.6415] manager: (enp0s5): new Ethernet device (/org/freedesktop/NetworkManager/Devices/2)

meaning, the device was there, and according to the log, NetworkManager activated the device before startup complete was reached.

I would not think that wait-device-timeout makes a difference in that example.

It can be right to set the property. For example, nm-initrd-generator likes to set wait-device-timeout, it's just not clear to me that this was the problem/solution in the provided log.


All of the above happens in initrd (of the second boot). Later, after switch root we see the mentioned

enp0s5: Processing udev action (SEQNUM=2133, ACTION=add)

in line 43061. I guess, you are talking about real-root of the second boot? There we don't see any logs from NetworkManager, so it's hard to tell what happens. Surely, the device did exist, because it already existed in initrd, and wait-device-timeout would again not matter.

dustymabe commented 2 years ago

I guess, you are talking about real-root of the second boot? There we don't see any logs from NetworkManager, so it's hard to tell what happens. Surely, the device did exist, because it already existed in initrd, and wait-device-timeout would again not matter.

Correct. We are talking about the real-root of the second boot. In Fedora CoreOS we tear down any networking when we switch from the initrd to the real root. So Networking gets brought back up again in the real root when NM starts again.

Do you know why NetworkManager-wait-online.service would behave differently there?

Note that the fix that we implemented set wait-device-timeout and now our tests are passing. That indicates to me that it does matter.