blue-systems / neon-pinebook-remix

Todos for Neon Pinebook Remix
3 stars 0 forks source link

mysterious delay of 2 minutes in live-config service #77

Closed bhush9 closed 6 years ago

bhush9 commented 6 years ago

When booting image 20180507-1433 there is mysterious delay of 2 minutes in live-config service which can extend upto 3+ mnutes in some cases.

live@live-pc:~$ systemd-analyze blame
    3min 32.631s live-config.service
         17.208s networkd-dispatcher.service
          9.382s NetworkManager.service
          9.093s dev-mmcblk0p1.device
          8.034s apparmor.service
          7.615s lvm2-monitor.service
          7.457s udisks2.service
          7.189s ModemManager.service

live-config service takes aroud 3 minutes which is not normal at all.. in journalctl one can see it "skips" two minutes..

May 10 12:29:33 live-pc gpasswd[716]: user live added by root to group netdev
May 10 12:31:06 live-pc gpasswd[725]: user live added by root to group scanner

Which I had earlier thought to be addition of live user in bluetooth group and tried following which seemed to work but just when I started writing this report it turns out it happens even before the bluetooth adduser call, so that theory is busted.

diff --git a/pine/etc/live/config.conf.d/neon.conf b/pine/etc/live/config.conf.d/neon.conf
index 3302596..796aaf4 100644
--- a/pine/etc/live/config.conf.d/neon.conf
+++ b/pine/etc/live/config.conf.d/neon.conf
@@ -1,3 +1,4 @@
 LIVE_HOSTNAME=neon
 LIVE_USERNAME=neon
 LIVE_USER_FULLNAME="KDE neon"
+LIVE_USER_DEFAULT_GROUPS="audio cdrom dip floppy video plugdev netdev powerdev scanner"

More investigation pending.

bhush9 commented 6 years ago

CC: @shadeslayer @llelectronics

any idea about this live-config weirdness?

star-buck commented 6 years ago

How does this compare to the xenial img?

Other than the long delayed live.config entry, is the rest about the same?

bhush9 commented 6 years ago

I think there is extra delay in bionic image after cursor is visible... but I want to sort out live-config issue first and then compare with xenial image.

bhush9 commented 6 years ago

Not solved yet, but adding my notes..

modify kernel command line: console=ttyS0,115200n8 no_console_suspend console=tty0 mac_addr= root=/dev/mmcblk0p1 rw rootwait boot=live components hostname=live-pc username=live cma=384M live-config.debug systemd.log_level=debug

modify the sytemd journald config to have storage=persistent.

Full journal at https://bshah.in/journalctl-delay

Reading journal;

May 17 10:02:34 live-pc gpasswd[712]: user live added by root to group netdev

It tells following.

May 17 10:02:55 live-pc systemd-timesyncd[497]: Resolving ntp.ubuntu.com...
May 17 10:02:55 live-pc systemd-journald[159]: Forwarding to syslog missed 3314 messages.
May 17 10:02:55 live-pc systemd[1]: systemd-timesyncd.service: Got notification message from PID 497 (STATUS=Idle.)

right after that following lines repeat for 2 minutes and eventually ntp service gives up

May 17 10:02:55 live-pc systemd-resolved[492]: Got DNS stub UDP query packet for id 57735
May 17 10:02:55 live-pc systemd-resolved[492]: Looking up RR for ntp.ubuntu.com.localdomain IN AAAA.
May 17 10:02:55 live-pc systemd-resolved[492]: Sending response packet with id 57735 on interface 1/AF_INET.
May 17 10:02:55 live-pc systemd-resolved[492]: Processing query...
May 17 10:02:55 live-pc systemd-timesyncd[497]: Failed to resolve ntp.ubuntu.com: Temporary failure in name resolution

Flash image

edit /etc/live/config.conf.d/neon-users.conf

-LIVE_USER_DEFAULT_GROUPS="audio cdrom dip floppy video plugdev netdev powerdev scanner"
+LIVE_USER_DEFAULT_GROUPS="audio cdrom dip floppy video plugdev powerdev scanner"

This can workaround the issue, maybe

Tested:

boot1 - pass boot2 - fail (different SD card) [This waits forever (about 5 minutes)] boot3 - pass [check if it got tripped over by systemd-timesyncd running forever? - no] boot4 - pass boot5 - pass

[ pass being : it boots within normal time-frame and doesn't wait forever ]

Image: https://files.kde.org/neon/images/pinebook-remix-nonfree/bionic/20180521-0817

The image above contains the workaround mentioned above but first boot out of that image seems to fail for me..

boot1 - fail

So now investigating again with debug log and journal in hand.

bhush9 commented 6 years ago

Adding information about how to get logs,

set_cmdline=setenv bootargs console=${console} console=tty0 ${optargs} mac_addr=${ethaddr} root=${root} rw rootwait boot=live components hostname=live-pc username=live cma=384M systemd.log_level=debug live-config.debug=true

CC: @adriaandegroot

adriaandegroot commented 6 years ago
davidedmundson commented 6 years ago

Notes from me:

delay seems to be in particular in /var/lib/live/0030-user-setup

This user setup script is an absolute load of Debian. The script to set a password involves randomly dd'ing a drive :/

Now I can add timestamps all over this file and bisect down till we hit the exact spot with the block, and come up with a fix from there.

star-buck commented 6 years ago

Supercool you spotted this, if turns out the culprit.

davidedmundson commented 6 years ago

They delay happens when we call

"adduser live powerdev"

in this for loop

for group in $RET; do
                    $log $chroot $ROOT adduser "$USER" $group >/dev/null 2>&1 || true
done

Others are instant, this one that doesn't exist takes exactly 90s

My lazy workaround was to replace

a few lines above. I'm sure you can do an alternative better fix.


There's a bit to finish working out. It doesn't entirely make sense as "adduser david asdfasdfasdf" does exit immediately. I have a working possible theory on why it blocks, but not proven yet.

bhush9 commented 6 years ago

They delay happens when we call

"adduser live powerdev"

I think this is false alarm, see my first comment where I had concrete data suggesting adding user to netdev causes the delay, and before that it was bluetooth user

That said I can give this try locally and see where it takes me.

davidedmundson commented 6 years ago

I had concrete data suggesting adding user to netdev causes the delay

In your first comment we have

a timestamp of when it finishes successfully adding netdev a timestamp of when it finishes successfully adding scanner

If anything it proves it's not adding netdev, but whatever happens between netdev and scanner

Which assuming it's alphabetical could still match what I said

davidedmundson commented 6 years ago

Your patch still has the user being added to the powerdev group

davidedmundson commented 6 years ago

Continued investigation on exactly why we block

strace finally gave me something useful

connect(fd =4, "/run/dbus/system_bus_socket") fstat(fd=4) recvmsg(4) returns EAGAIN ppoll(fd=4) timesout after 90s

bhush9 commented 6 years ago

FYI, my patch didn't actually made it to last image.. sorry

davidedmundson commented 6 years ago

This DBus call comes from nss-systemd which is linked to by the adduser perl script.

From a glance of the code it seems when a group doesn't exist it tries checking dynamically to get the most up to date information.

davidedmundson commented 6 years ago

So to summarise, I'm pretty sure I've found everything and it all makes sense.

There are two minor bugs that would be fine on their own but together cause an issue.

This can be fixed by exporting SYSTEMD_NSS_BYPASS_BUS=1 in the script or just not installing libnss-systemd. Bshah implied it was a Neon specific patch adding it.

On my local system either mod works.

star-buck commented 6 years ago

And this fixes the 2 minute delay?

bhush9 commented 6 years ago

Issue is locally fixed but needs to be packaged in image.. I will roll image later with fix (later today evening or tomorrow).

-- Bhushan Shah http://blog.bshah.in IRC Nick : bshah on Freenode GPG key fingerprint : 0AAC 775B B643 7A8D 9AF7 A3AC FE07 8411 7FBC E11D

bhush9 commented 6 years ago

https://files.kde.org/neon/images/pinebook-remix-nonfree/bionic/20180531-1230/ should be fixed in this image.

(Current fix is to not add user in powerdev group, will remove systemd-nss package in next image tomorrow).