Closed damiante closed 2 years ago
Every time I boot with this issue I have another look and see if I can find some more info :)
This time I found the following snippet in /var/log/syslog
:
Nov 7 10:05:09 PorygonZ systemd[1]: Started Session c2 of user lightdm.
Nov 7 10:05:09 PorygonZ rtkit-daemon[1446]: Supervising 3 threads of 1 processes of 1 users.
Nov 7 10:05:09 PorygonZ rtkit-daemon[1446]: Successfully made thread 1699 of process 1437 owned by '1
14' RT at priority 5.
Nov 7 10:05:09 PorygonZ rtkit-daemon[1446]: Supervising 4 threads of 1 processes of 1 users.
Nov 7 10:05:09 PorygonZ rtkit-daemon[1446]: Supervising 3 threads of 1 processes of 1 users.
Nov 7 10:05:09 PorygonZ rtkit-daemon[1446]: Successfully made thread 1700 of process 1437 owned by '1
14' RT at priority 5.
Nov 7 10:05:09 PorygonZ rtkit-daemon[1446]: Supervising 4 threads of 1 processes of 1 users.
Nov 7 10:05:09 PorygonZ kernel: [ 11.706589] show_signal: 30 callbacks suppressed
Nov 7 10:05:09 PorygonZ kernel: [ 11.706591] traps: io.elementary.g[1663] general protection fault
ip:7fbd496d1160 sp:7ffc87500c58 error:0 in libgobject-2.0.so.0.6400.6[7fbd496a3000+36000]
Nov 7 10:05:09 PorygonZ acpid: client 1629[0:0] has disconnected
Nov 7 10:05:09 PorygonZ rtkit-daemon[1446]: Supervising 3 threads of 1 processes of 1 users.
Nov 7 10:05:09 PorygonZ rtkit-daemon[1446]: Successfully made thread 1729 of process 1437 owned by '114' RT at priority 5.
Nov 7 10:05:09 PorygonZ rtkit-daemon[1446]: Supervising 4 threads of 1 processes of 1 users.
Nov 7 10:05:09 PorygonZ systemd[1]: lightdm.service: Main process exited, code=exited, status=1/FAILURE
Nov 7 10:05:09 PorygonZ systemd[1]: lightdm.service: Failed with result 'exit-code'.
Nov 7 10:05:10 PorygonZ systemd[1]: lightdm.service: Scheduled restart job, restart counter is at 2.
Nov 7 10:05:10 PorygonZ systemd[1]: Stopped Light Display Manager.
Nov 7 10:05:10 PorygonZ systemd[1]: Starting Detect the available GPUs and deal with any system changes...
I think the smoking gun here is:
traps: io.elementary.g[1663] general protection fault ip:7fbd496d1160 sp:7ffc87500c58 error:0 in libgobject-2.0.so.0.6400.6[7fbd496a3000+36000]
Is there a way I can use these IDs to trace exactly what the issue is? Perhaps my GPU isn't initialising before lightdm tries to start? I'd find that a little surprising as I can see the greeter screen flicker to life briefly before crashing, one imagines that if my GPU was uninitialised it wouldn't be able to render at all (I don't have onboard graphics on my CPU).
Done even more digging with help from the ArchWiki's article on LightDM in /var/log/lighdm/seat0-greeter.log
the following 2 errors are repeated over and over:
(io.elementary.greeter:1440): Handy-WARNING **: 07:38:01.609: Failed to load icon: Error opening file /home/damian/.face: No such file or directory
** (io.elementary.greeter:1440): CRITICAL **: 07:38:01.633: MainWindow.vala:358: prompt: `Password: ' (1)
I gathered that the first issue is related to not having a profile icon set; I went and set this in the system settings and also manually placed a 96x96 png file there for good measure after saving my system settings didn't appear to generate the file. I don't think this will solve the problem but I'll let you know if it does on my next boot :)
The second issue seems more insidious and is also more cryptic. Again, just looks like Greek to me as I don't know what to do with these numbers. My guess is that io.elementary.greeter:1440
is a PID, 07:38:01.633
is probably a timestamp, but what does MainWindow.vala:358
mean and why is it critical?
@damiante
** (io.elementary.greeter:1440): CRITICAL **: 07:38:01.633: MainWindow.vala:358: prompt:
Password: ' (1)` this only seems to be a log message.
https://github.com/elementary/greeter/blob/dd932147a1074c053fa21d163c18808877482702/src/MainWindow.vala#L357-L371
I think you are right here:
I think the smoking gun here is: traps: io.elementary.g[1663] general protection fault ip:7fbd496d1160 sp:7ffc87500c58 error:0 in libgobject-2.0.so.0.6400.6[7fbd496a3000+36000]
This explains how to interpret those addresses https://stackoverflow.com/questions/60275576/how-to-interpret-linux-kernel-trap-error-line
I already tried to get the function name via addr2line
but it only return ??:0
, event with libglib2.0-0-dbgsym
installed.
@danrabbit I am not experienced with debugging vala or lightdm. Do you have any hints on how to tackle this? The only option I currently see is git bisect, which might take a very long time, since I can only reproduce this bug on my productive system.
Also #568 seems to be the same issue.
Thanks for your reply @thmey , I'll dig into this tomorrow. Interestingly I also have a multiple-monitor setup, which seems to be a common thread. Perhaps there is some misbehaviour associated with multi-mon on boot? I may also try booting with just one monitor and seeing if I can reliably reproduce the issue that way.
6f18c79c780582e43039032f6926816efa82e206 seems to cause the issue.
Hmm, I don't really understand the context of this issue. If this is the cause, is it something I can fix? Seems like I'd need to PR -> Merge -> remote build -> OS update, and that's assuming I knew what the issue was and how to fix it :P
@damiante
Seems like I'd need to PR -> Merge -> remote build -> OS update, and that's assuming I knew what the issue was and how to fix it :P
Thats right, but I think the issue is related to custom gtk themes.
Any chance you installed custom themes? I just removed all custom themes and the issue seems to be gone.
I ran sudo service lightdm restart
multiple times to check.
I've never installed a custom gtk theme myself so if any of these count as "custom" then they must have come with something else unrelated that I've installed. Since I don't know what they're used for I'm not game to remove them unless I have to
damian@PorygonZ:~$ ls /usr/share/themes
Default io.elementary.stylesheet.lime
Emacs io.elementary.stylesheet.mint
io.elementary.stylesheet.banana io.elementary.stylesheet.orange
io.elementary.stylesheet.blueberry io.elementary.stylesheet.slate
io.elementary.stylesheet.bubblegum io.elementary.stylesheet.strawberry
io.elementary.stylesheet.cocoa Raleigh
io.elementary.stylesheet.grape
Incidentally I disconnected my second monitor and rebooted hot this morning and it worked when the cold boot with two monitors failed. Could be confirmation bias though :)
Those are the default themes. I might be wrong though regarding the themes causing the issue, since I both reinstalled the greeter and removed themes after git bisect.
@damiante could you confirm that the issue occurs at commit 6f18c79 and not before? You would have to compile and install the greeter yourself.
Not super confident with this but I presume after compiling I need to change the binary path somewhere in my config; doesn't seem to be in /usr/share/lightdm/lightdm.conf.d/40-io.elementary.greeter.conf
(refers to a .desktop file as I recall) or /usr/share/xsessions/pantheon.desktop
(refers to something called io.elementary.wingpanel
). If you tell me what I need to replace with the compiled binary path I'll give it a crack.
To compile and install: meson build --prefix=/usr cd build ninja sudo ninja install
After that run sudo lightdm restart to test the greeter.
782191d1ed6bc9078d2ff3704b331699eb74702f This commit should not contain the bug and this 6f18c79c780582e43039032f6926816efa82e206 should contain it.
After testing run sudo apt install --reinstall io.elementary.greeter
@thmey you are spot on! I can't believe you managed to work it out and I'm so pleased! I can restart lightdm as much as I like on that build and it works fine, but as soon as I reinstall from apt and restart it breaks. This appears to be the problem. Thank you so much!
@damiante have you checked both commits to verify my findings? Unfortunately I don't know how whats causing the issue and how to fix it.
@thmey yes I built and swapped the installs of both and 782191d1ed6bc9078d2ff3704b331699eb74702f behaves as I would expect but 6f18c79c780582e43039032f6926816efa82e206 produces the issue. I guess for now I can keep using the 782191d1ed6bc9078d2ff3704b331699eb74702f version since actually consistently loading my GUI when I boot is more important to me than whatever features were added in between :)
I'm having the same problem here, even without multiple monitors and no hybrid graphics / multiseat setup, I only have an integrated GPU.
My HW is Intel Skylake GT2 [HD Graphics 520] on a Thinkpad T460...
My experience is consistent with your descriptions, the loops seem to have started after the update of greeter to 6.0.1 came in. When waiting really long, the greeter loop eventually stops (sth like 40-100 restarts or so) and I have many gsd-sound
and gsd-a11y-settings
processes left running...
I can confirm that reverting 6f18c79 fixes the problem.
I poked around with this even though I am not too knowledgeable of much of anything here. I believe this is a use-after-free issue where logged_in_context is used in update_style(); commenting out that use causes the issue to go away for me.
6f18c79 adds a new var logged_in which is local to the construct{} function. logged_in.get_style_context() is saved as a class private, but I don't see where logged_in itself is saved. I changed logged_in to be a class private and I am no longer able to reproduce this issue. Someone who knows what they are doing should weigh in; I am happy to be wrong.
diff --git a/src/Cards/UserCard.vala b/src/Cards/UserCard.vala
index 83df22c..8cd917e 100644
--- a/src/Cards/UserCard.vala
+++ b/src/Cards/UserCard.vala
@@ -42,6 +42,7 @@ public class Greeter.UserCard : Greeter.BaseCard {
private Gtk.Stack login_stack;
private Greeter.PasswordEntry password_entry;
+ private SelectionCheck logged_in;
private unowned Gtk.StyleContext logged_in_context;
private weak Gtk.StyleContext main_grid_style_context;
private weak Gtk.StyleContext password_entry_context;
@@ -214,7 +215,7 @@ public class Greeter.UserCard : Greeter.BaseCard {
};
avatar_overlay.add (avatar);
- var logged_in = new SelectionCheck () {
+ logged_in = new SelectionCheck () {
halign = Gtk.Align.END,
valign = Gtk.Align.END
};
This issue also happens on NixOS as reported in https://github.com/NixOS/nixpkgs/issues/151609, and I can confirm the above patch fixes the issue on our side. @oddveryodd Do you mind file a pull request at your convenience?
What Happened?
When booting, some issue occurs between lightdm and greeter. I often see Greeter load with the lock screen for a brief moment, but it will crash to a black screen with a blinking cursor or my motherboard's manufacturer logo. From this screen TTY can be enabled with CTRL + ALT + F2 and restarting lightdm can solve the issue but does not consistently do so. This occurs on almost all cold boots and most reboots. After successfully launching the service everything works correctly.
Please refer to this stack overflow topic: https://elementaryos.stackexchange.com/questions/29090/how-to-make-lightdm-more-stable-on-boot There are some logs provided in the topic
Initial suspicion is that this is config-related but I don't have much to go on. Happy to provide additional logs if asked.
Steps to Reproduce
/var/log/syslog
for evidence that lightdm has failed on startupsudo service lightdm restart
; this may require several attemptsExpected Behavior
The system correctly launches lightdm + greeter on every boot
OS Version
6.x (Odin)
Software Version
Latest release (I have run all updates)
Log Output
Hardware Info
output of
lshw -short