Closed Letterus closed 2 months ago
Just noting things down for future reference:
I don't currently observe, and never have observed, any of the described behavior on gala/wingpanel/dock main, so maybe it's a hardware issue?
However during development I've seen similar symptoms arise when misusing the pantheon wayland protocol from the client side.
@leolost2605 Thanks for commenting. Think I experienced the same with X11 and the X250 at least. Don't know if there's anything specific about the Lenovo X series architecture.
Do you think any relation to #2024 is possible, some kind of overflow?
Further note: I am/was running the Nextcloud desktop client on both machines. Currently it seems the freezes don't occur when I quit the client, but occur more often when I edit and save or move a local file synced via Nextcloud. Does this sound reasonable to you?
Edit: After checking this twice this really seems related to the Nextcloud desktop client. But don't know where to start debugging yet.
I'm not experiencing this issue anymore since the last updates. I don't know if this is by coincidence. But I'm closing it for now and going to reopen it in case it occurs again.
Reopening this as occasional hangs keep occurring. I think it's not only the Nextcloud client but any task with heavy IO that leads to the screen becoming stuck for some seconds. I don't know the code, but it seems to me there is some piece connected with IO that's not working asynchronously.
Hmm that could very well be. I think KDE had a similar problem about doing heavy caching. Are you running an HDD by chance?
Nope, only SSD.
Is there a good way of debugging this? Where could I start digging into the code and maybe add some debug messages?
Found the following log messages in /var/log/syslog close to the last hang:
2024-08-28T11:25:17.419787+02:00 XinkPad280 geoclue[1485]: Failed to query location: Query location SOUP error: Not Found
2024-08-28T11:26:21.179612+02:00 XinkPad280 kernel: workqueue: delayed_fput hogged CPU for >13333us 128 times, consider switching to WQ_UNBOUND
2024-08-28T11:27:56.455675+02:00 XinkPad280 geoclue[1485]: message repeated 4 times: [ Failed to query location: Query location SOUP error: Not Found]
2024-08-28T11:28:10.571720+02:00 XinkPad280 zeitgeist-datah[2119]: zeitgeist-datahub.vala:210: Error during inserting events: GDBus.Error:org.gnome.zeitgeist.EngineError.InvalidArgument: Incomplete event: interpretation, manifestation and actor are required
Don't know if any of these may cause the issue? geoclue or zeitgeist-datahub?
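For narrowing down which entries sit close to a hang, here is a minimal sketch in Python that filters syslog lines by their leading ISO 8601 timestamp. It assumes the timestamp format shown in the excerpt above; the function name `lines_near`, the window size and the hang time used in the example are illustrative assumptions, not values from the logs.

```python
from datetime import datetime, timedelta


def lines_near(log_lines, hang_time, window_seconds=60):
    """Return the lines whose leading ISO 8601 timestamp lies within
    window_seconds of hang_time (hang_time must be timezone-aware)."""
    window = timedelta(seconds=window_seconds)
    selected = []
    for line in log_lines:
        stamp, _, _ = line.partition(" ")
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # no parseable timestamp at the start of the line
        if abs(when - hang_time) <= window:
            selected.append(line)
    return selected


if __name__ == "__main__":
    # Hypothetical hang time, chosen only for illustration.
    hang = datetime.fromisoformat("2024-08-28T11:26:30+02:00")
    with open("/var/log/syslog", errors="replace") as handle:
        for hit in lines_near(handle, hang):
            print(hit, end="")
```

Reading /var/log/syslog may require membership in the adm group or sudo, depending on the system.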
During the next hang, this appeared again:
2024-08-28T11:50:27.210917+02:00 XinkPad280 zeitgeist-datah[2141]: zeitgeist-datahub.vala:210: Error during inserting events: GDBus.Error:org.gnome.zeitgeist.EngineError.InvalidArgument: Incomplete event: interpretation, manifestation and actor are required
Edit: It still freezes, even without zeitgeist-datahub running.
I freshly installed and just started GNOME Contacts, which had to load quite a few address books and lots of my contacts, and the whole screen froze again for several seconds. It seems to be related to IO, but it may also be synchronous waits in Zeitgeist as well as in Evolution Data Server.
I made it freeze again by using the Starfish app and opening a domain (that was somehow hanging and using lots of CPU cycles, which led to the "app is not answering do you want to kill it?" dialogue).
Log:
2024-08-30T09:21:18.814890+02:00 XinkPad280 gala[1785]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed
2024-08-30T09:21:43.049638+02:00 XinkPad280 xdg-desktop-por[2351]: g_application_get_resource_base_path: assertion 'G_IS_APPLICATION (application)' failed
2024-08-30T09:21:43.190290+02:00 XinkPad280 xdg-desktop-por[2351]: GtkDialog mapped without a transient parent. This is discouraged.
2024-08-30T09:21:43.199444+02:00 XinkPad280 gala[1785]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed
2024-08-30T09:21:43.199673+02:00 XinkPad280 gala[1785]: clutter_actor_set_child_below_sibling: assertion 'child->priv->parent == self' failed
2024-08-30T09:21:43.199761+02:00 XinkPad280 gala[1785]: clutter_actor_set_child_above_sibling: assertion 'child->priv->parent == self' failed
2024-08-30T09:21:43.199874+02:00 XinkPad280 gala[1785]: message repeated 2 times: [ clutter_actor_set_child_above_sibling: assertion 'child->priv->parent == self' failed]
2024-08-30T09:21:43.303313+02:00 XinkPad280 gala[1785]: WindowManager.vala:916: No transient found
2024-08-30T09:21:50.461054+02:00 XinkPad280 xdg-desktop-por[2351]: GtkDialog mapped without a transient parent. This is discouraged.
2024-08-30T09:21:50.470182+02:00 XinkPad280 gala[1785]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed
2024-08-30T09:21:50.470628+02:00 XinkPad280 gala[1785]: clutter_actor_set_child_below_sibling: assertion 'child->priv->parent == self' failed
2024-08-30T09:21:50.470817+02:00 XinkPad280 gala[1785]: clutter_actor_set_child_above_sibling: assertion 'child->priv->parent == self' failed
2024-08-30T09:21:50.471103+02:00 XinkPad280 gala[1785]: message repeated 2 times: [ clutter_actor_set_child_above_sibling: assertion 'child->priv->parent == self' failed]
2024-08-30T09:21:50.514123+02:00 XinkPad280 gala[1785]: WindowManager.vala:916: No transient found
2024-08-30T09:21:57.936407+02:00 XinkPad280 xdg-desktop-por[2351]: GtkDialog mapped without a transient parent. This is discouraged.
2024-08-30T09:21:57.955717+02:00 XinkPad280 gala[1785]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed
2024-08-30T09:21:57.956572+02:00 XinkPad280 gala[1785]: clutter_actor_set_child_below_sibling: assertion 'child->priv->parent == self' failed
2024-08-30T09:21:57.956833+02:00 XinkPad280 gala[1785]: clutter_actor_set_child_above_sibling: assertion 'child->priv->parent == self' failed
2024-08-30T09:21:57.957192+02:00 XinkPad280 gala[1785]: message repeated 2 times: [ clutter_actor_set_child_above_sibling: assertion 'child->priv->parent == self' failed]
2024-08-30T09:21:57.992184+02:00 XinkPad280 gala[1785]: WindowManager.vala:916: No transient found
2024-08-30T09:22:15.391481+02:00 XinkPad280 systemd[1572]: app-flatpak-hr.from.josipantolis.starfish-23144.scope: Consumed 11.872s CPU time.
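To see at a glance which process logs which message how often in an excerpt like the one above, a small tally can help. This is only a sketch: the regular expressions assume the `timestamp host process[pid]: message` layout seen here, and it assumes that a "message repeated N times" entry stands for N further occurrences of the bracketed message. The name `count_messages` is made up for this example.

```python
import re
from collections import Counter

# timestamp, host, process name, optional [pid], then the message
LINE = re.compile(r"^\S+ \S+ (?P<proc>[^\[\s:]+)(?:\[\d+\])?: (?P<msg>.*)$")
# "message repeated 2 times: [ original message]"
REPEAT = re.compile(r"^message repeated (?P<n>\d+) times: \[ ?(?P<msg>.*)\]$")


def count_messages(log_lines):
    """Count occurrences per (process, message) pair."""
    counts = Counter()
    for line in log_lines:
        match = LINE.match(line.rstrip("\n"))
        if not match:
            continue  # line does not follow the assumed layout
        proc, msg, times = match["proc"], match["msg"], 1
        repeated = REPEAT.match(msg)
        if repeated:
            # Assumption: N further occurrences of the bracketed message.
            msg, times = repeated["msg"], int(repeated["n"])
        counts[(proc, msg)] += times
    return counts


if __name__ == "__main__":
    with open("/var/log/syslog", errors="replace") as handle:
        for (proc, msg), n in count_messages(handle).most_common(10):
            print(f"{n:5d}  {proc}: {msg}")
```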
Two further freezes occurred while doing random things like scrolling through mails and opening the browser. In the syslog I only found entries about rtkit-daemon, which I have now disabled to see if it is causing the issues. If it is, would that support the theory that the freezing is related to synchronously handled DBus calls/events?
Still observing freezes. Maybe they are shorter now, and there are no log messages at those times anymore…
Coming back to @leolost2605's first proposal: Hardware/driver issue.
From time to time dmesg reports:
[ 1094.058291] workqueue: delayed_fput hogged CPU for >13333us 4 times, consider switching to WQ_UNBOUND
[ 1417.338303] workqueue: delayed_fput hogged CPU for >13333us 8 times, consider switching to WQ_UNBOUND
Symptoms look like the i915 driver hanging issue documented here: https://bbs.archlinux.org/viewtopic.php?id=246841&p=2
Edit: Currently I'm trying to add i915.enable_psr=0 to the kernel parameters, but I had to do it manually at boot time. Changing it in /etc/default/grub and executing update-grub2 and update-initramfs -u -k all had no effect. I don't know why yet.
Edit 2, note: Check the effect with cat /proc/cmdline and sudo cat /sys/module/i915/parameters/enable_psr.
The result is interesting: the kernel options from grub do not end up as boot parameters, and the updated kernel (-41) is not booted either. It's still the old one (-40). What's going on there? /boot/grub/grub.cfg looks updated and correct though.
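The two checks mentioned above can be combined into one small script. This is a sketch under a few assumptions: the helper names are made up, the command line is split on whitespace only (quoted values containing spaces are not handled), and reading the sysfs parameter may need root, in which case the script just prints None.

```python
from pathlib import Path


def cmdline_params(cmdline):
    """Split a kernel command line into a dict.
    Flags without '=' (for example 'quiet') map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def read_text_or_none(path):
    """Return the stripped file content, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    booted = read_text_or_none("/proc/cmdline") or ""
    print("booted cmdline:", booted)
    print("i915.enable_psr on cmdline:",
          cmdline_params(booted).get("i915.enable_psr"))
    print("i915 enable_psr at runtime:",
          read_text_or_none("/sys/module/i915/parameters/enable_psr"))
```

If the command-line value is None while /boot/grub/grub.cfg contains the option, the machine was most likely booted from a different boot entry or configuration than the one that was edited.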
I have been working with the machine with the i915.enable_psr=0 kernel boot parameter enabled for some time now, with no freezes at all so far. So it looks as if this really is the driver issue mentioned above (which apparently does not show up with every DE).
I now need to figure out how to make the fix permanent, since configuring grub does not work, as pointed out above. Maybe that is a separate issue for another repo?
The permanent fix works by creating the file /etc/modprobe.d/i915.conf and entering:
options i915 enable_psr=0
Afterwards make sure to execute:
sudo update-initramfs -u -k all
Then reboot.
Check the effect with
sudo cat /sys/module/i915/parameters/enable_psr
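To double-check that the configured value and the running value agree, the modprobe.d file can be parsed and compared with the sysfs parameter. A rough sketch, assuming the simple `options <module> <param>=<value>` line format used above; the function name is made up, and lines starting with `#` are treated as comments.

```python
from pathlib import Path


def parse_modprobe_options(text):
    """Collect 'options <module> key=value ...' lines into
    {module: {key: value}}; other directives are ignored."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3 or fields[0] != "options":
            continue
        module_opts = options.setdefault(fields[1], {})
        for token in fields[2:]:
            key, sep, value = token.partition("=")
            module_opts[key] = value if sep else None
    return options


if __name__ == "__main__":
    try:
        conf = Path("/etc/modprobe.d/i915.conf").read_text()
        live = Path("/sys/module/i915/parameters/enable_psr").read_text().strip()
    except OSError as error:  # missing file or insufficient permissions
        raise SystemExit(f"cannot read: {error}")
    wanted = parse_modprobe_options(conf).get("i915", {}).get("enable_psr")
    print(f"configured={wanted} running={live}")
```

If the two values differ after a reboot, the initramfs was probably not regenerated, or a different kernel was booted than the one it was built for.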
I still don't know why the grub parameters don't work and why the older kernel gets loaded.
The resolution to the graphics hangs is described above.
Boot issues are resolved by the last updates as documented in https://github.com/elementary/switchboard-plug-about/issues/335.
Closing this issue as resolved.
What Happened?
Using the current OS 8 preview/daily, Wayland session, ThinkPad X280, Intel graphics, I observe a complete freeze of the whole screen (including the mouse pointer) for several seconds. During this time the screen just hangs and the output does not change, but the system seems to keep processing the last action in the background and, after the freeze, presents the result (e.g. scrolling down, opening a new window). This is not bound to a particular behaviour or triggering action and appears from time to time. It seems to me I experienced the same behaviour on my X250 using the X11 session with OS 6.1/7.1 as well.
Any hints on how to debug this? I did not find any helpful messages in /var/log/syslog for the time this issue arose.
Steps to Reproduce
Just use the Pantheon desktop. Behaviour does not seem to relate to any specific action.
I am using the Wayland session. I have enabled fractional scaling and set it to 125%.
Expected Behavior
The UI/screen should not freeze.
OS Version
8.x (Early Access)
Software Version
Latest release (I have run all updates)
Log Output
No response
Hardware Info
ThinkPad X280, Intel graphics.