Ferdi265 / wl-mirror

a simple Wayland output mirror client
GNU General Public License v3.0

With some drivers, grim stalls on the primary output when wl-mirror is active #43

Open zboszor opened 1 month ago

zboszor commented 1 month ago

Hi,

I observed that on one device, taking a screenshot with grim while wl-mirror is active stalls on the first output but finishes quickly on the second output:

# lspci -s 00:02.0
00:02.0 VGA compatible controller: Intel Corporation Atom Processor Z36xxx/Z37xxx Series Graphics & Display (rev 0e)
# lspci -n -s 00:02.0
00:02.0 0300: 8086:0f31 (rev 0e)

The machine in question is a Flytech POS335 with a built-in 1024x768 monitor. The external monitor is 1600x900. Outputs: eDP-1 and VGA-1.

However, on different hardware (Flytech POS457), grim finishes quickly on both outputs (eDP-1 and DP-1) using the same software environment:

# lspci -s 00:02.0
00:02.0 VGA compatible controller: Intel Corporation Elkhart Lake [UHD Graphics Gen11 16EU] (rev 01)
# lspci -n -s 00:02.0
00:02.0 0300: 8086:4555 (rev 01)

Both devices are driven by the i915 kernel driver. Kernel version is 6.9.10. Mesa is 24.0.7. wl-mirror is the latest 0.16.5.

What can cause this difference in behaviour?

Ferdi265 commented 1 month ago

Hi!

Thanks for the report!

This looks at first glance like either a bug in grim or a bug in the compositor (I assume you are using Sway?). Grim is using the wlr-screencopy protocol, and wl-mirror is either using wlr-export-dmabuf or wlr-screencopy. AFAIK there shouldn't be a problem with multiple programs using these protocols simultaneously, but I didn't test that extensively.

Can you try running wl-mirror with -b dmabuf or -b screencopy explicitly and see if it changes the behaviour? You can also add extra logging with --verbose. Grim doesn't appear to have debug logging, although it would be interesting to see where exactly it hangs.

zboszor commented 1 month ago

Indeed, I am using Sway. Version 1.9 to be exact. I will try to add logging. Thanks.

zboszor commented 1 month ago

Apart from many repetitions of the message below, the logs show nothing that looks informative to me.

[ERROR] [wlr] [/usr/src/debug/wlroots/0.17.4-r0/backend/drm/atomic.c:73] connector VGA-1: Atomic commit failed: Device or resource busy

Occasionally, this also appears:

2024-07-22 09:28:01 - [/usr/src/debug/swaybg/1.2.0-r0/main.c:582] wl_display_roundtrip failed

On this particular machine, taking a screenshot stalls with both wl-mirror -b dmabuf and wl-mirror -b screencopy.

(More testing...) But it's inconclusive. Sometimes a few screenshots (occasionally a little more than 30) succeed quickly after restarting Sway. Then grim starts stalling. After killing grim with Ctrl-C, subsequent runs also stall.

swaymsg doesn't show grim as a "stuck" client.

$ swaymsg -t get_tree
#1: root "root"
  #2147483647: output "__i3"
    #2147483646: workspace "__i3_scratch"
  #3: output "eDP-1"
    #4: workspace "1"
      #8: con "SICOM Chef - Chromium" (xdg_shell, pid: 28000, app_id: "chromium-browser (/home/sicom/.config/chromium)")
  #5: output "VGA-1"
    #6: workspace "2"
      #7: con "Wayland Output Mirror for eDP-1" (xdg_shell, pid: 28032, app_id: "at.yrlf.wl_mirror")

Without screen mirroring, screenshots work quickly, for over 100 attempts in a row. With mirroring active, as noted above, the problem usually appears after (far) fewer than 30 attempts.
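
The "how many attempts until it stalls" measurement can be automated. The following is a minimal sketch, assuming that a run exceeding a timeout counts as a stall. The function name `attempts_until_stall`, the 10-second timeout, and the attempt limit are illustrative choices and not part of this report.

```python
import subprocess

def attempts_until_stall(cmd, max_attempts=100, timeout=10.0):
    """Run cmd repeatedly.

    Returns the 1-based number of the first attempt that exceeded the
    timeout, or None if every attempt finished in time.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # subprocess.run kills the child itself when the timeout expires.
            subprocess.run(cmd, timeout=timeout, check=False)
        except subprocess.TimeoutExpired:
            return attempt
    return None

# Hypothetical usage while wl-mirror is running (output name from this report):
#   attempts_until_stall(["grim", "-o", "eDP-1", "capture.png"])
```

Running it once with wl-mirror active and once without would give comparable numbers for the two cases described above.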

Ferdi265 commented 1 month ago

Interesting... I will try and see if I can reproduce this on one of my machines in the next few days.

Can you try running grim with WAYLAND_DEBUG=1? This should tell us which wayland events were delivered, which requests were sent, and thus where grim is stalled.
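
Since libwayland prints the `WAYLAND_DEBUG` trace to stderr, it can be saved to a file even when the client never exits. This is a rough sketch under that assumption; the helper name `capture_wayland_trace`, the log path parameter, and the timeout are hypothetical.

```python
import os
import subprocess

def capture_wayland_trace(cmd, log_path, timeout=10.0):
    """Run cmd with WAYLAND_DEBUG=1 and write its stderr to log_path.

    Returns (stalled, trace): stalled is True if cmd had to be killed
    after the timeout, trace is whatever was written to stderr.
    """
    env = dict(os.environ, WAYLAND_DEBUG="1")
    stalled = False
    with open(log_path, "wb") as log:
        try:
            subprocess.run(cmd, env=env, stderr=log, timeout=timeout, check=False)
        except subprocess.TimeoutExpired:
            stalled = True
    with open(log_path, "r", errors="replace") as log:
        return stalled, log.read()

# Hypothetical usage:
#   stalled, trace = capture_wayland_trace(
#       ["grim", "-o", "eDP-1", "capture.png"], "grim-trace.log")
```

Writing to a file rather than a pipe keeps the partial trace available after a stalled run is killed.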

zboszor commented 1 month ago

Here it is:

$ grim -o HDMI-A-1 -t png capture-0.png
[3583248.091]  -> wl_display@1.get_registry(new id wl_registry@2)
[3583248.198]  -> wl_display@1.sync(new id wl_callback@3)
[3583248.682] wl_display@1.delete_id(3)
[3583248.720] wl_registry@2.global(1, "wl_shm", 1)
[3583248.738]  -> wl_registry@2.bind(1, "wl_shm", 1, new id [unknown]@4)
[3583248.748] wl_registry@2.global(2, "wl_drm", 2)
[3583248.755] wl_registry@2.global(3, "zwp_linux_dmabuf_v1", 4)
[3583248.762] wl_registry@2.global(4, "wl_compositor", 6)
[3583248.779] wl_registry@2.global(5, "wl_subcompositor", 1)
[3583248.795] wl_registry@2.global(6, "wl_data_device_manager", 3)
[3583248.803] wl_registry@2.global(7, "zwlr_gamma_control_manager_v1", 1)
[3583248.809] wl_registry@2.global(8, "zxdg_output_manager_v1", 3)
[3583248.817]  -> wl_registry@2.bind(8, "zxdg_output_manager_v1", 2, new id [unknown]@5)
[3583248.824] wl_registry@2.global(9, "ext_idle_notifier_v1", 1)
[3583248.831] wl_registry@2.global(10, "zwp_idle_inhibit_manager_v1", 1)
[3583248.838] wl_registry@2.global(11, "zwlr_layer_shell_v1", 4)
[3583248.845] wl_registry@2.global(12, "xdg_wm_base", 2)
[3583248.852] wl_registry@2.global(13, "zwp_tablet_manager_v2", 1)
[3583248.859] wl_registry@2.global(14, "org_kde_kwin_server_decoration_manager", 1)
[3583248.867] wl_registry@2.global(15, "zxdg_decoration_manager_v1", 1)
[3583248.882] wl_registry@2.global(16, "zwp_relative_pointer_manager_v1", 1)
[3583248.890] wl_registry@2.global(17, "zwp_pointer_constraints_v1", 1)
[3583248.896] wl_registry@2.global(18, "wp_presentation", 1)
[3583248.903] wl_registry@2.global(19, "zwlr_output_manager_v1", 4)
[3583248.911] wl_registry@2.global(20, "zwlr_output_power_manager_v1", 1)
[3583248.917] wl_registry@2.global(21, "zwp_input_method_manager_v2", 1)
[3583248.934] wl_registry@2.global(22, "zwp_text_input_manager_v3", 1)
[3583248.943] wl_registry@2.global(23, "zwlr_foreign_toplevel_manager_v1", 3)
[3583248.958] wl_registry@2.global(24, "ext_session_lock_manager_v1", 1)
[3583248.973] wl_registry@2.global(25, "wp_drm_lease_device_v1", 1)
[3583248.981] wl_registry@2.global(26, "zwlr_export_dmabuf_manager_v1", 1)
[3583248.988] wl_registry@2.global(27, "zwlr_screencopy_manager_v1", 3)
[3583249.029]  -> wl_registry@2.bind(27, "zwlr_screencopy_manager_v1", 1, new id [unknown]@6)
[3583249.045] wl_registry@2.global(28, "zwlr_data_control_manager_v1", 2)
[3583249.101] wl_registry@2.global(29, "wp_security_context_manager_v1", 1)
[3583249.124] wl_registry@2.global(30, "wp_viewporter", 1)
[3583249.137] wl_registry@2.global(31, "wp_single_pixel_buffer_manager_v1", 1)
[3583249.155] wl_registry@2.global(32, "wp_content_type_manager_v1", 1)
[3583249.168] wl_registry@2.global(33, "wp_fractional_scale_manager_v1", 1)
[3583249.181] wl_registry@2.global(34, "zxdg_exporter_v1", 1)
[3583249.189] wl_registry@2.global(35, "zxdg_importer_v1", 1)
[3583249.218] wl_registry@2.global(36, "zxdg_exporter_v2", 1)
[3583249.240] wl_registry@2.global(37, "zxdg_importer_v2", 1)
[3583249.253] wl_registry@2.global(38, "xdg_activation_v1", 1)
[3583249.270] wl_registry@2.global(39, "wp_cursor_shape_manager_v1", 1)
[3583249.283] wl_registry@2.global(40, "zwp_virtual_keyboard_manager_v1", 1)
[3583249.295] wl_registry@2.global(41, "zwlr_virtual_pointer_manager_v1", 2)
[3583249.308] wl_registry@2.global(42, "zwlr_input_inhibit_manager_v1", 1)
[3583249.320] wl_registry@2.global(43, "zwp_keyboard_shortcuts_inhibit_manager_v1", 1)
[3583249.327] wl_registry@2.global(44, "zwp_pointer_gestures_v1", 3)
[3583249.359] wl_registry@2.global(45, "wl_seat", 8)
[3583249.367] wl_registry@2.global(46, "zwp_primary_selection_device_manager_v1", 1)
[3583249.383] wl_registry@2.global(47, "wl_output", 4)
[3583249.399]  -> wl_registry@2.bind(47, "wl_output", 3, new id [unknown]@7)
[3583249.413] wl_registry@2.global(48, "wl_output", 4)
[3583249.428]  -> wl_registry@2.bind(48, "wl_output", 3, new id [unknown]@8)
[3583249.441] wl_callback@3.done(35)
[3583249.474]  -> zxdg_output_manager_v1@5.get_xdg_output(new id zxdg_output_v1@3, wl_output@8)
[3583249.498]  -> zxdg_output_manager_v1@5.get_xdg_output(new id zxdg_output_v1@9, wl_output@7)
[3583249.513]  -> wl_display@1.sync(new id wl_callback@10)
[3583249.767] wl_display@1.delete_id(10)
[3583249.794] wl_output@7.geometry(0, 0, 430, 240, 0, "Acer Technologies", "V206HQL", 0)
[3583249.804] wl_output@7.mode(1, 1600, 900, 60000)
[3583249.812] wl_output@7.scale(1)
[3583249.818] wl_output@7.done()
[3583249.876] wl_output@8.geometry(0, 0, 510, 290, 0, "Dell Inc.", "DELL U2312HM", 0)
[3583249.887] wl_output@8.mode(1, 1920, 1080, 60000)
[3583249.895] wl_output@8.scale(1)
[3583249.901] wl_output@8.done()
[3583249.906] zxdg_output_v1@3.name("HDMI-A-2")
[3583249.913] zxdg_output_v1@3.description("Dell Inc. DELL U2312HM KF87Y39L100S (HDMI-A-2)")
[3583249.926] zxdg_output_v1@3.logical_position(1600, 0)
[3583249.939] zxdg_output_v1@3.logical_size(1920, 1080)
[3583249.967] zxdg_output_v1@3.done()
[3583249.975] zxdg_output_v1@9.name("HDMI-A-1")
[3583249.989] zxdg_output_v1@9.description("Acer Technologies V206HQL LY6AA01A85GL (HDMI-A-1)")
[3583249.996] zxdg_output_v1@9.logical_position(0, 0)
[3583250.018] zxdg_output_v1@9.logical_size(1600, 900)
[3583250.025] zxdg_output_v1@9.done()
[3583250.037] wl_callback@10.done(35)
[3583250.050]  -> zwlr_screencopy_manager_v1@6.capture_output(new id zwlr_screencopy_frame_v1@10, 0, wl_output@7)
[3583250.299] zwlr_screencopy_frame_v1@10.buffer(875709016, 1600, 900, 6400)
[3583250.485]  -> wl_shm@4.create_pool(new id wl_shm_pool@11, fd 5, 5760000)
[3583250.505]  -> wl_shm_pool@11.create_buffer(new id wl_buffer@12, 0, 1600, 900, 6400, 875709016)
[3583250.514]  -> wl_shm_pool@11.destroy()
[3583250.523]  -> zwlr_screencopy_frame_v1@10.copy(wl_buffer@12)

This was from a third machine, also Intel, with two HDMI outputs. The software environment is the same as previously described.

# lspci -s 00:02.0 
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 500 (rev 0b)
# lspci -n -s 00:02.0 
00:02.0 0300: 8086:5a85 (rev 0b)
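
The numbers in the last lines of the trace can be checked against each other. The sketch below decodes the format code from the `buffer` event as a fourcc and compares the stride and pool size; the helper name `fourcc_to_str` is made up for illustration, and 4 bytes per pixel is assumed for this format.

```python
def fourcc_to_str(code):
    """Decode a 32-bit fourcc code into its four ASCII characters.

    Note: the wl_shm values 0 (ARGB8888) and 1 (XRGB8888) are special
    cases and are not fourcc codes.
    """
    return code.to_bytes(4, "little").decode("ascii")

# Values copied from the buffer() event and create_pool() request in the trace.
fmt, width, height, stride = 875709016, 1600, 900, 6400
pool_size = 5760000

print(fourcc_to_str(fmt))            # prints XB24
print(stride == width * 4)           # prints True
print(stride * height == pool_size)  # prints True
```

`XB24` is the fourcc of `DRM_FORMAT_XBGR8888`, so the buffer grim allocates is consistent with what the compositor announced.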

Ferdi265 commented 1 month ago

Thanks for the log! It looks like grim correctly calls copy() on the screencopy frame, but the compositor never finishes and doesn't send the ready() event. This looks like it's a bug in either Sway or wlroots. I'm gonna look at the code later and see if I can find something that looks like the issue.
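
The same check can be done mechanically on a saved trace. The following is only a sketch: it assumes the line format shown in the log in this thread (other libwayland versions may print it differently), and the name `pending_copies` is hypothetical. It lists screencopy frames that received a `copy` request but no `ready` or `failed` event afterwards.

```python
import re

# Matches e.g. "[3583250.523]  -> zwlr_screencopy_frame_v1@10.copy(wl_buffer@12)".
# A leading "->" marks a request sent by the client; lines without it are events.
LINE = re.compile(
    r"^\[\s*[\d.]+\]\s+(?P<req>->\s+)?(?P<iface>\w+)@(?P<id>\d+)\.(?P<msg>\w+)\("
)

def pending_copies(trace):
    """Return ids of screencopy frames with copy() sent but no ready()/failed()."""
    pending = []
    for line in trace.splitlines():
        m = LINE.match(line.strip())
        if not m or m["iface"] != "zwlr_screencopy_frame_v1":
            continue
        obj = int(m["id"])
        if m["req"] and m["msg"] == "copy":
            pending.append(obj)
        elif not m["req"] and m["msg"] in ("ready", "failed") and obj in pending:
            pending.remove(obj)
    return pending

# Tail of the trace posted above.
SAMPLE = """\
[3583250.050]  -> zwlr_screencopy_manager_v1@6.capture_output(new id zwlr_screencopy_frame_v1@10, 0, wl_output@7)
[3583250.299] zwlr_screencopy_frame_v1@10.buffer(875709016, 1600, 900, 6400)
[3583250.523]  -> zwlr_screencopy_frame_v1@10.copy(wl_buffer@12)
"""

print(pending_copies(SAMPLE))  # prints [10]
```

An empty list would mean every copy was answered; here frame 10 is still waiting when the trace ends.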

zboszor commented 1 month ago

Thanks in advance. I am using wlroots 0.17.4 and sway 1.9.

Originally, grim was built from one commit past the 1.4.0 tag, i.e. https://git.sr.ht/~emersion/grim/commit/89e02e663fabc534b7e7039514f60a8c5d70070d

Built from the latest commit https://git.sr.ht/~emersion/grim/commit/7dbb0f39cd79841bd0dc07ac4a7183facf34350e, grim also stalls.

zboszor commented 1 month ago

@Ferdi265 Any news?

Ferdi265 commented 1 month ago

Hi! Sorry, I didn't get around to looking at this in detail. I didn't find anything obvious in the wlroots codebase at that commit, and I also wasn't able to reproduce the issue, but I only had time to look at it for an hour or so.

Ferdi265 commented 1 month ago

I'd suggest opening an issue with wlroots, since the screencopy ready event is never sent.

zboszor commented 1 month ago

Thank you. FWIW, I am using Yocto for a custom-tailored distro.