cage-kiosk / cage

A Wayland kiosk
https://www.hjdskes.nl/projects/cage
MIT License

Performance degradation with nesting in another compositor #186

Open: jbeich opened this issue 3 years ago

jbeich commented 3 years ago

Being a kiosk, Cage can be used to isolate GUI apps from a parent compositor. For this to scale, nesting should cost close to nothing in performance. However, Cage (unlike Sway) quickly falls below 60 FPS as the chain of compositors grows.

For example, 20 instances of Cage:

$ GALLIUM_HUD=fps,frametime cage cage cage cage cage cage cage cage cage cage cage cage cage cage cage cage cage cage cage cage glxgears
30 frames in 5.0 seconds =  5.999 FPS
^C
$ VK_INSTANCE_LAYERS=VK_LAYER_MESA_overlay cage cage cage cage cage cage cage cage cage cage cage cage cage cage cage cage cage cage cage cage vkcube-wayland
<around 9 FPS>
^C

versus 20 instances of Sway:

$ sh sway.sh
300 frames in 5.0 seconds = 59.962 FPS # from glxgears
<around 60 FPS from vkcube>
^C

$ cat sway.sh
# Remove the generated configs on exit or interruption
trap 'rm -f /tmp/sway*.conf' EXIT TERM INT

# Create chain of nested compositors
unset i; while [ $((i+=1)) -le 20 ]; do
cat <<EOF >/tmp/sway$i.conf
default_border none
exec sway -c /tmp/sway$((i+1)).conf
EOF
done

# Run real applications at the end of the chain
cat <<EOF >/tmp/sway$i.conf
for_window [shell=".*"] title_format "%title :: %shell"
exec VK_INSTANCE_LAYERS=VK_LAYER_MESA_overlay vkcube-wayland
exec GALLIUM_HUD=fps\,frametime glxgears
EOF

# Execute the chain
sway -c /tmp/sway1.conf
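The Cage side of the comparison spells out `cage` twenty times by hand. To reproduce it at other depths, a minimal POSIX sh sketch like the one below can build the same chain. `nest_in_cage` and `DRY_RUN` are hypothetical names for illustration, not part of Cage.

```sh
# Hypothetical helper (not part of Cage): run a command nested inside N cages,
# e.g. `nest_in_cage 3 glxgears` runs `cage cage cage glxgears`.
# With DRY_RUN=1 it prints the command line instead of running it.
nest_in_cage() {
    n=$1
    shift
    # Prepend one "cage" per nesting level to the positional parameters.
    while [ "$n" -gt 0 ]; do
        set -- cage "$@"
        n=$((n - 1))
    done
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

For example, running `export GALLIUM_HUD=fps,frametime` and then `nest_in_cage 20 glxgears` should match the first command in the report. Exporting the variable first ensures that the outermost cage, and therefore every level it spawns, inherits it, just as the one-line prefix form does.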

Hjdskes commented 3 years ago

That is very interesting. I think it might have to do with frame timing or scheduling, something Sway has implemented and Cage is still missing...

emersion commented 3 years ago

Right, the lack of frame scheduling adds one frame of lag per nested instance. I hope a wlroots helper can take care of that, so that potentially complex logic doesn't have to be implemented in Cage.
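To put the one-frame-per-level estimate into numbers: if each nested instance adds one refresh period of lag, the total grows linearly with depth. The sketch below assumes that every level shares a single refresh rate. It models added latency only and does not by itself predict the frame-rate drop reported above. `added_latency_us` is a hypothetical name.

```sh
# Back-of-the-envelope estimate (hypothetical helper): total lag if every
# nested compositor adds exactly one refresh period. Assumes one shared
# refresh rate. Uses integer microseconds because POSIX shell arithmetic
# has no floating point.
added_latency_us() {
    levels=$1      # number of nested compositor instances
    refresh_hz=$2  # refresh rate assumed for every level
    echo $(( levels * (1000000 / refresh_hz) ))
}

added_latency_us 20 60   # prints 333320, about a third of a second
```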

Another useful feature is direct scan-out, which avoids any buffer copy. This should already work well for fullscreen GPU buffers, but not yet for shared-memory buffers or for scenes with multiple buffers. Since a nested Cage happens to present a fullscreen GPU buffer to its parent compositor, the cost should be minimal anyway.
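One way to check how much direct scan-out contributes is to run the same chain twice, once with scan-out on and once with it off. The sketch below assumes a Cage build that uses the wlroots scene graph, as mentioned in a later comment, and a wlroots version that honors the `WLR_SCENE_DISABLE_DIRECT_SCANOUT` debug variable. Confirm both against your wlroots version. `scanout_ab`, `run_or_print`, and `DRY_RUN` are hypothetical names.

```sh
# Hypothetical helper: run a command line, or print it when DRY_RUN=1.
run_or_print() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical A/B: run the same nested chain with direct scan-out enabled
# (the default) and then disabled. Assumption: the wlroots scene graph reads
# WLR_SCENE_DISABLE_DIRECT_SCANOUT. `env` sets it for the outermost cage,
# and each nested cage inherits it through its environment.
scanout_ab() {
    echo "== direct scan-out enabled (default) =="
    run_or_print "$@"
    echo "== direct scan-out disabled =="
    run_or_print env WLR_SCENE_DISABLE_DIRECT_SCANOUT=1 "$@"
}
```

Usage: `scanout_ab cage cage cage cage glxgears`. Compare the FPS reported by `GALLIUM_HUD` across the two runs.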

Hjdskes commented 2 years ago

I just confirmed that this still happens with the new scene graph. Is there anything Cage needs to do here, @emersion, or does this need to be fixed in wlroots?

emersion commented 2 years ago

Hm, I missed one thing in the original post: I don't understand why it works better with Sway's default config. max_render_time isn't enabled, is it? The frame scheduling should be exactly the same in both compositors.
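One way to answer the `max_render_time` question is to pin the setting explicitly in every generated config. The Sway chain can then be rerun with it set to `off`, which the question implies is the default, and again with a millisecond value. The sketch below reworks the generator loop from sway.sh into a hypothetical `write_sway_chain` function. It assumes sway's `output * max_render_time off|<msec>` syntax and drops the `for_window` title rule for brevity.

```sh
# Hypothetical generator (a rework of sway.sh's loop): writes
# DIR/sway1.conf .. DIR/swayDEPTH.conf. Each level starts the next sway,
# and the last level starts the test clients. Every level pins
# max_render_time to MRT ("off" or a value in milliseconds).
write_sway_chain() {
    dir=$1 depth=$2 mrt=$3
    i=1
    while [ "$i" -lt "$depth" ]; do
        printf '%s\n' \
            'default_border none' \
            "output * max_render_time $mrt" \
            "exec sway -c $dir/sway$((i + 1)).conf" >"$dir/sway$i.conf"
        i=$((i + 1))
    done
    # Innermost level: run the clients instead of another sway. The escaped
    # comma is copied from sway.sh (presumably so sway does not split the
    # command there).
    printf '%s\n' \
        'default_border none' \
        "output * max_render_time $mrt" \
        'exec VK_INSTANCE_LAYERS=VK_LAYER_MESA_overlay vkcube-wayland' \
        'exec GALLIUM_HUD=fps\,frametime glxgears' >"$dir/sway$depth.conf"
}
```

Usage: `tmp=$(mktemp -d); write_sway_chain "$tmp" 20 off; sway -c "$tmp/sway1.conf"`. Repeat with a small millisecond value in place of `off`. If the FPS numbers do not change, `max_render_time` is unlikely to explain the gap with Cage.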