cage-kiosk / cage

A Wayland kiosk
https://www.hjdskes.nl/projects/cage
MIT License

Cage doesn't exit when all clients are gone #146

Open jbeich opened 4 years ago

jbeich commented 4 years ago

Regressed by #132 (5d7ff9e64dc7). Cage works nicely as a nested compositor for running multiple applications with minimalistic UI in fullscreen mode or Wayland-only applications on X11. Unfortunately, Cage no longer exits on its own and ignores SIGTERM and SIGINT.

$ cage -d firefox --kiosk --profile $(mktemp -dt ffprofile) https://www.youtube.com/embed/hVvEISFw9w0
<press Ctrl+Q to quit Firefox>
[types/seat/wlr_seat_pointer.c:364] button_count=1 grab_serial=23 serial=25
[types/seat/wlr_seat_pointer.c:364] button_count=0 grab_serial=25 serial=26
[types/wlr_idle.c:186] Enabling idle timers for all seats
[wayland] failed to read client connection (pid 18480)
[wayland] failed to read client connection (pid 18418)
[wayland] failed to read client connection (pid 18638)
[wayland] failed to read client connection (pid 18492)
load: 9.21  cmd: cage 18186 [select] 22.46r 0.17u 0.07s 0% 77748k
^C^C
[xwayland/xwm.c:844] XCB_DESTROY_NOTIFY (4194305)
load: 9.12  cmd: Xwayland 18200 [uwait] 24.94r 0.03u 0.00s 0% 35164k
^C^C
<press Super+Shift+q or similar to kill Cage via parent compositor>
[../cage.c:139] Child exited normally with exit status 0
$ cage -d firefox --kiosk --profile $(mktemp -dt ffprofile) https://www.youtube.com/embed/hVvEISFw9w0
^C
[wayland] failed to read client connection (pid 91482)
[xwayland/xwm.c:844] XCB_DESTROY_NOTIFY (4194305)
load: 0.45  cmd: firefox 91482 [zombie] 12.18r 4.89u 1.10s 7% 0k
^C
<press Super+Shift+q or similar to kill Cage via parent compositor>
[../cage.c:141] Child was terminated by a signal (2)
(lldb) bt
* thread #1, name = 'cage'
  * frame #0: 0x0000000800632dfa libc.so.7`__sys_poll at _poll.S:4
    frame #1: 0x0000000800902946 libthr.so.3`__thr_poll(fds=<unavailable>, nfds=<unavailable>, timeout=<unavailable>) at thr_syscalls.c:338:8
    frame #2: 0x00000008008df1f2 libepoll-shim.so.0`epollfd_ctx_wait_or_block(epollfd=0x0000000801448028, ev=0x00007fffffffdc40, cnt=32, actual_cnt=0x00007fffffffdbf4, to=-1) at epoll.c:211:7
    frame #3: 0x00000008008dee91 libepoll-shim.so.0`epoll_wait(fd=3, ev=0x00007fffffffdc40, cnt=32, to=-1) at epoll.c:230:12
    frame #4: 0x0000000800265be9 libwayland-server.so.0`wl_event_loop_dispatch(loop=0x0000000801439320, timeout=-1) at event-loop.c:1004:10
    frame #5: 0x00000008002618df libwayland-server.so.0`wl_display_run(display=0x0000000801459000) at wayland-server.c:1401:3
    frame #6: 0x0000000000208d25 cage`main(argc=7, argv=0x00007fffffffe038) at cage.c:486:2
    frame #7: 0x0000000000207e2f cage`_start(ap=<unavailable>, cleanup=<unavailable>) at crt1.c:76:7
(lldb) f 6
frame #6: 0x0000000000208d25 cage`main(argc=7, argv=0x00007fffffffdff0) at cage.c:486:2
   483          struct wlr_box *layout_box = wlr_output_layout_get_box(server.output_layout, NULL);
   484          wlr_cursor_warp(server.seat->cursor, NULL, layout_box->width / 2, layout_box->height / 2);
   485
-> 486          wl_display_run(server.wl_display);
   487
   488  #if CAGE_HAS_XWAYLAND
   489          wlr_xwayland_destroy(xwayland);
(lldb) f 5
frame #5: 0x00000008002618df libwayland-server.so.0`wl_display_run(display=0x0000000801459000) at wayland-server.c:1401:3
   1398
   1399         while (display->run) {
   1400                 wl_display_flush_clients(display);
-> 1401                 wl_event_loop_dispatch(display->loop, -1);
   1402         }
   1403 }
   1404
(lldb) f 2
frame #2: 0x00000008008df1f2 libepoll-shim.so.0`epollfd_ctx_wait_or_block(epollfd=0x0000000801448028, ev=0x00007fffffffdc00, cnt=32, actual_cnt=0x00007fffffffdbb4, to=-1) at epoll.c:211:7
   208                  pfds[1] = epollfd->pfds[1];
   209                  (void)pthread_mutex_unlock(&epollfd->mutex);
   210
-> 211                  if (poll(pfds, 2, MAX(to, -1)) < 0) {
   212                          return errno;
   213                  }
   214          }

matthewbauer commented 4 years ago

So I think the reasoning for not terminating when all clients are gone still makes sense - just because there is no client doesn't mean the process isn't still alive. At one point (https://github.com/Hjdskes/cage/commit/95cd528a1ae41456d5856738bcc654ef85421970), I proposed checking if the process was still alive before terminating the display. This might address your use case better?
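A minimal sketch of what "checking if the process is still alive" could look like, using only POSIX calls. This is not the code from the linked commit; `child_is_alive` and `demo_child_liveness` are hypothetical names made up for illustration, and a compositor would run such a check from its own event loop rather than from a demo function.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper, not part of Cage: returns true while `pid`
 * (a direct child of the caller) has not exited yet. Once the child
 * has exited, it is reaped and its raw wait status lands in *status. */
bool child_is_alive(pid_t pid, int *status)
{
    /* With WNOHANG, waitpid() returns 0 when the child exists but has
     * not changed state, so it never blocks the caller. */
    return waitpid(pid, status, WNOHANG) == 0;
}

/* Demo: start a child that blocks, check it, kill it, check again.
 * Returns 0 when every step behaves as expected. */
int demo_child_liveness(void)
{
    int status = 0;
    siginfo_t info;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        pause();            /* child: sleep until a signal arrives */
        _exit(0);
    }

    if (!child_is_alive(pid, &status))
        return 1;           /* the child should still be running */

    kill(pid, SIGKILL);
    /* Block until the child has exited, but leave it unreaped (WNOWAIT)
     * so that child_is_alive() below is the call that collects it. */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return 2;

    if (child_is_alive(pid, &status))
        return 3;           /* the child has exited, so this must be false */

    return (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL) ? 0 : 4;
}
```

Calling `demo_child_liveness()` from a `main` should return 0. Note that `waitpid` only works for direct children, which fits the case here since Cage starts the application itself.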

But the more serious problem is SIGTERM not being handled correctly. I saw this problem too (and it existed before 95cd528a1ae41456d5856738bcc654ef85421970), but for some reason thought it had been fixed. I wonder if this is related to https://github.com/swaywm/wlroots/issues/2012? /cc @Hjdskes

Hjdskes commented 4 years ago

This seems to be working just fine for me, with the caveat that I'm using Firefox with GDK_BACKEND=wayland. There is a noticeable delay between Firefox closing and then Cage exiting, but Cage does exit.

jbeich commented 4 years ago

@Hjdskes, I did test with the Wayland backend via MOZ_ENABLE_WAYLAND=1. I didn't think it was worth mentioning because GDK_BACKEND=wayland is the default on a Wayland compositor; other Gtk3 apps don't require fiddling with environment variables.

@myfreeweb, do you think libwayland's dependency on libepoll-shim could cause FreeBSD to behave unlike Linux in this case?

valpackett commented 4 years ago

That's always a possibility :) How does cage auto exit now?

Hjdskes commented 4 years ago

See https://github.com/Hjdskes/cage/blob/master/cage.c#L54

valpackett commented 4 years ago

@jbeich try adding WL_EVENT_READABLE and/or WL_EVENT_WRITABLE to uint32_t mask = WL_EVENT_HANGUP | WL_EVENT_ERROR; in cage.c.

From a quick look at epoll-shim, I suspect that epoll with no .events (libwayland only converts readable/writable to epoll events) won't deliver EOF notifications, because no EVFILT_READ or EVFILT_WRITE filter would be registered. If that's the case (i.e. if the change above helps), create an issue at https://github.com/jiixyj/epoll-shim
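For reference, here is a small sketch of the behaviour the hangup-only mask relies on. It uses plain POSIX `poll()` as a stand-in, not libwayland or epoll-shim: `poll()` ignores `POLLHUP` in `events` and always reports it in `revents`, so a hangup shows up even when no readable/writable interest was requested. The function name `revents_after_peer_close` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <unistd.h>

/* Polls the read end of a pipe whose write end has already been closed,
 * using the given `events` mask. Returns the reported revents, 0 if the
 * poll timed out without any event, or -1 if pipe() or poll() failed. */
int revents_after_peer_close(short events)
{
    int fds[2];
    int n;
    struct pollfd pfd;

    if (pipe(fds) != 0)
        return -1;
    close(fds[1]);                  /* the "client" end goes away */

    pfd.fd = fds[0];
    pfd.events = events;
    pfd.revents = 0;

    /* 1 s timeout so that a missed notification cannot hang the demo. */
    n = poll(&pfd, 1, 1000);
    close(fds[0]);

    if (n < 0)
        return -1;
    return n == 0 ? 0 : pfd.revents;
}
```

With `events` set to 0, the result should still have `POLLHUP` set. The suspicion above is that a kqueue-based emulation has nothing to attach that notification to unless a read or write filter is registered, which is why adding WL_EVENT_READABLE is worth trying.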

jbeich commented 4 years ago

@myfreeweb, Cage exits fine after https://github.com/jiixyj/epoll-shim/commit/7ecf58c96d44.

SIGINT and SIGTERM handling is still broken without #148:

$ cage -d gtk3-demo
<Cage receives SIGTERM>
<Cage crashes when pointer is moved over its window>
* thread #1, name = 'cage', stop reason = signal SIGSEGV: invalid address (fault address: 0x800000028)
    frame #0: 0x000000000020aeab cage`render_view_toplevels(view=0x00000007fffffff8, output=0x0000000804503480, damage=0x00007fffffffd688) at render.c:116:14
   113          struct render_data data = {
   114                  .damage = damage,
   115          };
-> 116          double ox = view->lx;
   117          double oy = view->ly;
   118          wlr_output_layout_output_coords(output->server->output_layout, output->wlr_output, &ox, &oy);
   119          output_surface_for_each_surface(output, view->wlr_surface, ox, oy, render_surface_iterator, &data);
(lldb) bt
* thread #1, name = 'cage', stop reason = signal SIGSEGV: invalid address (fault address: 0x800000028)
  * frame #0: 0x000000000020aeab cage`render_view_toplevels(view=0x00000007fffffff8, output=0x0000000804503480, damage=0x00007fffffffd688) at render.c:116:14
    frame #1: 0x000000000020ac22 cage`output_render(output=0x0000000804503480, damage=0x00007fffffffd688) at render.c:177:3
    frame #2: 0x000000000020a345 cage`handle_output_damage_frame(listener=0x00000008045034e0, data=0x0000000804582080) at output.c:317:2
    frame #3: 0x000000080032417d libwlroots.so.5`wlr_signal_emit_safe(signal=0x00000008045820e0, data=0x0000000804582080) at signal.c:29:3
    frame #4: 0x000000080030c150 libwlroots.so.5`output_handle_frame(listener=0x0000000804582178, data=0x00000008014c7880) at wlr_output_damage.c:51:2
    frame #5: 0x000000080032417d libwlroots.so.5`wlr_signal_emit_safe(signal=0x00000008014c7a08, data=0x00000008014c7880) at signal.c:29:3
    frame #6: 0x0000000800312070 libwlroots.so.5`wlr_output_send_frame(output=0x00000008014c7880) at wlr_output.c:586:2
    frame #7: 0x00000008002d83eb libwlroots.so.5`surface_frame_callback(data=0x00000008014c7880, cb=0x0000000804346610, time=168074061) at output.c:37:2
    frame #8: 0x00000008008f6ac4 libffi.so.6`ffi_call_unix64 at unix64.S:76
    frame #9: 0x00000008008f5e5a libffi.so.6`ffi_call(cif=0x00007fffffffd960, fn=(libwlroots.so.5`surface_frame_callback at output.c:31), rvalue=0x0000000000000000, avalue=0x00007fffffffd990) at ffi64.c:525:3
    frame #10: 0x0000000800936412 libwayland-client.so.0`wl_closure_invoke(closure=0x000000080458a0e0, flags=1, target=0x0000000804346610, opcode=0, data=0x00000008014c7880) at connection.c:1018:2
    frame #11: 0x0000000800933bac libwayland-client.so.0`dispatch_event(display=0x0000000801423700, queue=0x00000008014237d0) at wayland-client.c:1445:3
    frame #12: 0x0000000800932bbc libwayland-client.so.0`dispatch_queue(display=0x0000000801423700, queue=0x00000008014237d0) at wayland-client.c:1591:3
    frame #13: 0x0000000800932972 libwayland-client.so.0`wl_display_dispatch_queue_pending(display=0x0000000801423700, queue=0x00000008014237d0) at wayland-client.c:1833:8
    frame #14: 0x000000080093250e libwayland-client.so.0`wl_display_dispatch_queue(display=0x0000000801423700, queue=0x00000008014237d0) at wayland-client.c:1809:9
    frame #15: 0x0000000800932c32 libwayland-client.so.0`wl_display_dispatch(display=0x0000000801423700) at wayland-client.c:1876:9
    frame #16: 0x00000008002d53c5 libwlroots.so.5`dispatch_events(fd=6, mask=1, data=0x0000000801424c00) at backend.c:46:11
    frame #17: 0x00000008002645a7 libwayland-server.so.0`wl_event_source_fd_dispatch(source=0x0000000801417a00, ep=0x00007fffffffdca0) at event-loop.c:112:9
    frame #18: 0x0000000800265d48 libwayland-server.so.0`wl_event_loop_dispatch(loop=0x0000000801439320, timeout=-1) at event-loop.c:1027:4
    frame #19: 0x00000008002618df libwayland-server.so.0`wl_display_run(display=0x0000000801459000) at wayland-server.c:1401:3
    frame #20: 0x0000000000208d25 cage`main(argc=3, argv=0x00007fffffffe090) at cage.c:486:2
    frame #21: 0x0000000000207e2f cage`_start(ap=<unavailable>, cleanup=<unavailable>) at crt1.c:76:7

Hjdskes commented 4 years ago

@jbeich I'm sorry, I don't understand the current state of this issue. Can you clarify "Cage exits fine after jiixyj/epoll-shim@7ecf58c"? Do you mean that Cage no longer blocks, but now instead it crashes on SIGINT and SIGTERM?

jbeich commented 4 years ago

@Hjdskes, correct. Killing clients inside Cage works fine: Cage exits after a few seconds once all of them are gone. But when Cage itself is killed, it doesn't exit gracefully; it crashes.
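To make "exit gracefully" concrete, here is a generic sketch of the usual pattern: the signal handler only records the request, the main loop notices it and returns, and teardown then happens in a defined order after the loop. This is not Cage's handler and not the change in #148; every name below is hypothetical. A Wayland compositor would more likely route signals through its event loop (libwayland offers `wl_event_loop_add_signal` and `wl_display_terminate` for that), which is not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set from the handler, polled by the main loop. */
static volatile sig_atomic_t stop_requested = 0;

static void on_terminate(int signo)
{
    (void)signo;
    stop_requested = 1;     /* only a flag write: async-signal-safe */
}

/* Installs the same handler for SIGINT and SIGTERM. Returns 0 on success. */
int install_terminate_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_terminate;
    sigemptyset(&sa.sa_mask);

    if (sigaction(SIGINT, &sa, NULL) != 0)
        return -1;
    return sigaction(SIGTERM, &sa, NULL);
}

/* Hypothetical main loop: runs until a signal asks it to stop, then
 * returns so the caller can destroy its objects in a known order.
 * Returns the number of iterations that were started. */
int run_until_signalled(void)
{
    int iterations = 0;

    while (!stop_requested) {
        iterations++;
        if (iterations == 3)
            raise(SIGTERM); /* stand-in for an external kill */
    }
    return iterations;
}
```

In a single-threaded program `raise()` runs the handler before it returns, so after `install_terminate_handlers()` the loop above stops on its third iteration. The point of the pattern is that nothing is freed from inside the handler, so no later frame callback can run against half-destroyed state.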

jbeich commented 4 years ago

Regarding epoll-shim downstream, it should be updated soon; see https://reviews.freebsd.org/D25052

n3rdopolis commented 2 years ago

I think I get something like this if I run cage -- foot -- bash, then in bash run sleep inf and then exit.

Foot becomes a zombie, and then cage never quits.

emersion commented 1 year ago

Is this still relevant?

n3rdopolis commented 1 year ago

Yes. That sleep command should be sleep inf &; I forgot the &.

That makes the foot terminal become a zombie, so the PID is still there, but it's waiting for the child process (which is just sleep) to exit. That child doesn't show anything, leaving cage blank.
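A small sketch of the "PID is still there" part, assuming nothing about how Cage tracks its child: a process that has exited but has not been reaped yet (a zombie) still answers `kill(pid, 0)`, and only disappears once its parent waits for it. `pid_exists` and `demo_zombie_pid` are hypothetical names used only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Returns 1 if some process (running or zombie) currently has this PID.
 * Signal 0 performs only the existence and permission checks. */
int pid_exists(pid_t pid)
{
    return kill(pid, 0) == 0 || errno == EPERM;
}

/* Demo: returns 0 if an exited but unreaped child still "exists" and
 * stops existing once it has been reaped. */
int demo_zombie_pid(void)
{
    siginfo_t info;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);           /* child exits immediately */

    /* Wait until the child has exited, but leave it as a zombie. */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return 1;
    if (!pid_exists(pid))
        return 2;           /* zombie: the PID is still allocated */

    if (waitpid(pid, NULL, 0) != pid)
        return 3;           /* reap the zombie */
    if (pid_exists(pid))
        return 4;           /* now the PID is gone */

    return 0;
}
```

So a check based purely on whether the PID exists cannot tell a live terminal from one that has already exited and is waiting to be reaped.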

Of course, if the child process is graphical, users might not want it to exit, so maybe an option to quit when there are no active Wayland clients instead of waiting for the PID to die might be helpful...

n3rdopolis commented 1 year ago

Of course, that suggestion could now also break any Wayland clients that disconnect and reconnect during their process lifetime for whatever reason. Not sure if there are many out there...

Winterhuman commented 10 months ago

For anyone coming across this issue, SIGHUP still works fine (e.g. for systemd services, set KillSignal=SIGHUP under [Service]).