CollaboraOnline / online

Collabora Online is a collaborative online office suite based on LibreOffice technology. This is also the source for the Collabora Office apps for iOS and Android.
https://collaboraonline.com
Other
1.89k stars 711 forks source link

ERR Poco Exception: File not found: /opt/cool/child-roots/kKwSDrf3cR8mcwCG//delete.me (Starting coolwsd in unprivileged lxc container fails) #5164

Closed brknkfr closed 2 years ago

brknkfr commented 2 years ago

Describe the bug I installed the most recent coolwsd (version 22.05.5.3-1) on an unprivileged lxc container (debian 11 bullseye on amd64). It seems that coolwsd isn't starting up properly. It is falling into a strange prisoner_poll loop and a connection to port 9980 is not possible (curl -k http://127.0.0.1:9980 times out). The bug is probably related to https://github.com/CollaboraOnline/online/issues/5155.

To Reproduce

  1. Install code-brand as indicated into an unprivileged container
  2. Disable SSL in config file.
  3. Start coolwsd with systemctl start coolwsd.
  4. See error. I guess the problems lies somewhere here in following log entry:
kit-03320-03318 2022-08-17 17:17:37.561993 +0200 [ kit_spare_001 ] ERR  Poco Exception: File not found: /opt/cool/child-roots/kKwSDrf3cR8mcwCG//delete.me| kit/Kit.cpp:2968
kit-03320-03318 2022-08-17 17:17:37.562037 +0200 [ kit_spare_001 ] INF  Kit process for Jail [kKwSDrf3cR8mcwCG] finished.| kit/Kit.cpp:2978
Forced Exit with code: 0
kit-03320-03318 2022-08-17 17:17:37.562065 +0200 [ kit_spare_001 ] FTL  Forced Exit with code: 0| common/Util.cpp:1096
frk-03318-03318 2022-08-17 17:17:38.555792 +0200 [ forkit ] WRN  No live Kits exist, and we are not terminating yet.| kit/ForKit.cpp:304

After that, it repeats following message:

wsd-03594-03602 2022-08-17 17:25:48.415089 +0200 [ prisoner_poll ] TRC  Poll completed with 0 live polls max (5000000us)(timedout)| net/Socket.cpp:355
wsd-03594-03602 2022-08-17 17:25:48.415147 +0200 [ prisoner_poll ] TRC  #18: Starting handling poll results of prisoner_poll at index 1 (of 2): 0| net/Socket.cpp:433
wsd-03594-03602 2022-08-17 17:25:48.415177 +0200 [ prisoner_poll ] TRC  #18: Revents: 0x0| net/Socket.hpp:1296
wsd-03594-03602 2022-08-17 17:25:48.415203 +0200 [ prisoner_poll ] TRC  #17: setupPollFds getPollEvents: 0x1| net/Socket.hpp:858
wsd-03594-03602 2022-08-17 17:25:48.415225 +0200 [ prisoner_poll ] TRC  #18: setupPollFds getPollEvents: 0x1| net/Socket.hpp:858
wsd-03594-03602 2022-08-17 17:25:48.415247 +0200 [ prisoner_poll ] TRC  ppoll start, timeoutMicroS: 5000000 size 2| net/Socket.cpp:337
wsd-03594-03602 2022-08-17 17:25:53.418257 +0200 [ prisoner_poll ] TRC  Poll completed with 0 live polls max (5000000us)(timedout)| net/Socket.cpp:355
wsd-03594-03602 2022-08-17 17:25:53.418315 +0200 [ prisoner_poll ] TRC  #17: Starting handling poll results of prisoner_poll at index 0 (of 2): 0| net/Socket.cpp:433
wsd-03594-03602 2022-08-17 17:25:53.418336 +0200 [ prisoner_poll ] TRC  #18: Revents: 0x0| net/Socket.hpp:1296
wsd-03594-03602 2022-08-17 17:25:53.418355 +0200 [ prisoner_poll ] TRC  #17: setupPollFds getPollEvents: 0x1| net/Socket.hpp:858
wsd-03594-03602 2022-08-17 17:25:53.418375 +0200 [ prisoner_poll ] TRC  #18: setupPollFds getPollEvents: 0x1| net/Socket.hpp:858
wsd-03594-03602 2022-08-17 17:25:53.418393 +0200 [ prisoner_poll ] TRC  ppoll start, timeoutMicroS: 5000000 size 2| net/Socket.cpp:337
wsd-03594-03602 2022-08-17 17:25:58.422940 +0200 [ prisoner_poll ] TRC  Poll completed with 0 live polls max (5000000us)(timedout)| net/Socket.cpp:355
wsd-03594-03602 2022-08-17 17:25:58.423008 +0200 [ prisoner_poll ] TRC  #18: Starting handling poll results of prisoner_poll at index 1 (of 2): 0| net/Socket.cpp:433
wsd-03594-03602 2022-08-17 17:25:58.423036 +0200 [ prisoner_poll ] TRC  #18: Revents: 0x0| net/Socket.hpp:1296
wsd-03594-03602 2022-08-17 17:25:58.423060 +0200 [ prisoner_poll ] TRC  #17: setupPollFds getPollEvents: 0x1| net/Socket.hpp:858
wsd-03594-03602 2022-08-17 17:25:58.423083 +0200 [ prisoner_poll ] TRC  #18: setupPollFds getPollEvents: 0x1| net/Socket.hpp:858
wsd-03594-03602 2022-08-17 17:25:58.423100 +0200 [ prisoner_poll ] TRC  ppoll start, timeoutMicroS: 2986302 size 2| net/Socket.cpp:337
wsd-03594-03602 2022-08-17 17:26:01.411916 +0200 [ prisoner_poll ] TRC  Poll completed with 0 live polls max (2986302us)(timedout)| net/Socket.cpp:355
wsd-03594-03602 2022-08-17 17:26:01.411969 +0200 [ prisoner_poll ] TRC  #17: Starting handling poll results of prisoner_poll at index 0 (of 2): 0| net/Socket.cpp:433
wsd-03594-03602 2022-08-17 17:26:01.411990 +0200 [ prisoner_poll ] TRC  #18: Revents: 0x0| net/Socket.hpp:1296
wsd-03594-03602 2022-08-17 17:26:01.412007 +0200 [ prisoner_poll ] TRC  #18: Sending  ping.| net/WebSocketHandler.hpp:568
wsd-03594-03602 2022-08-17 17:26:01.412029 +0200 [ prisoner_poll ] TRC  WebSocketHandler::sendFrame: Writing to #18 1 bytes in addition to 0 bytes buffered.| net/WebSocketHandler.hpp:746
wsd-03594-03602 2022-08-17 17:26:01.412055 +0200 [ prisoner_poll ] TRC  #17: setupPollFds getPollEvents: 0x1| net/Socket.hpp:858
wsd-03594-03602 2022-08-17 17:26:01.412071 +0200 [ prisoner_poll ] TRC  #18: setupPollFds getPollEvents: 0x5| net/Socket.hpp:858
wsd-03594-03602 2022-08-17 17:26:01.412100 +0200 [ prisoner_poll ] TRC  ppoll start, timeoutMicroS: 5000000 size 2| net/Socket.cpp:337
wsd-03594-03602 2022-08-17 17:26:01.412123 +0200 [ prisoner_poll ] TRC  Poll completed with 1 live polls max (5000000us)| net/Socket.cpp:355
wsd-03594-03602 2022-08-17 17:26:01.412147 +0200 [ prisoner_poll ] TRC  #18: Starting handling poll results of prisoner_poll at index 1 (of 2): 4| net/Socket.cpp:433
wsd-03594-03602 2022-08-17 17:26:01.412170 +0200 [ prisoner_poll ] TRC  #18: Revents: 0x4| net/Socket.hpp:1296
wsd-03594-03602 2022-08-17 17:26:01.412264 +0200 [ prisoner_poll ] TRC  #18: Wrote 3 bytes of 3 buffered data| net/Socket.hpp:1437
wsd-03594-03602 2022-08-17 17:26:01.412287 +0200 [ prisoner_poll ] TRC  #17: setupPollFds getPollEvents: 0x1| net/Socket.hpp:858
wsd-03594-03602 2022-08-17 17:26:01.412327 +0200 [ prisoner_poll ] TRC  #18: setupPollFds getPollEvents: 0x1| net/Socket.hpp:858
wsd-03594-03602 2022-08-17 17:26:01.412361 +0200 [ prisoner_poll ] TRC  ppoll start, timeoutMicroS: 5000000 size 2| net/Socket.cpp:337
wsd-03594-03602 2022-08-17 17:26:01.412385 +0200 [ prisoner_poll ] TRC  Poll completed with 1 live polls max (5000000us)| net/Socket.cpp:355
wsd-03594-03602 2022-08-17 17:26:01.412419 +0200 [ prisoner_poll ] TRC  #17: Starting handling poll results of prisoner_poll at index 0 (of 2): 0| net/Socket.cpp:433
wsd-03594-03602 2022-08-17 17:26:01.412443 +0200 [ prisoner_poll ] TRC  #18: Revents: 0x1| net/Socket.hpp:1296
wsd-03594-03602 2022-08-17 17:26:01.412483 +0200 [ prisoner_poll ] TRC  #18: Read 3 bytes in addition to 0 buffered bytes| net/Socket.hpp:1140
wsd-03594-03602 2022-08-17 17:26:01.412515 +0200 [ prisoner_poll ] TRC  #18: Incoming data buffer 3 bytes, read result: 3, events: 0x1 (not closed)| net/Socket.hpp:1311
wsd-03594-03602 2022-08-17 17:26:01.412535 +0200 [ prisoner_poll ] TRC  #18: Incoming WebSocket data of 3 bytes: 8A 01 00  | ...| net/WebSocketHandler.hpp:328
wsd-03594-03602 2022-08-17 17:26:01.412571 +0200 [ prisoner_poll ] TRC  #18: Incoming WebSocket frame code 10, fin? true, mask? false, payload length: 1, residual socket data: 0 bytes.| net/WebSocketHandler.hpp:341
wsd-03594-03602 2022-08-17 17:26:01.412601 +0200 [ prisoner_poll ] TRC  #18: Pong received: 632 microseconds| net/WebSocketHandler.hpp:368
wsd-03594-03602 2022-08-17 17:26:01.412639 +0200 [ prisoner_poll ] TRC  #17: setupPollFds getPollEvents: 0x1| net/Socket.hpp:858
wsd-03594-03602 2022-08-17 17:26:01.412654 +0200 [ prisoner_poll ] TRC  #18: setupPollFds getPollEvents: 0x1| net/Socket.hpp:858
wsd-03594-03602 2022-08-17 17:26:01.412684 +0200 [ prisoner_poll ] TRC  ppoll start, timeoutMicroS: 5000000 size 2| net/Socket.cpp:337
wsd-03594-03594 2022-08-17 17:26:03.007519 +0200 [ coolwsd ] INF  Waiting for a new child for a max of 20000ms| wsd/COOLWSD.cpp:5245

Logs This is the full log until ctrl+c, when starting coolwsd manually as user cool with /usr/bin/coolwsd --version --o:sys_template_path=/opt/cool/systemplate --o:child_root_path=/opt/cool/child-roots --o:file_server_root_path=/usr/share/coolwsd

Full Log

iasdeoupxe commented 2 years ago

Noticed the same this week on an unprivileged LXC container running Debian 10.

After digging a little bit around this looks like some kind of missing file capabilities. The current ones for /usr/bin/coolmount looks like the following:

root@myhost:~# getcap /usr/bin/coolmount
/usr/bin/coolmount = cap_sys_admin+ep

After doing the following to set the same file capabilities like already set for /usr/bin/coolforkit:

root@myhost:~# setcap cap_chown,cap_fowner,cap_sys_chroot,cap_mknod+ep /usr/bin/coolmount
root@myhost:~# service coolmount restart

the issue disappears and the connection to the 9980 port of coolwsd isn't timing out anymore (getting a HTTP 200 now) for me.

DISCLAIMER: Not sure if only one of these new file capabilities are required so use at your own risk because i don't know the impact of setting too much capabilities.

brknkfr commented 2 years ago

Seems to work again after setting the capabilities. Hopefully there will be a reaction of the collabora people to this.

cammeltech commented 2 years ago

Same Problem - Same fix. Thank you! I've been looking for a solution for many hours.

iasdeoupxe commented 2 years ago

This was posted by @timar in https://github.com/CollaboraOnline/online/issues/5155#issuecomment-1257175825:

/usr/bin/coolmount needs cap_sys_admin privilege, because this privilege is required for the mount syscall. If you remove this privilege and add cap_chown,cap_fowner,cap_sys_chroot,cap_mknod instead, then the check for mount will fail, and bind mounting will not be used. So, it fixes your problem, but not the way you expected. The real fix was pushed, and COOL/CODE 22.05.6.3 will contain it. In short: unexpectedly in some cases mount works but unmount doesn't, therefore the code has to check for success of both mount and unmount before enabling bind mounting.

5289 and #5291 seems to contain the mentioned fix