CollaboraOnline / online

Collabora Online is a collaborative online office suite based on LibreOffice technology. This is also the source for the Collabora Office apps for iOS and Android.
https://collaboraonline.com

Version 24.04.5.2.1 broken on rootless podman #9534

Closed: witchent closed this issue 1 month ago

witchent commented 2 months ago

Describe the Bug

I am using the Docker image of Collabora in a rootless podman setup, and the newest update doesn't let me run it anymore. Just changing the version tag in the compose file to 24.04.5.1.1 lets me run it in the exact same environment, so I don't think it is closely related to https://github.com/CollaboraOnline/online/issues/2800.

Steps to Reproduce

docker-compose.yml part:

  nextcloud-collabora:
    container_name: collabora
    image: docker.io/collabora/code:24.04.5.2.1
    restart: on-failure
    cap_add:
      - MKNOD
    ports:
      - 127.0.0.1:9980:9980
    environment:
      aliasgroup1: mydomain
      username: username
      password: securepassword
      extra_params: --o:ssl.enable=false --o:ssl.termination=true

Expected Behavior

Collabora to run and serve the discovery endpoint.

Actual Behavior

Collabora runs, but I cannot even curl the discovery URL. During startup I get the following errors in the log:

collabora  | kit-00024-00024 2024-07-18 08:42:51.907653 +0000 [ kit_spare_001 ] TRC  File [/opt/cool/child-roots/1-c23b8cb8/FV5Ht6mkNAy778qS//etc/localtime] is already up-to-date.| common/JailUtil.cpp:549
collabora  | kit-00024-00024 2024-07-18 08:42:51.907837 +0000 [ kit_spare_001 ] DBG  Initialized jail files in 4432ms| kit/Kit.cpp:3196
collabora  | chroot("/opt/cool/child-roots/1-c23b8cb8/FV5Ht6mkNAy778qS/") failed (EPERM: Operation not permitted)
collabora  | kit-00024-00024 2024-07-18 08:42:51.907882 +0000 [ kit_spare_001 ] INF  chroot("/opt/cool/child-roots/1-c23b8cb8/FV5Ht6mkNAy778qS/")| kit/Kit.cpp:3211
collabora  | kit-00024-00024 2024-07-18 08:42:51.907919 +0000 [ kit_spare_001 ] FTL  chroot("/opt/cool/child-roots/1-c23b8cb8/FV5Ht6mkNAy778qS/") failed (EPERM: Operation not permitted)| kit/Kit.cpp:3214
collabora  | Forced Exit with code: 70
collabora  | kit-00024-00024 2024-07-18 08:42:51.907934 +0000 [ kit_spare_001 ] FTL  Forced Exit with code: 70| common/Util.cpp:847
collabora  | frk-00019-00019 2024-07-18 08:42:51.917946 +0000 [ forkit ] WRN  No live Kits exist, and we are not terminating yet.| kit/ForKit.cpp:312
collabora  | notcoolmount: unmount failed to detach [/opt/cool/child-roots/1-c23b8cb8/FV5Ht6mkNAy778qS/lo]: Operation not permitted.
collabora  | notcoolmount: forced unmount of [/opt/cool/child-roots/1-c23b8cb8/FV5Ht6mkNAy778qS/lo] failed: Operation not permitted.
collabora  | notcoolmount: unmount failed to detach [/opt/cool/child-roots/1-c23b8cb8/FV5Ht6mkNAy778qS]: Operation not permitted.
collabora  | notcoolmount: forced unmount of [/opt/cool/child-roots/1-c23b8cb8/FV5Ht6mkNAy778qS] failed: Operation not permitted.
collabora  | wsd-00001-00018 2024-07-18 08:43:05.492567 +0000 [ prisoner_poll ] TRC  Poll completed with 0 live polls max (17999971us)(timedout)| net/Socket.cpp:446

Also, periodically the following error is logged:

collabora  | wsd-00001-00001 2024-07-18 08:45:32.745781 +0000 [ coolwsd ] INF  Waiting for a new child for a max of 20000ms| wsd/COOLWSD.cpp:4392
collabora  | wsd-00001-00001 2024-07-18 08:45:52.746026 +0000 [ coolwsd ] INF  Waiting for a new child for a max of 20000ms| wsd/COOLWSD.cpp:4392


caolanm commented 2 months ago

This is likely the use of mount namespaces by default in this CODE release.

Can I get the output of:

sysctl user.max_user_namespaces

and

sysctl kernel.unprivileged_userns_clone

In the meantime, adding --o:mount_namespaces=false to your extra_params should make things work as they did before.

But if I could get the full set of logs from startup until the first FTL, that might help show why it didn't fall back to that mode on its own.
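
A minimal shell sketch for gathering the two values requested above. It reads the files under /proc/sys that back these sysctl keys and prints a placeholder instead of failing when a key is missing, since kernel.unprivileged_userns_clone is a distribution-specific knob that mainline kernels do not provide. The function names and the PROC_SYS override (used only to point the sketch at a fake directory) are hypothetical and not part of Collabora Online.

```shell
#!/bin/sh
# Sketch only: print the two sysctls requested above in "key = value" form.
# Assumption: each sysctl key maps to a file under /proc/sys, with dots
# replaced by slashes (e.g. user.max_user_namespaces ->
# /proc/sys/user/max_user_namespaces).
# PROC_SYS is a hypothetical override used only to point the sketch at a
# fake directory; it is not a Collabora Online or kernel setting.

read_sysctl() {
    key="$1"
    path="${PROC_SYS:-/proc/sys}/$(printf '%s' "$key" | tr '.' '/')"
    if [ -r "$path" ]; then
        # Same "key = value" shape that the sysctl command prints
        printf '%s = %s\n' "$key" "$(cat "$path")"
    else
        # e.g. kernel.unprivileged_userns_clone on kernels without that patch
        printf '%s = (not present on this kernel)\n' "$key"
    fi
}

check_userns() {
    read_sysctl user.max_user_namespaces
    read_sysctl kernel.unprivileged_userns_clone
}
```

Running `check_userns` on the host prints one line per key, so the output can be pasted into the issue even on a kernel where the second key does not exist.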

witchent commented 2 months ago

Sure:

$ sysctl user.max_user_namespaces
user.max_user_namespaces = 30998

$ sysctl kernel.unprivileged_userns_clone
kernel.unprivileged_userns_clone = 1

Your workaround worked too, which is already nice. I attached the log from startup to the first FTL, captured without the extra parameter.

collaboralog.txt

By the way, is there a way (an extra param or similar) to disable the trace logs at startup? And can I fix those linking errors?

caolanm commented 2 months ago
collabora  | wsd-00001-00001 2024-07-18 10:59:27.477973 +0000 [ coolwsd ] DBG  setupChildRoot status: 3| wsd/COOLWSD.cpp:2006
collabora  | wsd-00001-00001 2024-07-18 10:59:27.478014 +0000 [ coolwsd ] INF  Using Bind Mounting: true| wsd/COOLWSD.cpp:2008
collabora  | wsd-00001-00001 2024-07-18 10:59:27.478027 +0000 [ coolwsd ] INF  Using Mount Namespaces: true| wsd/COOLWSD.cpp:2015

so it starts off well, but then

collabora  | frk-00019-00019 2024-07-18 10:59:34.312807 +0000 [ coolforkitns ] DBG  File contents mismatch: [/opt/cool/systemplate//etc/hosts] exists, 77 bytes, modified at 1721249450 =/= [/etc/hosts]: exists, 68 bytes, modified at 1721300365| common/FileUtil.hpp:301
collabora  | frk-00019-00019 2024-07-18 10:59:34.312842 +0000 [ coolforkitns ] INF  No write access to path [/opt/cool/systemplate]: Permission denied| common/FileUtil.cpp:305
collabora  | frk-00019-00019 2024-07-18 10:59:34.312864 +0000 [ coolforkitns ] WRN  The systemplate directory [/opt/cool/systemplate] is read-only, and at least [/opt/cool/systemplate//etc/hosts] is out-of-date. Will have to copy sysTemplate to jails. To restore optimal performance, make sure the files in [/opt/cool/systemplate/etc] are up-to-date.| common/JailUtil.cpp:557
collabora  | frk-00019-00019 2024-07-18 10:59:34.312873 +0000 [ coolforkitns ] WRN  Failed to update the dynamic files in [/opt/cool/systemplate]. Will disable bind-mounting in this run and clone systemplate into the jails, which is more resource intensive.| common/JailUtil.cpp:496

so bind mounting is disabled, in which case, in 24.04.5.2.1, coolforkitns doesn't enter a namespace, so it lacks the privileges to chroot and fails.

So that scenario would be covered by https://github.com/CollaboraOnline/online/commit/b65ded3f6b25f9d8a834ca5a698289ce34a58d65, which uses a user namespace even if bind mounting fails, so we would have the right privileges for chroot even when we don't create a mount namespace because bind mounting was disabled.

caolanm commented 2 months ago
collabora  | frk-00019-00019 2024-07-18 10:59:34.312807 +0000 [ coolforkitns ] DBG  File contents mismatch: [/opt/cool/systemplate//etc/hosts] exists, 77 bytes, modified at 1721249450 =/= [/etc/hosts]: exists, 68 bytes, modified at 1721300365| common/FileUtil.hpp:301
collabora  | frk-00019-00019 2024-07-18 10:59:34.312842 +0000 [ coolforkitns ] INF  No write access to path [/opt/cool/systemplate]: Permission denied| common/FileUtil.cpp:305
collabora  | frk-00019-00019 2024-07-18 10:59:34.312864 +0000 [ coolforkitns ] WRN  The systemplate directory [/opt/cool/systemplate] is read-only, and at least [/opt/cool/systemplate//etc/hosts] is out-of-date. Will have to copy sysTemplate to jails. To restore optimal performance, make sure the files in [/opt/cool/systemplate/etc] are up-to-date.| common/JailUtil.cpp:557

looks like the problem. I wonder why /opt/cool/systemplate doesn't seem to be writable. For me, with e.g.:

$ podman run -i -t my-docker-image /bin/bash
$ whoami
cool
$ ls -aslt /opt/cool/systemplate /opt/cool/systemplate/etc /opt/cool/systemplate/etc/hosts
4 -rw-r--r--. 1 cool cool 298 Jul 17 11:48 /opt/cool/systemplate/etc/hosts

/opt/cool/systemplate:
total 0
0 drwxr-xr-x. 1 cool cool 34 Jul 17 11:48 .
0 drwxr-xr-x. 1 cool cool 44 Jul 17 11:48 ..
0 drwxr-xr-x. 1 cool cool 26 Jul 17 11:48 dev
0 drwxr-xr-x. 1 cool cool 128 Jul 17 11:48 etc
0 drwxr-xr-x. 1 cool cool 4 Jul 17 11:48 lo
0 drwxr-xr-x. 1 cool cool 10 Jul 17 11:48 opt
0 drwxr-xr-x. 1 cool cool 6 Jul 17 11:48 tmp
0 drwxr-xr-x. 1 cool cool 10 Jul 17 11:48 usr

/opt/cool/systemplate/etc:
total 28
0 drwxr-xr-x. 1 cool cool 128 Jul 17 11:48 .
0 drwxr-xr-x. 1 cool cool 34 Jul 17 11:48 ..
0 -rw-r--r--. 1 cool cool 0 Jul 17 11:48 copied
4 -rw-r--r--. 2 cool cool 286 Jul 17 11:48 group
4 -rw-r--r--. 1 cool cool 298 Jul 17 11:48 hosts
4 -rw-r--r--. 2 cool cool 671 Jul 17 11:48 passwd
4 -rw-r--r--. 1 cool cool 80 Jul 17 11:48 resolv.conf
4 -rw-r--r--. 2 cool cool 639 Jul 2 05:48 nsswitch.conf
4 -rw-r--r--. 2 cool cool 114 Apr 4 00:00 localtime
4 -rw-r--r--. 2 cool cool 9 Nov 29 2023 host.conf

$ echo changeit >> /opt/cool/systemplate/etc/hosts

$ /start-collabora-online.sh

Then, when it detects the mismatch between /etc/hosts and /opt/cool/systemplate//etc/hosts, it is able to update systemplate and carry on bind mounting it as normal, so this problem doesn't arise:

frk-00024-00024 2024-07-18 11:58:52.367886 +0000 [ coolforkitns ] DBG  File contents mismatch: [/opt/cool/systemplate//etc/hosts] exists, 307 bytes, modified at 1721303896 =/= [/etc/hosts]: exists, 283 bytes, modified at 1721303778| common/FileUtil.hpp:301
frk-00024-00024 2024-07-18 11:58:52.367890 +0000 [ coolforkitns ] INF  File [/opt/cool/systemplate//etc/hosts] needs to be updated.| common/JailUtil.cpp:567
frk-00024-00024 2024-07-18 11:58:52.367893 +0000 [ coolforkitns ] INF  Linking [/etc/hosts] -> [/opt/cool/systemplate//etc/hosts].| common/JailUtil.cpp:570
frk-00024-00024 2024-07-18 11:58:52.367898 +0000 [ coolforkitns ] DBG  File contents mismatch: [/opt/cool/systemplate//etc/hosts] exists, 307 bytes, modified at 1721303896 =/= [/etc/hosts]: exists, 283 bytes, modified at 1721303778| common/FileUtil.hpp:301
frk-00024-00024 2024-07-18 11:58:52.367899 +0000 [ coolforkitns ] WRN  Failed to link [/etc/hosts] -> [/opt/cool/systemplate//etc/hosts] (File exists). Will copy and disable linking dynamic system files in this run.| common/JailUtil.cpp:591
frk-00024-00024 2024-07-18 11:58:52.367904 +0000 [ coolforkitns ] INF  Copying [/etc/hosts] -> [/opt/cool/systemplate//etc/hosts]| common/JailUtil.cpp:600
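
To check which of these two paths a given container will take, the two conditions visible in the coolforkitns log (a dynamic file such as /etc/hosts differing from its systemplate copy, and no write access to the systemplate directory) can be approximated from a shell. This is a hedged sketch, not coolwsd's actual logic: it compares file contents with cmp, whereas the log also reports sizes and modification times, and systemplate_status is a hypothetical helper name. The default paths are the ones seen in the logs above.

```shell
#!/bin/sh
# Sketch only (not coolwsd's implementation): approximate the two checks
# that appear in the coolforkitns log.
#   1. Does the systemplate copy of a dynamic file (e.g. /etc/hosts)
#      still match the live file? (compared by content with cmp)
#   2. If not, is the systemplate directory writable so it could be updated?
# systemplate_status is a hypothetical helper; defaults are the paths
# seen in the logs of this issue.

systemplate_status() {
    live="${1:-/etc/hosts}"
    tmpl_dir="${2:-/opt/cool/systemplate}"
    copy="$tmpl_dir/etc/$(basename "$live")"

    if cmp -s "$live" "$copy"; then
        echo "up-to-date"
    elif [ -w "$tmpl_dir" ]; then
        # Stale, but the copy can be updated and bind mounting kept
        echo "stale, systemplate writable"
    else
        # Stale and read-only: the case where the log says bind mounting
        # is disabled and systemplate is copied into each jail instead
        echo "stale, systemplate read-only"
    fi
}
```

Run as the cool user inside the container, `systemplate_status /etc/hosts /opt/cool/systemplate` would be expected to report the read-only case when systemplate is owned by root and podman has rewritten /etc/hosts, which is the combination that leads to bind mounting being disabled.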
witchent commented 2 months ago

So I just ran your commands in the collabora container (though with your mount_namespaces=false workaround in place, if that makes any difference):

$ podman exec -it collabora /bin/bash

$ whoami
cool

$ ls -aslt /opt/cool/systemplate /opt/cool/systemplate/etc /opt/cool/systemplate/etc/hosts
4 -rw-r--r--  1 root root   77 Jul 17 20:50 /opt/cool/systemplate/etc/hosts

/opt/cool/systemplate/etc:
total 64
4 drwxr-xr-x  5 root root 4096 Jul 17 20:52 .
0 -rw-r--r--  1 root root    0 Jul 17 20:52 copied
8 -rw-r--r--  1 root root 6614 Jul 17 20:52 ld.so.cache
4 drwxr-xr-x 11 root root 4096 Jul 17 20:52 ..
4 drwxr-xr-x  4 root root 4096 Jul 17 20:52 fonts
4 -rw-r--r--  1 root root   34 Jul 17 20:52 ld.so.conf
4 drwxr-xr-x  2 root root 4096 Jul 17 20:52 ld.so.conf.d
4 drwxr-xr-x  3 root root 4096 Jul 17 20:52 ssl
4 -rw-r--r--  2 root root  446 Jul 17 20:52 group
4 -rw-r--r--  2 root root  883 Jul 17 20:52 passwd
4 -rw-r--r--  1 root root   77 Jul 17 20:50 hosts
4 -rw-r--r--  2 root root    8 Jul  1 00:00 timezone
4 -rw-r--r--  2 root root  494 Apr 10 07:01 nsswitch.conf
4 -rw-r--r--  1 root root  660 Mar  5 15:21 resolv.conf
4 -rw-r--r--  2 root root  114 Feb  3 18:56 localtime
4 -rw-r--r--  2 root root    9 Aug  7  2006 host.conf

/opt/cool/systemplate:
total 44
4 drwxr-xr-x  1 cool cool 4096 Jul 18 11:12 ..
4 drwxr-xr-x  2 root root 4096 Jul 17 20:52 dev
4 drwxr-xr-x  5 root root 4096 Jul 17 20:52 etc
4 drwxr-xr-x  2 root root 4096 Jul 17 20:52 lo
4 drwxr-xr-x 11 root root 4096 Jul 17 20:52 .
4 drwxr-xr-x  3 root root 4096 Jul 17 20:52 lib
4 drwxr-xr-x  2 root root 4096 Jul 17 20:52 lib64
4 drwxr-xr-x  2 root root 4096 Jul 17 20:52 opt
4 drwxr-xr-x  3 root root 4096 Jul 17 20:52 tmp
4 drwxr-xr-x  4 root root 4096 Jul 17 20:52 usr
4 drwxr-xr-x  3 root root 4096 Jul 17 20:52 var

I know that it had already started collabora (I can check whether it behaves differently if I modify the entrypoint etc.), but since everything is owned by root in my image (straight from docker.io), I could imagine that the cool user just isn't allowed to change anything there. This is a really wild guess, and I don't know why that would be happening.

If I can do anything to help you debug this please let me know. Thanks a lot already for your time.

caolanm commented 2 months ago

Yeah, I think I see. I built my docker/podman images from source, and that build has an explicit step of

chown -R cool:cool /opt/

whereas these images are built from the packages, which take /opt/cool/systemplate from the post-install step, and that step generates /opt/cool/systemplate as root.

On top of that, in podman the /etc/hosts will be out of date with respect to /opt/cool/systemplate/etc/hosts basically immediately.
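
The difference between the two images comes down to ownership, which can be checked from inside a running container. The sketch below lists everything under the systemplate directory that is not owned by the current user; list_foreign_owned is a hypothetical helper, and it relies only on the standard find and id options. In a source-built image that ran chown -R cool:cool /opt/, it would be expected to print nothing when run as cool, while in a package-built image it would list the root-owned entries.

```shell
#!/bin/sh
# Sketch only: list entries under a directory that are NOT owned by the
# current user. list_foreign_owned is a hypothetical helper name.
# find's -user test accepts a numeric user ID, so this also works when
# the current UID has no passwd entry.

list_foreign_owned() {
    dir="${1:-/opt/cool/systemplate}"
    find "$dir" ! -user "$(id -u)" -print
}
```

For example, `list_foreign_owned /opt/cool/systemplate | head` as the cool user gives a quick yes/no answer: empty output means cool owns the whole tree.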

caolanm commented 2 months ago

So I believe that https://github.com/CollaboraOnline/online/commit/b65ded3f6b25f9d8a834ca5a698289ce34a58d65 solves the chroot failure when bind mounting fails, and https://github.com/CollaboraOnline/online/pull/9572 should fix the reason bind mounting itself fails.

witchent commented 1 month ago

Alright, thank you for investigating. I will have a look with the next release (after the patch is merged) and reply whether it worked.

vwbusguy commented 1 month ago

> This is likely the use of mount namespaces by default in this CODE release

For what it's worth, this bit me on rootful podman as well. The only way I got it to work was adding CAP_SYS_ADMIN, which I very much do not think is the right workaround.

Also, kernel.unprivileged_userns_clone doesn't exist in the stock Fedora 39/40 kernels.

caolanm commented 1 month ago

The fixes are merged now.

vwbusguy commented 1 month ago

Can confirm that this fixed my problem in rootful podman as well. Thanks!