balena-io / open-balena

Open source software to manage connected IoT devices at scale
https://balena.io/open
GNU Affero General Public License v3.0

Services crash without log. #99

Closed simon-zumbrunnen closed 4 months ago

simon-zumbrunnen commented 3 years ago

Because I use Traefik for all of my services, I can't use the quickstart, so I'm trying to deploy open-balena using my own docker-compose.yml. One problem I ran into is that your services don't show any logs when they crash. All I see is:

Systemd init system enabled.

But since it crashed, I can't exec into the container to look at the log files. For the api, I solved the problem by creating my own Dockerfile without systemd. For almost all the others, I used the original image instead of the balena one (e.g. registry:2 or postgres). But for the vpn service this isn't that easy.

Do you have any guidance on how to debug this?

gabrielepmattia commented 2 years ago

Same issue here on Ubuntu 21.10, but I'm using ./scripts/compose up. I get:

openbalena-s3-1             | Systemd init system enabled.
openbalena-s3-1 exited with code 255
openbalena-api-1            | Systemd init system enabled.
openbalena-registry-1       | Systemd init system enabled.
openbalena-api-1 exited with code 255
openbalena-registry-1 exited with code 255
openbalena-vpn-1            | Systemd init system enabled.
openbalena-vpn-1 exited with code 255

dfunckt commented 2 years ago

Most services use systemd. You can get logs by running ./scripts/compose exec -it <service> journalctl -fn100.

simon-zumbrunnen commented 2 years ago

Yeah, but as I said, because the container is not running (it crashed), I can't use exec.

gabrielepmattia commented 2 years ago

@seimsel, on which OS are you trying to run open-balena? Did you run the containers as privileged (see below)?

However, after several tests, it seems that the problem is in the image https://github.com/balena-io-modules/open-balena-base. If you start the image on Ubuntu 18.04, it works; on other distros such as Ubuntu 21.XX or Fedora 33, the final container entrypoint, exec /sbin/init, crashes. The command I used is the following:

docker run --privileged -it -v /sys/fs/cgroup:/sys/fs/cgroup:ro balena/open-balena-base

If it works (i.e. the container starts), then you can run open-balena. Remember to run the container in privileged mode and to mount the cgroup folder, since it spawns multiple processes by using the init process as the container entry point.

On Fedora 33/Ubuntu 21.10 I get:

Systemd init system enabled.
systemd 247.3-6 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
Detected virtualization lxc.
Detected architecture x86-64.

Welcome to Debian GNU/Linux 11 (bullseye)!

Set hostname to <261f1fa6b3af>.
Failed to create /init.scope control group: Read-only file system
Failed to allocate manager object: Read-only file system
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...

simon-zumbrunnen commented 2 years ago

Thank you for your help. I was using Ubuntu 18.04 and I probably didn't run it in privileged mode. But my question isn't "what do I have to do to make it work" but "how do I figure out why it failed".
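For anyone landing here with the same question, a rough sketch of ways to look at a container that has already exited, where exec is not available. It assumes plain Docker CLI access on the host. The container name openbalena-vpn-1 is taken from the output earlier in this thread and may differ on your setup, and /var/log is only an assumed location for log files inside the image. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Print the command (DRY_RUN=1, the default here) or execute it (DRY_RUN=0).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Inspect a stopped container by name. The name is whatever `docker ps -a` shows.
inspect_crashed() {
  name="$1"
  # Whatever the container wrote to stdout/stderr before it exited.
  run docker logs --tail 100 "$name"
  # Exit code Docker recorded for the stopped container.
  run docker inspect --format '{{.State.ExitCode}}' "$name"
  # docker cp also works on stopped containers; /var/log is an assumed path.
  run docker cp "$name:/var/log" "./$name-logs"
}

# Example container name taken from the compose output above.
inspect_crashed openbalena-vpn-1
```

If docker logs shows nothing beyond "Systemd init system enabled.", starting the base image in the foreground, as in the docker run command above, is what surfaced the systemd error in this thread.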

markdegrootnl commented 2 years ago

This problem occurs when your host OS uses cgroup v2 exclusively, with no cgroup v1. The balena containers want to start systemd inside the container, but this is not possible with just cgroup v2. To make your host OS use cgroup v2 and v1 together, run:

echo 'GRUB_CMDLINE_LINUX=systemd.unified_cgroup_hierarchy=false' > /etc/default/grub.d/cgroup.cfg
update-grub

and restart.
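Before changing the boot configuration, it may help to check which cgroup hierarchy the host is actually using. A minimal sketch, assuming a Linux host with stat available: the filesystem type of /sys/fs/cgroup is normally cgroup2fs on a unified (v2-only) host and tmpfs on a v1 or hybrid host. The function name cgroup_version is made up for this example.

```shell
#!/bin/sh
# Map the filesystem type of /sys/fs/cgroup to a cgroup layout.
# cgroup_version is a hypothetical helper, not part of open-balena.
cgroup_version() {
  case "$1" in
    cgroup2fs) echo "v2" ;;            # unified hierarchy only
    tmpfs)     echo "v1-or-hybrid" ;;  # legacy or hybrid layout
    *)         echo "unknown" ;;
  esac
}

# Fall back to a placeholder if stat fails (e.g. not on Linux).
fs_type=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo "unavailable")
echo "cgroup hierarchy on this host: $(cgroup_version "$fs_type") ($fs_type)"
```

A result of v2 would match the situation described in this comment; v1-or-hybrid suggests the crash has a different cause.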

ab77 commented 4 months ago

All logs now go to stdout. Please upgrade to the latest version and/or patch your existing compositions.