balena-os / balenaos-in-container

Run balenaOS as a docker container
https://www.balena.io/os/
Apache License 2.0

Docker-compose up returns resin-init error #46

Open puccaso opened 2 years ago

puccaso commented 2 years ago

Hello.

I am running the container inside Ubuntu. Although the image seems to be up, the system gets stuck in a loop between the supervisor starting up and libcontainer starting.

_1  | [  OK  ] Started DNS forwarder and DHCP server.
os_1  |          Starting Balena Application Container Engine...
os_1  |          Starting Resin proxy configuration service...
os_1  |          Starting Hostname Service...
os_1  | [FAILED] Failed to start Resin init service.
os_1  | See 'systemctl status resin-init.service' for details.
os_1  | [  OK  ] Started Hostname Service.
os_1  |          Starting Network Manager Script Dispatcher Service...
os_1  | [  OK  ] Started Network Manager Script Dispatcher Service.
os_1  | [  OK  ] Started Resin proxy configuration service.
os_1  | [  OK  ] Started Balena Application Container Engine.
os_1  |          Starting Balena supervisor...
os_1  |          Starting Load balena healthcheck image...
os_1  | [  OK  ] Started Balena supervisor.
os_1  | [  OK  ] Started Load balena healthcheck image.
os_1  | [  OK  ] Started libcontainer conta…745d4ae5626c43d75e9836f37bb74.
os_1  | [  OK  ] Started libcontainer conta…d7efc105230bb319508a1b5d078cf.
[  OK  ] Stopped Balena supervisor.
os_1  |          Starting Balena supervisor...
os_1  | [  OK  ] Started Balena supervisor.
os_1  | [  OK  ] Stopped OpenVPN.
os_1  |          Starting Prepare OpenVPN...
os_1  | [  OK  ] Started Prepare OpenVPN.
os_1  | [  OK  ] Started OpenVPN.
os_1  | [  OK  ] Started libcontainer conta…2c709edb0797d08d637bf7c9fea05.
[  OK  ] Stopped Balena supervisor.
os_1  |          Starting Balena supervisor...
os_1  | [  OK  ] Started Balena supervisor.
os_1  | [  OK  ] Started libcontainer conta…9e1c648e573cba083734aaf754cbe.
os_1  | [  OK  ] Stopped OpenVPN.
os_1  |          Starting Prepare OpenVPN...
os_1  | [  OK  ] Started Prepare OpenVPN.
os_1  | [  OK  ] Started OpenVPN.

Then I get other errors before the process appears to complete:

os_1  |          Starting Balena supervisor...
os_1  | [  OK  ] Started Balena supervisor.
os_1  | [  OK  ] Started libcontainer conta…413b0ee5ee749cd38ab3a2a45ab2f.
os_1  | [  OK  ] Stopped OpenVPN.
os_1  |          Starting Prepare OpenVPN...
os_1  | [  OK  ] Started Prepare OpenVPN.
os_1  | [  OK  ] Started OpenVPN.
os_1  | [ TIME ] Timed out waiting for device /dev/zram0.
os_1  | [DEPEND] Dependency failed for Enab…sed swap in memory using zram.
os_1  | [ TIME ] Timed out waiting for device /dev/ttyS0.
os_1  | [DEPEND] Dependency failed for Serial Getty on ttyS0.
os_1  | [  OK  ] Reached target Login Prompts.
os_1  | [  OK  ] Reached target Multi-User System.
os_1  |          Starting Update UTMP about System Runlevel Changes...
os_1  | [  OK  ] Started Update UTMP about System Runlevel Changes.
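
The real failures in these logs are easy to miss among the OK lines. The following is a minimal sketch of how to filter the compose output down to systemd's [FAILED], [DEPEND] and [ TIME ] status lines. The function name `boot_failures` is hypothetical and not part of balenaos-in-container.

```shell
#!/bin/sh
# boot_failures (hypothetical helper): read docker-compose log output on stdin
# and print only systemd status lines that indicate a problem:
# [FAILED], [DEPEND] and [ TIME ]. Lines without the "os_1  | " prefix
# (interleaved output) are matched as well, since only the tag is checked.
boot_failures() {
    grep -E '\[(FAILED|DEPEND| TIME )]'
}
```

Usage: `docker-compose logs os | boot_failures`. This assumes the compose service is named `os`, which is inferred from the `os_1` prefix above. On the logs in this comment, the filter would keep the resin-init failure, the zram0 and ttyS0 timeouts, and their dependency failures.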

I can get to the container's console, but I can't see the device on the cloud dashboard.

Mem: 2234672K used, 1420568K free, 70948K shrd, 155736K buff, 1072308K cached
CPU:   7% usr   3% sys   0% nic  89% idle   0% io   0% irq   0% sirq
Load average: 0.59 0.51 0.51 5/613 3381
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
  275   242 root     S     867m  24%   0% balena-engine-containerd --config /var/run/balena-engine/containerd/
 3278  3243 root     R     4248   0%   0% top
  242     1 root     S     939m  26%   0% /usr/bin/balenad --experimental --log-driver=journald -s overlay2 -H
 1210     1 root     S     721m  20%   0% {runc:[2:INIT]} balena-engine-runc init
  696     1 root     S     651m  18%   0% {runc:[2:INIT]} balena-engine-runc init
  796     1 root     S     651m  18%   0% {runc:[2:INIT]} balena-engine-runc init
 1415     1 root     S     651m  18%   0% {runc:[2:INIT]} balena-engine-runc init
 1522     1 root     S     651m  18%   0% {runc:[2:INIT]} balena-engine-runc init
 2143     1 root     S     651m  18%   0% {runc:[2:INIT]} balena-engine-runc init
 2947     1 root     S     651m  18%   0% {runc:[2:INIT]} balena-engine-runc init

Any ideas?

puc

puccaso commented 2 years ago

I worked out that the resin-init.service error is related to resin-init-board not completing successfully:

lsblk: /dev/sda3[/var/lib/docker/volumes/balenaos-in-container_boot/_data]: not a block device

When I go to the volume directory in /var/lib, I can see config.json in the boot volume's _data directory, so I know that part is working.

I've tried configs for both Generic x86 and Intel NUC, but I can't get past this point.

klutchell commented 2 years ago

This is the snippet of code that is failing:

# make sure the bootstrap code (boot.img) is removed in case we are using EFI boot
if [ -d /sys/firmware/efi ] ; then
    device="/dev/"$(findmnt --noheadings --canonicalize --output SOURCE /mnt/boot/ | xargs lsblk -no pkname)
    dd if=/dev/zero of=$device bs=446 count=1
fi
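
The lsblk error quoted in the previous comment shows why this fails inside the container. /mnt/boot is a bind mount of a Docker volume directory, so findmnt reports its SOURCE as `/dev/sda3[/var/lib/docker/volumes/balenaos-in-container_boot/_data]`, and lsblk rejects that string.

Below is a minimal sketch of a guard, assuming the right behaviour in a container is simply to skip the wipe. It is a hypothetical workaround, not the upstream fix. Both function names are illustrative. Note that stripping the `[...]` suffix and wiping the parent of `/dev/sda3` would be unsafe: in a privileged container that device could well be the host's own disk.

```shell
#!/bin/sh
# Hypothetical guard for the EFI step in resin-init-board; NOT the upstream fix.
# For bind mounts, findmnt prints SOURCE as DEVICE[/subdir], e.g.
#   /dev/sda3[/var/lib/docker/volumes/balenaos-in-container_boot/_data]
# is_bind_subdir_source succeeds when SOURCE carries such a "[...]" suffix.
is_bind_subdir_source() {
    case "$1" in
        *\[*\]) return 0 ;;
        *) return 1 ;;
    esac
}

# wipe_boot_code: same steps as the snippet above, but returns early when
# /mnt/boot is not backed by its own partition (e.g. a Docker volume).
wipe_boot_code() {
    [ -d /sys/firmware/efi ] || return 0
    src=$(findmnt --noheadings --canonicalize --output SOURCE /mnt/boot/)
    if is_bind_subdir_source "$src"; then
        echo "boot is a bind mount ($src); skipping boot.img wipe" >&2
        return 0
    fi
    device="/dev/$(lsblk -no pkname "$src")"
    dd if=/dev/zero of="$device" bs=446 count=1
}
```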

https://github.com/balena-os/balena-intel/blob/master/layers/meta-balena-genericx86/recipes-support/resin-init/resin-init-board/resin-init-board

Still investigating the proper workaround for running in a container.

jellyfish-bot commented 2 years ago

[klutchell] This issue has attached support thread https://jel.ly.fish/5c6f6bf4-0bc4-4f2f-b97e-eb66ee51573e

zumby commented 2 years ago

@puccaso Hi. Did you manage to run balenaos-in-container on a Linux server with Docker somehow?

I'm still struggling with these errors. Perhaps this has something to do with how aufs or overlay is configured on the host server, or with the cgroups configuration:

balenaos-in-container-os-1  | Failed to attach 21 to compat systemd cgroup /docker/74a6d832a885e725ac07fdda777a5eac78e6da246d501ed50225503acd3b6a24/system.slice/dev-hugepages.mount: No such file or directory
balenaos-in-container-os-1  |          Mounting Huge Pages File System...
balenaos-in-container-os-1  | Failed to attach 21 to compat systemd cgroup /docker/74a6d832a885e725ac07fdda777a5eac78e6da246d501ed50225503acd3b6a24/system.slice/dev-hugepages.mount: No such file or directory
balenaos-in-container-os-1  | Failed to attach 22 to compat systemd cgroup /docker/74a6d832a885e725ac07fdda777a5eac78e6da246d501ed50225503acd3b6a24/system.slice/sys-kernel-debug.mount: No such file or directory
balenaos-in-container-os-1  |          Mounting Kernel Debug File System...
balenaos-in-container-os-1  | Failed to attach 22 to compat systemd cgroup /docker/74a6d832a885e725ac07fdda777a5eac78e6da246d501ed50225503acd3b6a24/system.slice/sys-kernel-debug.mount: No such file or directory
balenaos-in-container-os-1  | Failed to attach 23 to compat systemd cgroup /docker/74a6d832a885e725ac07fdda777a5eac78e6da246d501ed50225503acd3b6a24/system.slice/kmod-static-nodes.service: No such file or directory
balenaos-in-container-os-1  |          Starting Resin NTP server configure service...
balenaos-in-container-os-1  |          Starting DNS forwarder and DHCP server...
balenaos-in-container-os-1  | [  OK  ] Started DNS forwarder and DHCP server.
balenaos-in-container-os-1  | [  OK  ] Started Resin NTP server configure service.
[ TIME ] Timed out waiting for device /dev/zram0.
balenaos-in-container-os-1  | [DEPEND] Dependency failed for Enab…sed swap in memory using zram.

@klutchell, maybe you have some advice?

klutchell commented 2 years ago

@zumby, have you checked that you are using cgroups v1 and not v2, as per the README? https://unix.stackexchange.com/questions/619681/how-can-i-find-out-what-version-of-cgroups-i-have
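
The linked answer checks the filesystem type mounted at /sys/fs/cgroup. `cgroup2fs` means the unified v2 hierarchy, while `tmpfs` means v1 (or hybrid mode, with v2 mounted under /sys/fs/cgroup/unified). The sketch below wraps that check. The function name is illustrative, and it assumes GNU coreutils `stat` on a Linux host.

```shell
#!/bin/sh
# cgroup_version (illustrative helper): report which cgroup hierarchy the
# host mounts at /sys/fs/cgroup, based on its filesystem type:
#   cgroup2fs -> v2 (unified hierarchy)
#   tmpfs     -> v1 (or hybrid, with v2 at /sys/fs/cgroup/unified)
cgroup_version() {
    fstype=$(stat -fc %T /sys/fs/cgroup/ 2>/dev/null)
    case "$fstype" in
        cgroup2fs) echo v2 ;;
        tmpfs) echo v1 ;;
        *) echo "unknown (${fstype:-not mounted})" ;;
    esac
}
```

Run it on the Docker host, not inside the balenaOS container. If it prints `v2`, many systemd-based distributions can be switched back to v1 with the `systemd.unified_cgroup_hierarchy=0` kernel parameter.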

Does the behaviour change if you use a different OS release? https://github.com/balena-os/balenaos-in-container/blob/master/docker-compose.yml#L9

zumby commented 2 years ago

@klutchell, thanks for the response.

FYI, this is a simple EC2 instance on AWS running Amazon Linux (x86_64).

  1. cgroups seem to be fine, but I'm not an expert:

    [screenshot]
  2. As for the images, I've tried the default (from docker-compose.yml), which is 2.95.12_rev1-genericx86-64-ext. I also tried these:

    • the latest one, 2.98.33-genericx86-64-ext
    • the latest intel-nuc one, 2.98.33-intel-nuc

And today I tried the newest ones, 2.99.27_rev2-genericx86-64-ext and 2.99.27_rev2-intel-nuc.

FYI, I built them like this:

docker-compose build --build-arg OS_VERSION=2.99.27_rev2

The result is the same for all of them: warnings, then an error, and then it gets stuck at the end.

[screenshots]

In balenaCloud, the device does get added, but it always reports check_localdisk issues:

[screenshots]

Generally, I want to create a simulated or "virtual" device that I can add to Balena. The plan was to start this on an EC2 instance in AWS and, once it's in Balena, do other things with it, like deploying an actual app. There is a slight chance the problem has something to do with the virtualisation on AWS EC2.

klutchell commented 2 years ago

@zumby it looks to me like your device is booted, and you can ignore those warnings. They are expected when running balenaos-in-container due to the way systemd handles pids.

I would not expect your root partition to fully expand since it's a virtual docker volume, so our partition utilities cannot determine the maximum partition size. This is also expected.

Is there anything blocking you from using this device as a simulation? You could also flash the genericx86-64-ext image directly to an AWS instance if you want to skip the extra layer of virtualization. Maybe this forum post can help get you started: https://forums.balena.io/t/host-balena-os-on-aws-as-a-virtual-device/304444

zumby commented 2 years ago

@klutchell, interesting. I actually haven't moved forward yet (hehe), but I will definitely try other things and see whether anything blocks me. As for putting the image directly onto AWS, that seems harder to me and not very clear yet. But we'll see.

Thanks for the help, and I'll comment more if I run into any trouble.