NebraLtd / helium-miner-software

Software for Nebra (and third party) Helium Miners
https://nebra.io/hnt
MIT License

balena procfs and sysfs labels do not work correctly / recursively #281

Open shawaj opened 2 years ago

shawaj commented 2 years ago

According to this page https://www.balena.io/docs/learn/develop/multicontainer/

Label | Default | Description
io.balena.features.sysfs | false | Bind mounts the host OS /sys into the container.
io.balena.features.procfs | false | Bind mounts the host OS /proc into the container.

However, this does not seem to work recursively: /sys/firmware/devicetree/base/serial-number and /proc/device-tree/serial-number can't be accessed from within a container that has these labels applied.

We need to get balena to fix this urgently.
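
For anyone reproducing this, a minimal Python sketch of the check is below. It tries both paths from the report and returns the first one that can be read. The function name `read_serial_number` is hypothetical and is not taken from hm-diag; the NUL stripping assumes the usual NUL-terminated device-tree string format.

```python
from pathlib import Path

# The two paths named in the report above.
CANDIDATE_PATHS = (
    "/proc/device-tree/serial-number",
    "/sys/firmware/devicetree/base/serial-number",
)


def read_serial_number(paths=CANDIDATE_PATHS):
    """Return (path, serial) for the first readable path, or None."""
    for candidate in paths:
        try:
            raw = Path(candidate).read_bytes()
        except OSError:
            # Missing, masked, or unreadable: try the next candidate.
            continue
        # Device-tree string properties are NUL-terminated, so strip that.
        serial = raw.rstrip(b"\x00").decode("ascii", "replace").strip()
        return candidate, serial
    return None


if __name__ == "__main__":
    print(read_serial_number() or "serial number not accessible")
```

Running this inside the container and on the host OS should show whether the failure is specific to the container.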

vpetersson commented 2 years ago

Maybe double check so that none of these are symlinks first.

shawaj commented 2 years ago

Maybe double check so that none of these are symlinks first.

@vpetersson /proc/device-tree is a symlink to /sys/firmware/devicetree/base, so both serial-number paths point at the same file, but neither works.

lrwxrwxrwx 1 root root 29 Nov 28 14:38 /proc/device-tree -> /sys/firmware/devicetree/base

vpetersson commented 2 years ago

Great. So that's why we need both of them enabled.

shawaj commented 2 years ago

@vpetersson we have both enabled. It doesn't work.

https://github.com/NebraLtd/helium-miner-software/blob/556929dfd3fdde2fab47c8b2eeefc077a2238641/docker-compose.yml#L75-L78

It seems to be an issue with balena: /proc and /sys are mounted, but not recursively.

shawaj commented 2 years ago

Ref https://nebraltd.slack.com/archives/C024BNQ1Y6T/p1638110464351000?thread_ts=1638110464.351000&cid=C024BNQ1Y6T

vpetersson commented 2 years ago

I did some poking at this as well. In theory, we shouldn't need io.balena.features.procfs: 1 at all, since we can change the reference to use /sys/firmware/devicetree/base directly, which would make io.balena.features.sysfs: 1 sufficient.

That said, when I tried this on one of the testnet devices, I was unable to access this:

# ls -l /sys/firmware/devicetree/base
ls: cannot access '/sys/firmware/devicetree/base': No such file or directory

This is despite having the right permission:

      io.balena.features.sysfs: 1

Just to rule out an issue with additional symlinks, I checked each path component too:

root@b3e62a6:~# ls -l /sys/firmware/devicetree/base | grep serial-number
-r--r--r--  1 root root 17 Nov 29 09:40 serial-number
root@b3e62a6:~# ls -l /sys/firmware/devicetree | grep base
drwxr-xr-x 22 root root 0 Nov 29 09:40 base
root@b3e62a6:~# ls -l /sys/firmware | grep devicetree
drwxr-xr-x 3 root root     0 Nov 29 09:40 devicetree
root@b3e62a6:~# ls -l /sys | grep firmware
drwxr-xr-x   3 root root 0 Nov 29 09:40 firmware
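
The component-by-component check above can also be scripted. This is only a sketch, and `describe_components` is a hypothetical helper name. It walks every prefix of an absolute path and reports whether it is a symlink, a directory, a file, or missing.

```python
import os


def describe_components(path):
    """Return (prefix, kind) for every prefix of an absolute path."""
    results = []
    prefix = "/"
    for part in path.strip("/").split("/"):
        prefix = os.path.join(prefix, part)
        # islink() is checked first because isdir() follows symlinks.
        if os.path.islink(prefix):
            kind = "symlink -> " + os.readlink(prefix)
        elif os.path.isdir(prefix):
            kind = "directory"
        elif os.path.exists(prefix):
            kind = "file"
        else:
            kind = "missing"
        results.append((prefix, kind))
    return results


if __name__ == "__main__":
    target = "/sys/firmware/devicetree/base/serial-number"
    for prefix, kind in describe_components(target):
        print(prefix, kind)
```

Comparing the output on the host with the output inside the container shows the first component at which the two diverge.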

Diving a bit further into this, I also inspected the container:

[
    {
        "Id": "sha256:67779341a4a0fa764a6da662e1a836b94809743a099464373dbf737cec713893",
        "RepoTags": [
            "registry2.balena-cloud.com/v2/6ff5b5e7bd6e40f96cc8973f2822f059:latest"
        ],
        "RepoDigests": [],
        "Parent": "",
        "Comment": "buildkit.dockerfile.v0",
        "Created": "2021-11-26T18:13:32.11402468Z",
        "Container": "",
        "ContainerConfig": {
            "Hostname": "",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": null,
            "Cmd": null,
            "Image": "",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": null
        },
        "DockerVersion": "",
        "Author": "",
        "Config": {
            "Hostname": "",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/opt/python-dependencies/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "LC_ALL=C.UTF-8",
                "DEBIAN_FRONTEND=noninteractive",
                "UDEV=off",
                "QEMU_CPU=arm1176",
                "LANG=C.UTF-8",
                "PYTHON_VERSION=3.10.0",
                "PYTHON_PIP_VERSION=21.2.4",
                "SETUPTOOLS_VERSION=58.0.0",
                "PYTHONPATH=/opt/python-dependencies:/usr/lib/python3/dist-packages:",
                "PYTHON_DEPENDENCIES_DIR=/opt/python-dependencies"
            ],
            "Cmd": null,
            "Healthcheck": {
                "Test": [
                    "CMD-SHELL",
                    "wget -q -O - http://0.0.0.0:5000/initFile.txt || exit 1"
                ],
                "Interval": 120000000000,
                "Timeout": 5000000000,
                "StartPeriod": 15000000000,
                "Retries": 10
            },
            "Image": "",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": [
                "gunicorn",
                "--bind",
                "0.0.0.0:5000",
                "hw_diag:wsgi_app"
            ],
            "OnBuild": null,
            "Labels": {
                "io.balena.architecture": "rpi",
                "io.balena.device-type": "raspberry-pi",
                "io.balena.qemu.version": "6.0.0+balena1-arm",
                "org.opencontainers.image.created": "2021-11-26T18:02:04.346Z",
                "org.opencontainers.image.description": "Helium Miner Diagnostics",
                "org.opencontainers.image.licenses": "MIT",
                "org.opencontainers.image.revision": "625ea724b9ffd439efd3380a6f5fd319cc796062",
                "org.opencontainers.image.source": "https://github.com/NebraLtd/hm-diag",
                "org.opencontainers.image.title": "hm-diag",
                "org.opencontainers.image.url": "https://github.com/NebraLtd/hm-diag",
                "org.opencontainers.image.version": "625ea72"
            }
        },
        "Architecture": "arm",
        "Os": "linux",
        "Size": 323688792,
        "VirtualSize": 323688792,
        "GraphDriver": {
            "Data": null,
            "Name": "aufs"
        },
        "RootFS": {
            "Type": "layers",
            "Layers": [
                "sha256:f6c4b8ba72c895e3050426fea46ab4d6724e7041ead5606ca48322e5def811f0",
                "sha256:333da77db0fc9525c42c771cdbde01456d1c52c1922f6b13f72a1425426913fb",
                "sha256:4498be8ebc1c75b6fe9d7a78afd2c2e8ada21d9ede413de8bd8ea3bbb8f9e217",
                "sha256:98c6b47458cac205d3355208cbae6fc67d47b1e2bc719b0da156c678fa90be05",
                "sha256:665bdbf58a594d2dd66aa17a9c0613a18753b67e5f3e4ae352338e0cc712fe6a",
                "sha256:e75a1e189bcd9b1c58234e4aac14538e53e8f46eca27d34a21f09a3e755e818f",
                "sha256:f3490bc23220c5a72aaa892990a50954960586e4d016359155035cb84158dbbf",
                "sha256:dcbb18d7198db0f1f5362df3926bf3c548889128c3a798aab160d65a1e6704a7",
                "sha256:57b1a65a65f064fd3af8b73881f07b0a42f9ef703bcd7f0da3948e5a629f2cea",
                "sha256:cce8a8aae530a7fc3c19e1b66096b8d2a2ddea7eae2869d8bf761435d1132b4b",
                "sha256:b42fa571461c53563c4f6da740598e30eb069f856692ef270ed5c4db2da525c9",
                "sha256:e87594a350c72729f8af0581dbd792deb2993a5b96a2ad8d7e04e8759133661a",
                "sha256:751eef3794ea05ec54ce5ab595bde3648034a1343ad7aad53697f64960bfa882",
                "sha256:f688b61ab5294a926b8ccfb89cb1d9f764f5d3db9642a1d2e772383499b0419d",
                "sha256:83d1971b7835f9093645bae1ae1100d5841bdd322ea74ac38a2116b06d5592fe",
                "sha256:84ec98be4f8fab9db8d680d35ee421f1928a7dc6e1917b5b0aafe0ddcfe19bf1",
                "sha256:ed9052c06c370b671dbd324e40dd4880390c72a2fc55b2ca00e6221b56bb6923",
                "sha256:711706b31e19c318d8c8b25dc7d264f24adcf58c56daadf2ea4126358d51dd68",
                "sha256:a27b9772d3d6a8e22f752de7accd2abf044d5e8c389fa7fae57218133d582506",
                "sha256:5a598657742b8f2c219b09f6ee34214f74655667022c0635a0c087bac392ea08",
                "sha256:ec4b4a3107b124d28251af2c78c1484a0fe1689e4f064043716cb9ad1006dfe5",
                "sha256:925a9eaaf436eb9ef3f35a5bcc085d395830da6b84011f059b0362da5a0076fd",
                "sha256:b20f09665c5098890be657d1bfaffba521473809e3debd10eef61a997e598d40",
                "sha256:5e3de7c66e5c152972210d680ae3de3c0abbc53e519410bb388f7fd56545f47f",
                "sha256:1f8f62e13038c344c28d63566379f6dcd51c45ce2453264f22193643ec9c53b4",
                "sha256:8f1e95d0df8fe546259d30f83e7bf5fcdcb3bdc93ac4c17500348f8add1df2d0",
                "sha256:dd586fa2a8e4cfb5c89e7b8675bcfad3b4842f0494c483d7217b4caacd860d29",
                "sha256:0c9f49e1fd5c3c0da2c7bd078d7bf4d886e1640e2c1b42afba4b89f4e9e255ab",
                "sha256:d86acff63443128506500dda86d328c03ae455ddd9fe94472945a69950edbdac",
                "sha256:f25ff49071ffb55094e8561bad3f958deac38e4f9d5acdf4e8e843c1d9dcac22",
                "sha256:33db38942240ee573b077d8e558e68c037caed704401387c4f81d40e6e54529c",
                "sha256:3af9c0b5ab3c58ae3625b6ac44433455ce473230f33d7e3ae07d15dd0df6022c"
            ]
        },
        "Metadata": {
            "LastTagTime": "2021-11-28T19:10:59.865120619Z"
        }
    }
]

What's noteworthy in the above dump is the following:

I think we can write off both of the above observations as simply UX issues in balena-engine rather than root causes, since we can in fact see them.

I think the next step here is to raise this with balena support.
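
A dump like the one above can be checked for a given label and for mounts with a few lines of Python. This is a sketch under assumptions: `summarize_inspect` is a hypothetical name, and it assumes the Docker-style inspect layout, with labels under `Config.Labels` and, for containers, a top-level `Mounts` list whose entries carry a `Destination` field.

```python
import json


def summarize_inspect(inspect_json, label):
    """Summarise label and mount info from `inspect` JSON output (a list)."""
    summary = []
    for entry in json.loads(inspect_json):
        # Both keys may be absent or null, as in the dump above.
        labels = (entry.get("Config") or {}).get("Labels") or {}
        mounts = entry.get("Mounts") or []
        summary.append({
            "id": entry.get("Id"),
            "has_label": label in labels,
            "mount_destinations": [m.get("Destination") for m in mounts],
        })
    return summary
```

Note that the dump above has fields such as RepoTags and RootFS and no Mounts key. In Docker that is the shape of image-level inspect output, so it may be worth repeating the check against the running container's inspect output.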

shawaj commented 2 years ago

I have already sent a ticket to balena, but I will send them a link to your findings as well, as that's very useful.

shawaj commented 2 years ago

FYI

https://github.com/balena-os/balena-supervisor/blob/27013b1d72f5cf66b0ae928b7e7e66bb34d7035d/src/compose/utils.ts#L396-L398

And

https://github.com/balena-os/balena-supervisor/blob/e8e441bea342f0f9aa98ae058732770fd9b9ff78/src/lib/system-info.ts#L70-L78

vpetersson commented 2 years ago

FYI

https://github.com/balena-os/balena-supervisor/blob/27013b1d72f5cf66b0ae928b7e7e66bb34d7035d/src/compose/utils.ts#L396-L398

Looks sensible at first glance.

shawaj commented 2 years ago

See below reply from balena:

Hi Aaron,

I tried replicating the problem on my device and saw that setting the io.balena.features.procfs label does bind mount /proc to the container's /proc path. The same goes for the io.balena.features.sysfs label for bind mounting /sys to the containers /sys path. The command balena inspect should help you confirm if the host paths are indeed mounted to the containers.

However, without setting privileged: true for the service, the following directories are masked:

"MaskedPaths": [ "/proc/asound", "/proc/acpi", "/proc/kcore", "/proc/keys", "/proc/latency_stats", "/proc/timer_list", "/proc/timer_stats", "/proc/sched_debug", "/proc/scsi", "/sys/firmware" ],

Also, the following directories are set to read-only when the service is not privileged:

"ReadonlyPaths": [ "/proc/bus", "/proc/fs", "/proc/irq", "/proc/sys", "/proc/sysrq-trigger" ]

Aside from accessing the serial number of the device from the container, what other required functionalities are not being provided by enabling the io.balena.features.procfs & io.balena.features.sysfs labels?

I just saw the link you provided and it is strange that the container inspection does not have any mounts and I don't see the io.balena.features.sysfs label in the list of labels of the container. Can you give us access to a test device exhibiting this behavior so we can do a deeper investigation? Does this issue only occur for devices with the following details?

Device type: Raspberry Pi 3 (using 64bit OS)
OS version: balenaOS 2021.10.2
Supervisor version: 12.11.2

Regards,
Carlo
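
To make the effect of the masked list concrete, here is a small Python sketch that checks whether a path sits under one of the masked entries. The list is copied from the reply above; `masking_prefix` is a hypothetical helper name.

```python
from pathlib import PurePosixPath

# Copied from the MaskedPaths list in the reply above.
MASKED_PATHS = [
    "/proc/asound", "/proc/acpi", "/proc/kcore", "/proc/keys",
    "/proc/latency_stats", "/proc/timer_list", "/proc/timer_stats",
    "/proc/sched_debug", "/proc/scsi", "/sys/firmware",
]


def masking_prefix(path, masked_paths=MASKED_PATHS):
    """Return the masked entry that covers `path`, or None."""
    target = PurePosixPath(path)
    for masked in masked_paths:
        masked_path = PurePosixPath(masked)
        # Covered if the path is the masked entry or lives below it.
        if target == masked_path or masked_path in target.parents:
            return masked
    return None


if __name__ == "__main__":
    print(masking_prefix("/sys/firmware/devicetree/base/serial-number"))
```

The /sys/firmware/devicetree/base/serial-number path falls under the masked /sys/firmware entry. /proc/device-tree is not in the list itself, but the ls output earlier in the thread shows it is a symlink into /sys/firmware/devicetree/base, so it would be expected to end up in the same masked tree.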

shawaj commented 2 years ago

It seems this is intended functionality on balena's side: /sys/firmware is masked in containers that are not privileged, so /sys/firmware/devicetree/base/serial-number and /proc/device-tree/serial-number are not exposed inside the container.

For now, we have just used privileged: true in the diagnostics container to allow these to be mapped correctly.

@vpetersson @marvinmarnold do we want to keep this open for future reference, continue to push balena to expose these paths, or are we happy to close?