A6GibKm opened this issue 2 weeks ago
The daily CI runs, linked from README.md, include package-based Fedora 40, and they are passing.
I wonder if this is specific to Fedora Silverblue.
What happens if you do this on the host:
# touch /tmp/machine-id
# mount --rbind /etc/machine-id /tmp/machine-id
This is not my only Silverblue 40 machine; the other one seems able to run the toolbox just fine.
$ touch /tmp/machine-id
$ mount --rbind /etc/machine-id /tmp/machine-id
mount: /tmp/machine-id: must be superuser to use mount.
dmesg(1) may have more information after failed mount system call.
$ sudo mount --rbind /etc/machine-id /tmp/machine-id
$ toolbox enter
Error: failed to initialize container fedora-toolbox-40
I also tried the `touch` with sudo.
For more context, this is the second time it has happened this month. I recreated the toolbox only a few days ago. I saw another report in a GNOME Matrix channel.
> This is not my only Silverblue 40 machine; the other one seems able to run the toolbox just fine.
>
> $ touch /tmp/machine-id
> $ mount --rbind /etc/machine-id /tmp/machine-id
> mount: /tmp/machine-id: must be superuser to use mount.
> dmesg(1) may have more information after failed mount system call.
> $ sudo mount --rbind /etc/machine-id /tmp/machine-id
The mount(8) has to be done as root. That's why I used a `#` prompt in my example.
Looking at the code, /etc/machine-id is the first bind mount that the container's entry point attempts, and then there's this in the container's logs:
level=debug msg="Running as real user ID 0"
...
level=debug msg="Binding /etc/machine-id to /run/host/etc/machine-id"
mount: /etc/machine-id: must be superuser to use mount.
dmesg(1) may have more information after failed mount system call.
Those two things can't be true at the same time. So, I am beginning to wonder if something is going wrong inside mount(8). It would be revealing to prepend a call to strace(1) and then try it with a Toolbx container that includes the strace(1) binary. Something like:
$ git diff
diff --git a/src/cmd/initContainer.go b/src/cmd/initContainer.go
index de7bcfcc5302..c6108edc4135 100644
--- a/src/cmd/initContainer.go
+++ b/src/cmd/initContainer.go
@@ -724,6 +724,7 @@ func mountBind(containerPath, source, flags string) error {
logrus.Debugf("Binding %s to %s", containerPath, source)
args := []string{
+ "mount",
"--rbind",
}
@@ -733,7 +734,7 @@ func mountBind(containerPath, source, flags string) error {
args = append(args, []string{source, containerPath}...)
- if err := shell.Run("mount", nil, nil, nil, args...); err != nil {
+ if err := shell.Run("strace", nil, nil, nil, args...); err != nil {
return fmt.Errorf("failed to bind %s to %s", containerPath, source)
}
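The extra `"mount"` argument in the patch is needed because the program passed to shell.Run changes from mount to strace, so the original command name must now appear first in the argument list. A minimal shell illustration of the command line that results (paths taken from the logs above):

```shell
# With the patch, the entry point effectively runs strace(1) with mount(8)
# and its original arguments appended.
set -- --rbind /etc/machine-id /run/host/etc/machine-id
echo strace mount "$@"
# → strace mount --rbind /etc/machine-id /run/host/etc/machine-id
```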
However, you need Toolbx to build Toolbx on Fedora Silverblue. So, I suppose I should put together a debug RPM.
I just upgraded my other machine and its toolbox still works. This is very weird considering the machines are configured the same (AFAIK). If you prepare an RPM or binary, I can try it, thanks!
Just tried it with this Fedora 40 Silverblue deployment and couldn't reproduce:
Deployments:
● fedora:fedora/40/x86_64/silverblue
Version: 40.20240618.0 (2024-06-18T00:52:57Z)
BaseCommit: fa68d62df2fae64e52bbfe15784915c78ab2914767cacded8c5de2f5b7ddab62
GPGSignature: Valid signature by 115DF9AEF857853EE8445D0A0727707EA15B79CC
Just to be sure, do you have the same deployment on both your machines?
I submitted a Fedora 40 build for a debug RPM: https://koji.fedoraproject.org/koji/taskinfo?taskID=119303969
Nah, the broken one is on yesterday's deployment (40.20240618.0 (2024-06-18T00:52:57Z)), and the other machine, which is working, has today's. I am upgrading right now, but I don't think that's it.
Attached is the output of `strace toolbox enter &> strace.txt` with the debug build. Is that enough?
EDIT: Note that the error is different this time? I still see

jun 19 20:47:28 alpha fedora-toolbox-40[8291]: Error: failed to bind /etc/machine-id to /run/host/etc/machine-id

in `journalctl -b`.
> Attached is the output of `strace toolbox enter &> strace.txt`
We don't need to run strace(1) against `toolbox enter`; for that we wouldn't need a debug build.

We are running strace(1) against the mount(8) that gets called inside the container from the entry point, by adjusting the toolbox(1) binary. So we need to look at the strace(1) output from `podman start --attach` or `podman logs`.
Sorry, I am not sure how to get the strace from inside the container. Do you mean `$ strace podman start --attach fedora-toolbox-40 &> podman-attach.txt`? If so, it is attached below.
No need to manually attach strace(1) anywhere.

Before you install the debug build of toolbox, ensure that you have a Toolbx container with strace(1) in it.

Then, install the debug build of toolbox, stop all your containers with `podman stop --all`, and try to enter the container that has strace(1). If the error reproduces, then share with us what you have in `podman start --attach ...` or `podman logs ...`.
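If the error does reproduce, the failing mount(2) call should stand out in the strace(1) output captured by `podman logs`. A sketch of filtering for it, with an illustrative sample line standing in for the real log (the errno shown is only an assumption):

```shell
# Grep for mount(2) calls that returned -1; the sample line below is
# made up for illustration, not taken from an actual log.
sample='mount("/etc/machine-id", "/run/host/etc/machine-id", NULL, MS_BIND|MS_REC, NULL) = -1 EPERM (Operation not permitted)'
printf '%s\n' "$sample" | grep -E 'mount\(.*\) = -1'
```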
FWIW, I can reliably reproduce my toolbox containers breaking after doing a reboot
Maybe this is related? https://discussion.fedoraproject.org/t/rpm-ostree-update-breaks-toolbox-fedora-40/120095/4
I did a reset of my config. Here is the diff of the prior and newer output of `podman system info`:
--- a 2024-06-19 23:28:25.883686898 +0200
+++ b 2024-06-19 23:28:38.536401465 +0200
@@ -13,17 +13,17 @@
path: /usr/bin/conmon
version: 'conmon version 2.1.10, commit: '
cpuUtilization:
- idlePercent: 91.02
- systemPercent: 3.79
- userPercent: 5.2
+ idlePercent: 92.44
+ systemPercent: 3.63
+ userPercent: 3.93
cpus: 16
- databaseBackend: boltdb
+ databaseBackend: sqlite
distribution:
distribution: fedora
variant: silverblue
version: "40"
eventLogger: journald
- freeLocks: 2047
+ freeLocks: 2048
hostname: alpha
idMappings:
gidmap:
@@ -43,7 +43,7 @@
kernel: 6.9.4-200.fc40.x86_64
linkmode: dynamic
logDriver: journald
- memFree: 11161944064
+ memFree: 10079854592
memTotal: 16673759232
networkBackend: netavark
networkBackendInfo:
@@ -99,7 +99,7 @@
libseccomp: 2.5.5
swapFree: 8589930496
swapTotal: 8589930496
- uptime: 0h 2m 32.00s
+ uptime: 0h 2m 18.00s
variant: ""
plugins:
authorization: null
@@ -122,25 +122,25 @@
store:
configFile: /var/home/deathwish/.config/containers/storage.conf
containerStore:
- number: 1
+ number: 0
paused: 0
running: 0
- stopped: 1
+ stopped: 0
graphDriverName: overlay
graphOptions: {}
graphRoot: /var/home/deathwish/.local/share/containers/storage
graphRootAllocated: 1000204886016
- graphRootUsed: 952381968384
+ graphRootUsed: 949989421056
graphStatus:
Backing Filesystem: btrfs
- Native Overlay Diff: "false"
+ Native Overlay Diff: "true"
Supports d_type: "true"
- Supports shifting: "true"
+ Supports shifting: "false"
Supports volatile: "true"
Using metacopy: "false"
imageCopyTmpDir: /var/tmp
imageStore:
- number: 1
+ number: 0
runRoot: /run/user/1000/containers
transientStore: false
volumePath: /var/home/deathwish/.local/share/containers/storage/volumes
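Most of that diff is fluctuating runtime statistics (memFree, uptime, CPU percentages); the settings that actually changed appear to be databaseBackend (boltdb → sqlite) and the overlay capabilities (Native Overlay Diff, Supports shifting). A sketch of filtering such a diff down to the changed settings while dropping the noisy counters (field names copied from the output above, filter list is a guess):

```shell
# Keep only +/- lines, then drop stats that differ on every boot.
diff_excerpt='- databaseBackend: boltdb
+ databaseBackend: sqlite
- memFree: 11161944064
+ memFree: 10079854592
-   Native Overlay Diff: "false"
+   Native Overlay Diff: "true"'
printf '%s\n' "$diff_excerpt" \
  | grep -E '^[+-]' \
  | grep -vE 'memFree|uptime|Percent|graphRootUsed|freeLocks'
```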
> Maybe this is related? https://discussion.fedoraproject.org/t/rpm-ostree-update-breaks-toolbox-fedora-40/120095/4
I quickly skimmed through it. On the surface it doesn't seem related to why mount(8) thinks that it's not running as root.
> I did a reset of my config. Here is the diff of the prior and newer output of podman system info
Did resetting the Podman configuration reliably fix this problem?
> FWIW, I can reliably reproduce my toolbox containers breaking after doing a reboot
Okay, that's great. Are you in a position to get the strace(1) logs using the debug build, like I described above? If things are really badly broken, then I can come up with other steps. :)
Not at home atm, but no, I was not able to create new toolboxes. I will check in more detail later today.
> I was not able to create new toolboxes.
Why? What was the exact problem?
If you can't enter a container to install strace, then you can create a custom image using a Containerfile/Dockerfile like this:
FROM registry.fedoraproject.org/fedora:40
RUN dnf --assumeyes install strace
... followed by:
$ podman build --squash --tag localhost/strace-toolbox:40 /path/to/dir/with/Containerfile
Then you can create a container from this image:
$ toolbox create --image localhost/strace-toolbox:40
Then you can try to enter it with the debug toolbox RPM above and see what shows up in `podman start --attach` or `podman logs`.
I was able to enter that container without any issues, so there was nothing to strace :(. By the way, after removing the debug build of toolbox, I am able to create and enter new toolboxes (after the `podman system reset`).
> For more context, this is the second time it has happened this month. I recreated the toolbox only a few days ago. I saw another report in a GNOME Matrix channel.
The exact same thing happened to me. I just re-built all the containers and now can't enter them again.
> > I was not able to create new toolboxes.
>
> Why? What was the exact problem?
>
> If you can't enter a container to install strace, then you can create a custom image using a Containerfile/Dockerfile like this:
>
> FROM registry.fedoraproject.org/fedora:40
> RUN dnf --assumeyes install strace
>
> ... followed by:
>
> $ podman build --squash --tag localhost/strace-toolbox:40 /path/to/dir/with/Containerfile
>
> Then you can create a container from this image:
>
> $ toolbox create --image localhost/strace-toolbox:40
>
> Then you can try to enter it with the debug toolbox RPM above and see what shows up in podman start --attach or podman logs.
I tried to follow this, but newly created images work. It's just existing ones I can't enter.
For the last month, it seems like the container images need to be rebuilt after every reboot on my Kinoite system.
(Un)fortunately, I can't reproduce this anymore after doing a Silverblue update and resetting my containers as recommended in that previous link. So I can't really help with this anymore, but hey, at least things work again :-)
When starting my Fedora 40 toolbox (on Fedora Silverblue 40) I see the message:
See