visheshtanksale opened 4 months ago
cc: @zvonkok
@littlejawa is this something you're helping with or are you looking for reinforcements?
@haircommander Yes, he is helping with that, and we're currently out of options and need reinforcements.
what happens when you create the container with a different oci runtime?
Non-Kata containers are created successfully.
The symptom is similar to what we saw with kata 3.3.0, where the content of the container's rootfs was not accessible to the runtime. We fixed it in our own CI by adding the flag "storage.overlay.skip_mount_home=true" in crio's config. I'm also fixing it in the same way in the crio CI for kata, in https://github.com/cri-o/cri-o/pull/7958.
In this cluster the flag was not there, so we added it, but it didn't solve the problem. Could crio ignore the flag for some reason? What else could cause the same symptom?
After some experiments on my side, this is what I learned:
```toml
[crio]
storage_option = [
  "overlay.skip_mount_home=true",
]
```
If the option above is set before Kubernetes is deployed, we're good. If it is set after Kubernetes is deployed, restarting CRI-O / kubelet does not solve the issue, although a full reboot does.
I've also added the same comment to the Kata Containers issue.
Hey @haircommander,
I think we need your brain here :)
CRI-O was taking our change into account (according to its logs), but Kata still couldn't access the files in the container rootfs, meaning that the mount was still wrong. We managed to get the cluster working by rebooting the node. Reloading/restarting CRI-O multiple times didn't help.
Is it because the layers were already mounted with the wrong flag and not updated as part of the reload/restart? If so, is there anything else we could have done to get them remounted properly?
Is rebooting the node the right way to get this setting applied?
Yeah, that makes sense to me. I think the only way to fix it would be to remove the containers and images. Rebooting is probably the least intrusive option.
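To check whether a node still carries mount state from before the option was changed, one could look at whether the overlay home directory is itself a mount point. This is a minimal sketch under two assumptions the thread does not state: that the leftover state is the private bind mount containers/storage places on its overlay home when `skip_mount_home` is not set, and that the storage root is the default `/var/lib/containers/storage`. The function names are hypothetical; the parser only relies on the mount point being the 5th field of each `/proc/self/mountinfo` line.

```python
import os
import re


def _unescape(field: str) -> str:
    """Decode octal escapes such as \\040 (a space) used in mountinfo fields."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def mount_points(mountinfo_text: str) -> list[str]:
    """Return the mount point (5th field) of every line in mountinfo format."""
    points = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 5:
            points.append(_unescape(fields[4]))
    return points


def is_mount_point(mountinfo_text: str, path: str) -> bool:
    """True if `path` is listed as a mount point in the mountinfo text."""
    return os.path.normpath(path) in mount_points(mountinfo_text)
```

On a node, pass `open("/proc/self/mountinfo").read()` and `/var/lib/containers/storage/overlay`. A `True` result after `skip_mount_home=true` has been configured would be consistent with (though not proof of) the "already mounted with the wrong flag" explanation.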
I see two things here:
1) This issue is not about Kata. I can't edit the title, but I think it should be something like: "Storage option changes in crio config requires a reboot to be taken into account"
2) Do we want to fix it? Removing all images/containers is not something I'd expect CRI-O to do by itself on every reload/restart. Even if we limit it to this specific kind of configuration change (assuming we can tell that it's a new setting), it could be very disruptive. On the other hand, as one of the people who scratched their heads trying to understand what was going on: can we add a warning (maybe as comments in the conf file) so that people know they may need to reboot if they change this setting?
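The warning suggested in point 2 could also be produced at startup instead of only being documented in the conf file. The sketch below is hypothetical and not part of CRI-O: it assumes the storage options in effect at the previous start were recorded somewhere (the thread doesn't say where), compares them with the newly configured ones, and builds a warning message only when they differ.

```python
from typing import Optional


def storage_option_diff(previous: list[str], current: list[str]) -> tuple[list[str], list[str]]:
    """Return (added, removed) storage options, sorted for stable output."""
    prev, curr = set(previous), set(current)
    return sorted(curr - prev), sorted(prev - curr)


def reboot_warning(previous: list[str], current: list[str]) -> Optional[str]:
    """Build a warning message if the storage options changed, else None."""
    added, removed = storage_option_diff(previous, current)
    if not added and not removed:
        return None
    added_s = ", ".join(added) or "none"
    removed_s = ", ".join(removed) or "none"
    return (
        f"storage options changed (added: {added_s}; removed: {removed_s}); "
        "layers mounted before the change may keep the old behavior until "
        "the node is rebooted"
    )
```

Comparing sets keeps the check insensitive to option order, so only real additions or removals trigger the message.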
/retitle Storage option changes in CRI-O configuration requires a reboot to be taken into account
@littlejawa, this is a restart of the guest virtual machine, correct? I hope that the host on which CRI-O runs does not require that.
No, we're talking about the host unfortunately.
The problem is as follows:
The way to get it taken into account is to reboot the node. That's bad, but the alternative seems to be removing all images/containers, so maybe rebooting is the lesser of two evils :-(
A friendly reminder that this issue had no activity for 30 days.
/remove-lifecycle-stale
What happened?
I set up Kata using kata-deploy on CRI-O. When creating a test pod, I get the error below.
If I try to bring up any other container using the kata-qemu runtime, I get a similar error saying that the container's entrypoint command is not found.
Attached CRI-O log here. Attached Kata log here.
QEMU and Kata versions are below.
I opened an issue on kata-containers, and @littlejawa suggested adding the storage overlay config.
But this doesn't help.
What did you expect to happen?
The pod should come up without error
How can we reproduce it (as minimally and precisely as possible)?
kata-qemu runtime class
Anything else we need to know?
No response
CRI-O and Kubernetes version
OS version
Additional environment details (AWS, VirtualBox, physical, etc.)