For now, can I safely remove the state file to get my VMs/containers to restore using the lxc cluster restore command?
Right, so your stateful migration never failed; instead, it was stuck on I/O due to lack of disk space. Then on restart, the instance config disk is so full that LXD can't actually start the instance back up.
Your best bet is to do lxc config device set INSTANCE root size.state 8GiB, or something along those lines, which will allow enough space for stateful stop/snapshots/migration and also fix your current startup problem.
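For example, assuming the root disk device is inherited from a profile (so it first needs a local override before the key can be set on the instance), something along these lines should do it, with a size.state comfortably larger than the VM's memory limit:

    lxc config get INSTANCE limits.memory            # how much RAM a stateful stop has to write out
    lxc config device override INSTANCE root         # only needed while root still comes from a profile
    lxc config device set INSTANCE root size.state 8GiB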
That is a critical oversight on my part :-). It even says so in your news post! https://discuss.linuxcontainers.org/t/lxd-4-12-has-been-released/10424
Can I request that we get an error and automatic clean-up when this happens, instead of a stuck process and a "no space left on device" error at the next restart of the VM? Or an initial check along the lines of "if memory > disk space, do not even attempt to transfer the state"?
I just set size.state on the root device and the VM works now:
    root@node1:~# lxc config device override k8s-dev root
    Device root overridden for k8s-dev
    root@node1:~# lxc config device set k8s-dev root size.state 8GiB
Cheers :-). Now I can also start properly testing/using the live migration feature!
edit: This information was only mentioned in the release notes of 4.12, not 4.20. I have also been unable to find it in the docs.
Yeah, I think it'd be reasonable for us to refuse performing stateful stop/snapshots/migration unless size.state is >= size + limits.memory.
Yes, that would make it more user-friendly to discover that setting.
I think we should do that on startup instead of during config validation, as the check depends on values that can come from profiles and the like; mixing it into config validation could cause a lot of config update failures. So it's probably best to just fail startup by validating this in Start() of driver_qemu.go.
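To illustrate the proposed rule with made-up figures: with a root size of 10GiB and limits.memory of 4GiB, size.state would need to be at least 14GiB before a stateful stop/snapshot/migration is allowed, e.g.:

    lxc config device get k8s-dev root size                 # e.g. 10GiB
    lxc config get k8s-dev limits.memory                    # e.g. 4GiB
    lxc config device set k8s-dev root size.state 14GiB     # >= size + limits.memory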
Issue description
I evacuated a node for a reboot and restored it after it came back online. The VM k8s-dev lives on ceph.
As you can see, the config image on ceph is 100% full because of the ./state file.
Additional information:
- This VM, k8s-dev, is the first VM on this node. 3 other instances were containers and started without issue.
- Another VM on this host+ceph that has not started yet only uses 12% of the config image on ceph and has no ./state file.
- The difference between the two VMs is that k8s-dev has migration.stateful: "true" in its config and the other VMs/containers do not.

I used this VM as a test for the new stateful migration feature, but never got it to work: the command would just wait indefinitely, and I forgot about it until now. The state file is from November 17th, a little after VM live migration became available and I started testing it.
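Whether stateful migration is enabled on a given instance can be checked with something like the following (k8s-dev being the affected VM in this report):

    lxc config get k8s-dev migration.stateful      # prints true for the affected VM, empty otherwise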
Is LXD writing the VM's memory to the ceph config image to transfer it to another host? If so, then a 100 MB quota isn't going to be enough, is it?
Information to attach
- dmesg
- lxc info NAME --show-log
- lxc config show NAME --expanded
- lxc monitor (while reproducing the issue)

The last entry of /var/snap/lxd/common/lxd/logs/lxd.log is that it successfully started the previous container from the lxc cluster restore action, so it is not relevant.