apache / cloudstack

Apache CloudStack is an open source Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0

systemVM stuck in starting state #3784

Closed DaanHoogland closed 3 years ago

DaanHoogland commented 4 years ago

When a host running system VMs (SSVM / Console Proxy / VR) undergoes a planned or unplanned reboot, the system VMs get stuck in the "Starting" state after the reboot and do not start up properly, which causes problems deploying new VMs. The behaviour was tested on version 4.13.
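To make the symptom easier to spot, here is a minimal Python sketch that picks out system VMs reported in the "Starting" state. It assumes a response shaped like the JSON returned by the CloudStack `listSystemVms` API call (a `listsystemvmsresponse` object holding a `systemvm` list). The helper name `stuck_system_vms` and the sample data are hypothetical, and virtual routers are not covered because they are listed through a separate call.

```python
from typing import Any


def stuck_system_vms(response: dict[str, Any], state: str = "Starting") -> list[str]:
    """Return the names of system VMs whose reported state equals `state`.

    Assumes `response` follows the shape of a listSystemVms JSON reply;
    a missing key is treated as "no system VMs".
    """
    vms = response.get("listsystemvmsresponse", {}).get("systemvm", [])
    return [vm["name"] for vm in vms if vm.get("state") == state]


# Hypothetical sample data, not taken from a real deployment.
sample = {
    "listsystemvmsresponse": {
        "count": 2,
        "systemvm": [
            {"name": "s-1-VM", "systemvmtype": "secondarystoragevm", "state": "Starting"},
            {"name": "v-2-VM", "systemvmtype": "consoleproxy", "state": "Running"},
        ],
    }
}

print(stuck_system_vms(sample))
```

Running such a check a few minutes after the host reboot would show whether any system VM is still in "Starting", although the threshold for "stuck" is a judgment call.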

ISSUE TYPE
COMPONENT NAME
HA systemVMs
CLOUDSTACK VERSION
4.13.0
CONFIGURATION
OS / ENVIRONMENT
SUMMARY
STEPS TO REPRODUCE
EXPECTED RESULTS
ACTUAL RESULTS

andrijapanicsb commented 4 years ago

@DaanHoogland is this related to local storage, or in general?

DaanHoogland commented 4 years ago

@andrijapanicsb I'd have to test, I don't know.

Spaceman1984 commented 4 years ago

Tested with NFS, no error. Will test with local storage next.

rohityadavcloud commented 4 years ago

Not seen recently.

Spaceman1984 commented 4 years ago

Tested with local storage, unable to reproduce.

andrijapanicsb commented 4 years ago

Did you try/test this by powering off (in parent lab) the hypervisor where the SSVM/CPVM are running (KVM, local storage)?

Spaceman1984 commented 4 years ago

I tested by restarting my CentOS 7 KVM hypervisor in two different zones, one with local storage and one with NFS.

rohityadavcloud commented 4 years ago

@DaanHoogland are you still able to reproduce it on master?

DaanHoogland commented 4 years ago

Tried on latest master (4.15 pre-release) in a nested test environment with two CentOS 7 KVM hosts:

image

The host and system VM show as up and connected, but the console is not reachable, and on the host virsh list does not show any VMs.
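The mismatch described here (the management server reports the system VM as up while virsh list shows nothing on the host) can be checked mechanically. The Python sketch below parses the usual tabular output of `virsh list --all` and reports which expected VMs are not running on the host. The function names, the sample VM name, and the assumption that domain names contain no whitespace are illustrative, not part of CloudStack or libvirt.

```python
def running_domains(virsh_output: str) -> set[str]:
    """Collect names of domains that `virsh list --all` reports as running."""
    running = set()
    for line in virsh_output.splitlines():
        parts = line.split(None, 2)  # columns: Id, Name, State (state may contain a space)
        if len(parts) < 3 or parts[0] == "Id":
            continue  # skip the header, the dashed separator, and blank lines
        _domain_id, name, state = parts
        if state.strip() == "running":
            running.add(name)
    return running


def missing_on_host(expected: list[str], virsh_output: str) -> list[str]:
    """Return expected VM names that are not running according to virsh."""
    return sorted(set(expected) - running_domains(virsh_output))


# Hypothetical output for a host with no domains, as in the report above.
empty_listing = """ Id   Name   State
--------------------
"""

print(missing_on_host(["v-2-VM"], empty_listing))
```

In practice the `expected` list would come from whatever the management server reports for that host, and the text from running `virsh list --all` on the hypervisor itself.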

DaanHoogland commented 3 years ago

Retesting, as this has gained no traction. Will close this and open a new one if it takes any effort to reproduce.

DaanHoogland commented 3 years ago

Tested on the latest healthcheck build: stopped and started the host that contained the CPVM out of band. No problem found.