apache / cloudstack

Apache CloudStack is an opensource Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0

SSVM not starting correctly #7154

Closed jaghabalayev closed 1 year ago

jaghabalayev commented 1 year ago
##### ISSUE TYPE

##### COMPONENT NAME
System VM

##### CLOUDSTACK VERSION
4.17.2.0

##### OS / ENVIRONMENT
Cloudstack management server: CentOS 8.0
Hosts: CentOS 7.0

##### SUMMARY
I have two KVM hosts, with NFS for both primary and secondary storage. The system VMs deployed correctly. After both KVM hosts were shut down at the same time and powered back on, the system VMs got stuck. Rebooting the management server did not help.

##### STEPS TO REPRODUCE
1. Configure NFS primary storage
2. Configure NFS secondary storage
3. Add 2 KVM hosts
4. Shut down both KVM hosts at the same time, then power them back on
5. Reboot the management server
6. Destroy the SSVM
7. Try to access the SSVM: no success, the link-local IP is not pingable

##### EXPECTED RESULTS
The SSVM starts and its agent state is Up

##### ACTUAL RESULTS
The SSVM is stuck in the Starting state. After a management server reboot the SSVM is in the Running state, but its Agent State is Down. After the SSVM is destroyed, it is stuck in the Starting state again.
[management-and-agent-log.zip](https://github.com/apache/cloudstack/files/10551779/management-and-agent-log.zip)

From the logs I noticed:

Can't get vm state r-5-VMDomain not found: no domain with matching name 'r-5-VM'retry:2

Groovy script '/etc/cloudstack/agent/hooks/libvirt-vm-xml-transformer.groovy' is not available. Transformations will not be applied.

Full logs from both the management server and the KVM host agents are attached.

From the KVM host:
7: cloud0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 2e:27:ba:d1:47:4d brd ff:ff:ff:ff:ff:ff
    inet 169.254.0.1/16 scope global cloud0
       valid_lft forever preferred_lft forever

[root@kvm-test agent]# virsh domiflist v-23-VM
Interface  Type       Source     Model       MAC
-------------------------------------------------------
vnet0      bridge     cloud0     virtio      0e:00:a9:fe:cb:68
vnet3      bridge     cloudbr0   virtio      1e:00:df:00:00:06
vnet4      bridge     cloudbr1   virtio      1e:00:30:00:00:20

The private and public IPs are reachable, but the link-local IP is not.
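The `ip` output above already hints at the cause: the flag list for cloud0 contains no `UP` entry and the state is `DOWN`. As a minimal sketch (the helper names `parse_link_line` and `is_admin_up` are hypothetical, not part of CloudStack), the header line of an interface entry can be parsed like this:

```python
import re


def parse_link_line(line: str) -> dict:
    """Parse the header line of an `ip addr` / `ip link` entry.

    Returns the interface name, the set of flags between < and >,
    and the value following the word "state" (or None if absent).
    """
    match = re.match(r"\s*\d+:\s+([^:@\s]+)(?:@\S+)?:\s+<([^>]*)>(.*)", line)
    if match is None:
        raise ValueError(f"not an interface header line: {line!r}")
    name, flags, rest = match.groups()
    state = re.search(r"\bstate\s+(\S+)", rest)
    return {
        "name": name,
        "flags": set(filter(None, flags.split(","))),
        "state": state.group(1) if state else None,
    }


def is_admin_up(info: dict) -> bool:
    """An interface is administratively up when the UP flag is present."""
    return "UP" in info["flags"]


# Header line copied from the KVM host output in this issue.
sample = "7: cloud0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000"
info = parse_link_line(sample)
print(info["name"], info["state"], is_admin_up(info))  # cloud0 DOWN False
```

With the bridge administratively down, the vnet0 interface attached to it cannot carry the link-local traffic, which is consistent with the unreachable 169.254.x.x address.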

From DB:

 select * from op_dc_link_local_ip_address_alloc where ip_address="169.254.203.104"  limit 10;
+-------+-----------------+----------------+--------+--------+--------------------------------------+---------------------+
| id    | ip_address      | data_center_id | pod_id | nic_id | reservation_id                       | taken               |
+-------+-----------------+----------------+--------+--------+--------------------------------------+---------------------+
| 52071 | 169.254.203.104 |              1 |      1 |     48 | 690d7efd-58ca-4d67-b2fa-0d2c1491359d | 2023-01-31 23:33:03 |
+-------+-----------------+----------------+--------+--------+--------------------------------------+---------------------+
1 row in set (0.08 sec)
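In the output above, the last four octets of the MAC address on the cloud0 NIC (0e:00:a9:fe:cb:68) decode to the link-local IP queried in the database. Whether CloudStack always derives the MAC this way is an assumption based only on this one sample. The helper name `ip_from_mac_tail` is hypothetical. A minimal sketch of the decoding:

```python
import ipaddress


def ip_from_mac_tail(mac: str) -> ipaddress.IPv4Address:
    """Interpret the last four octets of a MAC address as an IPv4 address."""
    octets = mac.split(":")
    if len(octets) != 6:
        raise ValueError(f"expected six colon-separated octets, got {mac!r}")
    # IPv4Address accepts a 4-byte packed value, most significant octet first.
    return ipaddress.IPv4Address(bytes(int(octet, 16) for octet in octets[2:]))


# MAC of vnet0 (bridge cloud0) from the `virsh domiflist v-23-VM` output.
addr = ip_from_mac_tail("0e:00:a9:fe:cb:68")
print(addr, addr.is_link_local)  # 169.254.203.104 True
```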

I appreciate your support.

jaghabalayev commented 1 year ago

firewalld.service is disabled on both the management server and the agent hosts. SELinux is also disabled (SELINUX=disabled).

weizhouapache commented 1 year ago

@jaghabalayev is the CPVM running well? Can you share the agent.log from the KVM host?

jaghabalayev commented 1 year ago

@weizhouapache The CPVM has not started. The agent log is attached: agent.log

jaghabalayev commented 1 year ago

Manually running the command ip link set cloud0 up on the KVM host resolved the issue. It looks like the cloud0 interface is not brought up automatically after both KVM nodes are rebooted at the same time. @weizhouapache
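For anyone who wants to script the same workaround, here is a rough sketch that checks the interface flags and runs the same `ip link set ... up` command only when needed. The function names and the injectable `run` parameter are hypothetical, added for illustration and testing. It has to run as root on the KVM host, and it does not replace fixing the boot-time network configuration.

```python
import subprocess


def link_is_up(name, run=subprocess.run):
    """Return True if `ip -o link show dev NAME` lists the UP flag."""
    result = run(
        ["ip", "-o", "link", "show", "dev", name],
        capture_output=True, text=True, check=True,
    )
    # Flags are the comma-separated items between the first < and >.
    flags = result.stdout.split("<", 1)[1].split(">", 1)[0].split(",")
    return "UP" in flags


def ensure_link_up(name, run=subprocess.run):
    """Bring the interface up if it is down. Return True if a change was made."""
    if link_is_up(name, run):
        return False
    # Same command that was entered manually in this issue.
    run(["ip", "link", "set", "dev", name, "up"], check=True)
    return True
```

Calling `ensure_link_up("cloud0")` on the affected host would return `True` the first time and `False` once the bridge is already up.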

weizhouapache commented 1 year ago

Thanks for the update @jaghabalayev, good to hear it.

If so, check your network configuration and make sure cloud0 is brought up automatically at boot.