apache / cloudstack

Apache CloudStack is an opensource Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0

Unable to enter maintenance mode periodically #9523

Open adietrich-ussignal opened 1 month ago

adietrich-ussignal commented 1 month ago
ISSUE TYPE
COMPONENT NAME
Maintenance Mode
CLOUDSTACK VERSION
4.19.1.0
CONFIGURATION

Advanced Networking. Three KVM hosts; two of the three run the management servers and the database (master/replica).

OS / ENVIRONMENT

Ubuntu 22.04

SUMMARY

When I attempt to put a host into maintenance mode, some instances migrate, but others remain on the host until it reaches ErrorInMaintenance. The agent logs on the host being vacated show no interaction from the management server during the periods when an instance fails to migrate.

Management Server Log snippet:

[image: management server log screenshot]

STEPS TO REPRODUCE
 1. Put a host into maintenance mode.
 2. Wait for the host's status to settle into a permanent state.
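
The "wait for a permanent state" step above can be sketched as a small polling loop. This is only an illustration, not part of the original report: `wait_for_terminal_state` and the `fetch_state` callable are hypothetical names, and the state string `Maintenance` is assumed to be how the successful outcome is reported (`ErrorInMaintenance` is the state named in this ticket). In practice `fetch_state` would wrap whatever call returns the host's current resource state.

```python
import time
from typing import Callable

# States treated as "permanent" in this sketch. "ErrorInMaintenance" comes
# from the ticket; "Maintenance" is an assumed name for the success state.
TERMINAL_STATES = {"Maintenance", "ErrorInMaintenance"}


def wait_for_terminal_state(
    fetch_state: Callable[[], str],
    interval_s: float = 10.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_state() until it returns a terminal state or time runs out.

    fetch_state is a hypothetical caller-supplied function that returns the
    host's current state as a string.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"host still in state {state!r} after {timeout_s}s")
        sleep(interval_s)
```

Recording the timestamp at which the loop returns `ErrorInMaintenance` makes it easier to line up the management server log with the agent log on the affected host.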
EXPECTED RESULTS
Host enters into maintenance.
ACTUAL RESULTS
Host enters into ErrorInMaintenance.
DaanHoogland commented 1 month ago

@adietrich-ussignal, can you try to migrate the remaining instances manually?

Also can you add a text version of that stacktrace to the ticket, please?

adietrich-ussignal commented 1 month ago

I migrated the remaining instances manually. Interestingly, the issue reproduced on each of the hosts in my cluster. I then rebalanced the cluster to distribute instances across all hosts and re-attempted maintenance mode on each host, which completed successfully. It makes me wonder what state the instances were in that could have prevented maintenance mode from working while still allowing individual live migrations to succeed without issues.