Suggestion: When a Global Setting such as network.loadbalancer.haproxy.max.conn is changed, mark the VR as 'Requires Upgrade' instead of marking it as a failed healthcheck. #9800
One of our customers required a larger HAProxy max connections value, as they have many users connecting at the same time.
So we changed the parameter below in Global Settings from its default to a new value:
network.loadbalancer.haproxy.max.conn = 500,000 (previously 4096, the default value)
Once this was applied and we restarted the CloudStack server, we got a whole bunch of healthcheck failures.
Screenshot below:
In this case, I don't think this should be counted as a healthcheck issue, because the service seems to be working fine.
I think a better experience for the operator would be to mark the router as 'Requires Upgrade'.
The VR does not need to be re-created. It just needs a forced reboot. (FYI, a normal reboot doesn't seem to cause the VR to load the new maxconn value.)
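To make the suggestion concrete, here is a minimal sketch of the decision I have in mind. It is plain Python for illustration, not CloudStack code: `RouterView`, `RouterStatus`, `classify_router` and their fields are hypothetical names, and comparing the value the VR is running with against the configured value is an assumption about how the mismatch could be detected.

```python
from dataclasses import dataclass
from enum import Enum


class RouterStatus(Enum):
    HEALTHY = "Healthy"
    REQUIRES_UPGRADE = "Requires Upgrade"
    HEALTH_CHECK_FAILED = "Health check failed"


@dataclass
class RouterView:
    # Hypothetical snapshot of one VR; field names are illustrative only.
    service_up: bool          # is the load balancer actually serving traffic?
    running_max_conn: int     # maxconn the VR is currently running with
    configured_max_conn: int  # current network.loadbalancer.haproxy.max.conn


def classify_router(vr: RouterView) -> RouterStatus:
    """Separate a real service failure from a stale-setting mismatch."""
    if not vr.service_up:
        # The service itself is broken: this is a genuine healthcheck failure.
        return RouterStatus.HEALTH_CHECK_FAILED
    if vr.running_max_conn != vr.configured_max_conn:
        # Service works, but the VR has not picked up the new Global Setting.
        return RouterStatus.REQUIRES_UPGRADE
    return RouterStatus.HEALTHY


# The situation described in this issue: service fine, setting changed.
print(classify_router(RouterView(True, 4096, 500_000)).value)
```

With this split, the 'Alert' section would only show routers whose service is really down, and the stale ones would be listed as needing a forced reboot.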
As operators, we rely on the 'Alert' section to ensure all customer VRs are working normally. The current behavior creates a lot of noise.
Even better would be for each customer to be able to set their own network.loadbalancer.haproxy.max.conn value, and additional settings, because not all customers require such large values.
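The per-customer idea could look roughly like the lookup below. Again this is only a sketch under assumptions: `effective_max_conn`, the `account_overrides` mapping, and the account name are hypothetical, and nothing here reflects how CloudStack stores settings today.

```python
from typing import Mapping

SETTING = "network.loadbalancer.haproxy.max.conn"
DEFAULT_MAX_CONN = 4096  # default value quoted in this issue


def effective_max_conn(
    global_settings: Mapping[str, int],
    account_overrides: Mapping[str, Mapping[str, int]],
    account: str,
) -> int:
    """Return the account's own value if set, else the global value, else the default."""
    override = account_overrides.get(account, {})
    if SETTING in override:
        return override[SETTING]
    return global_settings.get(SETTING, DEFAULT_MAX_CONN)


# Hypothetical data: only one customer needs the large value.
global_settings = {SETTING: 4096}
account_overrides = {"busy-customer": {SETTING: 500_000}}

print(effective_max_conn(global_settings, account_overrides, "busy-customer"))
print(effective_max_conn(global_settings, account_overrides, "small-customer"))
```

This way the global value can stay at the default and only the customers who need it get the larger limit.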
STEPS TO REPRODUCE
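1. In Global Settings, change network.loadbalancer.haproxy.max.conn from the default (4096) to a larger value, for example 500,000.
2. Restart the CloudStack server.
3. Check the healthcheck results for the existing VRs.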
EXPECTED RESULTS
Mark the router as 'Requires Upgrade' when a Global Setting such as network.loadbalancer.haproxy.max.conn is changed.
ACTUAL RESULTS
Bombarded with healthcheck failures for all existing VRs, each of which requires a manual forced reboot or a cleanup of the VR. (A normal reboot doesn't work.)
ISSUE TYPE
COMPONENT NAME
CLOUDSTACK VERSION
CONFIGURATION
OS / ENVIRONMENT