apache / cloudstack

Apache CloudStack is an opensource Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0

VM console cannot access after VNC certificate expired #9718

Closed havengit closed 2 months ago

havengit commented 2 months ago
ISSUE TYPE
COMPONENT NAME
CPVM
CLOUDSTACK VERSION
4.18 
CONFIGURATION

advanced networking

OS / ENVIRONMENT

Hypervisor: KVM virtualization on Ubuntu 22.04

SUMMARY

The VM console cannot be accessed after the VNC certificate expires. The default certificate is valid for 1 year, and CloudStack automatically renews certificates when they expire. I made sure the certificate was updated on the host, but running VM instances need to be stopped and started, or migrated to another host, to apply the new certificate. For production environments, this is hard to do.

I don't know how to get a new certificate to take effect without rebooting the VM.

https://github.com/apache/cloudstack/pull/7015

STEPS TO REPRODUCE
EXPECTED RESULTS
Make a new certificate take effect without rebooting the VM
Provide a setting to turn off TLS for VNC
ACTUAL RESULTS
DaanHoogland commented 2 months ago

@havengit, I think this is more a lack of a feature than a bug, which admittedly comes down to the same thing for you. So I will mark it as an improvement. Dynamically setting the certificate will not be easy. Did you try rebooting just the console proxy? I would guess that is the action actually needed.

weizhouapache commented 2 months ago

there are some global settings ("ca.framework.*"), including

if auto renewal is enabled, the agent cert should be auto-renewed.

otherwise you can do it manually; refer to https://github.com/apache/cloudstack/issues/9562#issuecomment-2302208986

havengit commented 2 months ago

Yes, I have restarted the CPVM and libvirt on the host. I confirm that the certificate has been successfully updated on the host. Maybe there is a bug in QEMU. After stopping and starting the VM, the console works fine.
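One way to confirm on a host that the renewed certificate is in place is to read its end date, for example with `openssl x509 -in <cert> -noout -enddate`, and compute the remaining days. The sketch below only does the date arithmetic on that command's output; the helper name `days_until_expiry` is made up for illustration, and the certificate file name under the VNC certificate directory (often `server-cert.pem`) is an assumption to verify on your hosts.

```python
import ssl
import time


def days_until_expiry(enddate_line, now=None):
    """Return whole days until a certificate's notAfter date (negative if expired).

    enddate_line is the text printed by `openssl x509 -noout -enddate`,
    e.g. "notAfter=Jan  5 09:34:43 2018 GMT"; the bare timestamp also works.
    """
    # Keep only the part after "notAfter=" (or the whole string if no "=").
    _, _, timestamp = enddate_line.strip().rpartition("=")
    expires_at = ssl.cert_time_to_seconds(timestamp)
    if now is None:
        now = time.time()
    return int((expires_at - now) // 86400)
```

Note that this only tells you what is on disk. As the rest of this thread suggests, an already running QEMU process may still be serving the certificate it loaded when it started.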

weizhouapache commented 2 months ago

@havengit is it necessary to stop/start the VM even if you have restarted libvirtd and cloudstack-agent? If so, it has a big impact.

havengit commented 2 months ago

@weizhouapache Restarting libvirtd and cloudstack-agent has no effect; the VM must be rebooted or live-migrated to another host. I don't know if anyone else has encountered this, but this feature seems to have been added in 4.18, so maybe not many people are using it yet.

havengit commented 2 months ago

Using virt-viewer also fails to connect, so it shouldn't be an ACS issue.

weizhouapache commented 2 months ago

this may be related to #7015

can you check if

havengit commented 2 months ago

Yes, VNC TLS is enabled on all hosts. It seems that a running QEMU process does not notice the certificate change and keeps using the old certificate. After a stop/start or a migration, the new QEMU process uses the new certificate. The settings are:

vnc_tls=1
vnc_tls_x509_verify=1
vnc_tls_x509_cert_dir="/etc/pki/libvirt-vnc"
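To check the same three settings across many hosts, a small parser can help. This is a minimal sketch, assuming the settings live in libvirt's QEMU configuration file (typically `/etc/libvirt/qemu.conf`) as simple `key = value` lines; the function names are hypothetical and it does not handle the list or multi-line values that file can also contain.

```python
import re

# Values reported in this thread for a host with VNC TLS enabled.
REQUIRED = {"vnc_tls": "1", "vnc_tls_x509_verify": "1"}


def parse_qemu_conf(text):
    """Parse simple `key = value` lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = re.match(r'^(\w+)\s*=\s*"?([^"]*)"?\s*$', line)
        if match:
            settings[match.group(1)] = match.group(2).strip()
    return settings


def vnc_tls_enabled(settings):
    """True if both VNC TLS switches are set to 1."""
    return all(settings.get(key) == value for key, value in REQUIRED.items())
```

For example, `vnc_tls_enabled(parse_qemu_conf(open("/etc/libvirt/qemu.conf").read()))` would report whether a host matches the configuration quoted above.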

weizhouapache commented 2 months ago

thanks @havengit good to know that migration fixes the issue. stopping/starting all vms is not possible for large production environments.

To summarize,

cc @DaanHoogland @rohityadavcloud @nvazquez

havengit commented 2 months ago

Thanks, weizhou and community. I have changed the CA framework certificate validity period to a very long time, so I won't run into this problem in the future.

rohityadavcloud commented 1 week ago

Hi all, by default I think this should work. The ca.framework.cert.automatic.renewal setting needs to be enabled (true), and there are also ca.framework.cert.expiry.alert.period and ca.framework.background.task.delay. For agents whose certs have expired but which are still connected, it's not an issue, but such agents risk failing to join when restarted. For them, an explicit API can be called:

(homecloud) 🐵 > provision certificate hostid= -h
provisionCertificate: Issues and propagates client certificate on a connected host/agent using configured CA plugin
This API is asynchronous.
Required params: hostid,
API Params   Type     Description
==========   ====     ===========
hostid       uuid     The host/agent uuid to which the certificate has to be provisioned (issued and propagated)
provider     string   Name of the CA service provider, otherwise the default configured provider plugin will be used
reconnect    boolean  Whether to attempt reconnection with host/agent after successful deployment of certificate. When option is not provided, configured global setting is used
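The same provisionCertificate call could also be scripted against the HTTP API instead of the interactive CLI. The sketch below only builds the signed request URL, following the CloudStack request-signing scheme as commonly documented (sort parameters, lower-case the query string, HMAC-SHA1 with the secret key, Base64, URL-encode). The function name, endpoint, and keys are placeholders, and the scheme should be verified against the API documentation for your version before use.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def build_signed_url(endpoint, api_key, secret_key, host_id, reconnect=True):
    """Build a signed GET URL for the provisionCertificate API call."""
    params = {
        "command": "provisionCertificate",
        "hostid": host_id,
        "reconnect": str(reconnect).lower(),
        "response": "json",
        "apiKey": api_key,
    }
    # Sort by lower-cased parameter name and URL-encode each value.
    query = "&".join(
        f"{key}={quote(str(value), safe='')}"
        for key, value in sorted(params.items(), key=lambda kv: kv[0].lower())
    )
    # Sign the lower-cased query string with HMAC-SHA1, then Base64 + URL-encode.
    digest = hmac.new(secret_key.encode(), query.lower().encode(), hashlib.sha1).digest()
    signature = quote(base64.b64encode(digest).decode(), safe="")
    return f"{endpoint}?{query}&signature={signature}"
```

Because the API is asynchronous, the response to such a request would contain a job id to poll rather than the final result.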

However, the VNC console connection to the user's browser uses an admin-uploaded certificate; when it expires, the admin needs to upload new end-user TLS/SSL certs. Lastly, I don't remember whether we restart libvirtd on automatic cert renewal. I think the same API (or UI button) can be used to restart agent+libvirt; worth testing.