Open 0sorkon opened 3 months ago
Hmm, I had the same issue: https://github.com/apache/cloudstack/pull/8530#issuecomment-2164723505. @vishesh92 is looking into it.
@weizhouapache, is the only known workaround to wait for 3600 seconds? Or is there something else we can do?
Restarting cloudstack-agent did not help (apparently). I remember that I also restarted the cloudstack-management service, which did not help either.
Maybe restarting mysql/mariadb is an option (faster than waiting 1 hour).
We had a similar issue in one of our smoke tests. I created this PR to fix it: https://github.com/apache/cloudstack/pull/8502
@weizhouapache do you think it could be the same issue?
It could be the same or a similar issue. My env already includes the fix. I will check the logs.
I may have found the source of the problem. My installation uses Linstor as the primary storage, and communication with the linstor-controller is configured via SSL. A certificate is generated on the controller:
keytool -keyalg rsa -keysize 2048 -genkey -validity 9999 -keystore /var/lib/linstor/cert/keystore_linstor.jks -alias linstor_controller -dname "CN=storage-controller.local, OU=SecureUnit, O=Mycompany, L=Mars, ST=Stratos, C=IT"
this certificate is written to the config:
keystore = "/var/lib/linstor/cert/keystore_linstor.jks"
on all cloudstack-agents and MS the certificate is imported into /etc/ssl/certs/java/cacerts:
echo -n | openssl s_client -connect LINSTOR-CONTR_IP:3371 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > /tmp/linstor.crt
keytool -import -trustcacerts -keystore /etc/ssl/certs/java/cacerts -storepass changeit -noprompt -alias linstor_controller -file /tmp/linstor.crt
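A quick sanity check on the two commands above, as a rough sketch: the sed filter should keep only the PEM block from the s_client output, and keytool -list can confirm that the alias actually landed in the truststore. The function name extract_pem and the sample input are made up for illustration; the paths, alias and password are the ones already used above.

```shell
#!/bin/sh
# extract_pem (hypothetical name): keep only the PEM certificate block from stdin,
# using the same sed range expression as the command above.
extract_pem() {
    sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p'
}

# Made-up sample that only resembles s_client output, to show what the filter keeps.
sample='depth=0 CN = storage-controller.local
-----BEGIN CERTIFICATE-----
placeholder-base64-body
-----END CERTIFICATE-----
subject=CN = storage-controller.local'

filtered=$(printf '%s\n' "$sample" | extract_pem)
printf '%s\n' "$filtered"

# On a real host (needs keytool/openssl and the files from the steps above):
#   keytool -list -keystore /etc/ssl/certs/java/cacerts -storepass changeit -alias linstor_controller
#   openssl x509 -in /tmp/linstor.crt -noout -subject -enddate
```

If keytool -list reports that the alias does not exist, the import did not reach that keystore file.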
Everything worked fine in version 4.18.1.0, and it still works now with the MS on 4.18.2 and the other nodes (all except the problematic one) on 4.18.1.0. However, on the cloudstack-agent that I upgraded to 4.18.2, the logs show the following:
2024-06-14 09:11:04,906 INFO [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Attempting to create storage pool 8a76c6b2-f791-4f6b-a09e-85a581b8189f (Filesystem) in libvirt
2024-06-14 09:11:04,926 INFO [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Found existing defined storage pool 8a76c6b2-f791-4f6b-a09e-85a581b8189f, using it.
2024-06-14 09:11:04,926 INFO [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Trying to fetch storage pool 8a76c6b2-f791-4f6b-a09e-85a581b8189f from libvirt
2024-06-14 09:11:04,967 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Process agent startup answer, agent id = 0
2024-06-14 09:11:04,967 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Set agent id 0
2024-06-14 09:11:04,968 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Startup Response Received: agent id = 0
2024-06-14 09:11:05,012 WARN [cloud.agent.Agent] (agentRequest-Handler-3:null) (logid:3c0c1f6f) Caught:
javax.ws.rs.ProcessingException: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at org.glassfish.jersey.client.internal.HttpUrlConnector.apply(HttpUrlConnector.java:284)
    at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:278)
    at org.glassfish.jersey.client.JerseyInvocation.lambda$invoke$0(JerseyInvocation.java:753)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:316)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:298)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:229)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:414)
    at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:752)
    at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:419)
    at org.glassfish.jersey.client.JerseyInvocation$Builder.get(JerseyInvocation.java:319)
    at com.linbit.linstor.api.ApiClient.invokeAPI(ApiClient.java:703)
    at com.linbit.linstor.api.DevelopersApi.resourceGroupList(DevelopersApi.java:2740)
    at org.apache.cloudstack.storage.datastore.util.LinstorUtil.getCapacityBytes(LinstorUtil.java:64)
    at com.cloud.hypervisor.kvm.storage.LinstorStorageAdaptor.getCapacity(LinstorStorageAdaptor.java:509)
    at com.cloud.hypervisor.kvm.storage.LinstorStoragePool.getCapacity(LinstorStoragePool.java:96)
    at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtModifyStoragePoolCommandWrapper.execute(LibvirtModifyStoragePoolCommandWrapper.java:49)
    at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtModifyStoragePoolCommandWrapper.execute(LibvirtModifyStoragePoolCommandWrapper.java:35)
    at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtRequestWrapper.execute(LibvirtRequestWrapper.java:78)
    at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1869)
    at com.cloud.agent.Agent.processRequest(Agent.java:663)
    at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:1086)
    at com.cloud.utils.nio.Task.call(Task.java:83)
    at com.cloud.utils.nio.Task.call(Task.java:29)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:360)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:303)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:298)
    at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1357)
    at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1232)
    at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1175)
    at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:392)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:443)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:421)
    at java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:183)
    at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:172)
    at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1511)
    at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1421)
    at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:456)
    at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:427)
    at java.base/sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:580)
    at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:201)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1614)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1542)
    at java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:527)
    at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:334)
    at org.glassfish.jersey.client.internal.HttpUrlConnector._apply(HttpUrlConnector.java:390)
    at org.glassfish.jersey.client.internal.HttpUrlConnector.apply(HttpUrlConnector.java:282)
    ... 26 more
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at java.base/sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:439)
    at java.base/sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:306)
    at java.base/sun.security.validator.Validator.validate(Validator.java:264)
    at java.base/sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:313)
    at java.base/sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:222)
    at java.base/sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:129)
    at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1341)
    ... 45 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at java.base/sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:148)
    at java.base/sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:129)
    at java.base/java.security.cert.CertPathBuilder.build(CertPathBuilder.java:297)
    at java.base/sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:434)
    ... 51 more
and the node is still stuck in the "Connecting" state. I tried regenerating the certificate, but the problem persists. Maybe something has changed in Java and the certificates need to be added to a different certificate store?
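One way to narrow down the truststore question, as a rough sketch rather than a confirmed diagnosis: find the Java home behind the `java` binary on the upgraded host, derive that JVM's default truststore path, and check whether the linstor_controller alias is present there. The helper names default_truststore and alias_in_truststore are hypothetical, the lib/security/cacerts layout assumes Java 9 or later (the `java.base/` frames in the trace suggest that), and it is an assumption that the agent runs with the `java` found on PATH and with the JVM default truststore. An application can also configure its own truststore, in which case this check says nothing.

```shell
#!/bin/sh
# default_truststore (hypothetical helper): print the default truststore path
# for a given Java home, assuming the Java 9+ directory layout.
default_truststore() {
    printf '%s/lib/security/cacerts\n' "${1%/}"
}

# alias_in_truststore (hypothetical helper): succeed if the alias exists.
# Arguments: keystore path, store password, alias.
alias_in_truststore() {
    keytool -list -keystore "$1" -storepass "$2" -alias "$3" >/dev/null 2>&1
}

# Only run the real check when java and keytool are available on this machine.
if command -v java >/dev/null 2>&1 && command -v keytool >/dev/null 2>&1; then
    # Resolve symlinks such as /usr/bin/java -> .../bin/java, then go up two levels.
    java_bin=$(readlink -f "$(command -v java)")
    java_home=$(dirname "$(dirname "$java_bin")")
    store=$(default_truststore "$java_home")
    echo "Default truststore: $store -> $(readlink -f "$store")"
    # Alias and password are the ones used in the import command earlier in this comment.
    if alias_in_truststore "$store" changeit linstor_controller; then
        echo "linstor_controller is present"
    else
        echo "linstor_controller is missing from this truststore"
    fi
fi
```

If the resolved path is not /etc/ssl/certs/java/cacerts, or the alias is missing, the import went into a store that this JVM does not read by default.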
@0sorkon, can you verify whether this is still an issue in 4.19? It seems @vishesh92 refers to a fix that is in 4.19 and up.
@0sorkon - Also check the PR below, since being unable to add a new host after an upgrade could be caused by this as well: https://github.com/apache/cloudstack/pull/8641
I can't confirm, because to complete the update I had to edit the controller connection string in the database. It no longer uses SSL or the certificate. Only after that did I manage to update to 4.18.2 and then to 4.19.1.0.
I don't think this is my scenario, as all servers use Ubuntu.
ISSUE TYPE
COMPONENT NAME
CLOUDSTACK VERSION
Hello, I need help to resolve this. After updating 4.18.1 -> 4.18.2, the first updated host is stuck in the "Connecting" state. The MS update was successful, but the problem occurred with the first host I tried to update immediately after MS. In the MS logs:
In the agent log: