apache / cloudstack

Apache CloudStack is an opensource Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0
2k stars 1.09k forks source link

Host is stuck in Connecting state after update to 4.18.2.0 #9250

Open 0sorkon opened 3 months ago

0sorkon commented 3 months ago
ISSUE TYPE
COMPONENT NAME
management server
CLOUDSTACK VERSION
4.18.2

Hello, I need help to resolve this. After updating 4.18.1 -> 4.18.2, the first updated host is stuck in the "Connecting" state. The MS update was successful, but the problem occurred with the first host I tried to update immediately after MS. In the MS logs:

2024-06-13 16:12:30,872 DEBUG [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-12:null) (logid:) SeqA 60-328142: Sending Seq 60-328142: { Ans: , MgmtId: 20788631421694, via: 60, Ver: v1, Flags: 100010, [{"com.cloud.agent.api.AgentControlAnswer":{"result":"true","wait":"0","bypassHostMaintenance":"false"}}] } 2024-06-13 16:12:31,437 INFO [c.c.u.e.CSExceptionErrorCode] (AgentConnectTaskPool-149:ctx-31302c74) (logid:90640aee) Could not find exception: com.cloud.exception.ConnectionException in error code list for exceptions 2024-06-13 16:12:31,437 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-149:ctx-31302c74) (logid:90640aee) Failed to handle host connection: com.cloud.exception.ConnectionException: Unable to acquire lock on host 53f6efbf-5180-4475-8f24-b960a3e08db1 at com.cloud.agent.manager.AgentManagerImpl.sendReadyAndGetAttache(AgentManagerImpl.java:1153) at com.cloud.agent.manager.AgentManagerImpl.handleConnectedAgent(AgentManagerImpl.java:1166) at com.cloud.agent.manager.AgentManagerImpl$HandleAgentConnectTask.runInContext(AgentManagerImpl.java:1250) at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48) at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55) at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102) at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52) at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829) 2024-06-13 16:12:31,438 WARN [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-149:ctx-31302c74) (logid:90640aee) Unable to create attache for agent: Seq 0-0: { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 1, [{"com.cloud.agent.api.StartupRoutingCommand":{"cpuSockets":"1","cpus":"48","speed":"2794","memory":"268619120640","dom0MinMemory":"1073741824","poolSync":"false","supportsClonedVolumes":"false","caps":"hvm,snapshot","pool":"/root","hypervisorType":"KVM","hostDetails":{"Host.OS.Kernel.Version":"5.15.0-105-generic","com.cloud.network.Networks.RouterPrivateIpStrategy":"HostLocal","Host.OS.Version":"22.04","host.volume.encryption":"true","secured":"true","Host.OS":"Ubuntu"},"hostTags":[],"groupDetails":{},"type":"Routing","dataCenter":"1","pod":"1","cluster":"1","guid":"7f57af20-c5cf-3ec7-95bc-7bf6463cfa84-LibvirtComputingResource","name":"cloud.nl.introserv.net","id":"0","version":"4.18.2.0","iqn":"iqn.2004-10.com.ubuntu:01:e7f6625b59c","publicIpAddress":"185.191.1.138","publicNetmask":"255.255.255.248","publicMacAddress":"e2:c5:c6:ce:3b:69","privateIpAddress":"172.16.0.1","privateMacAddress":"b2:38:22:80:59:7d","privateNetmask":"255.255.255.0","storageIpAddress":"172.16.0.1","storageNetmask":"255.255.255.0","storageMacAddress":"b2:38:22:80:59:7d","resourceName":"LibvirtComputingResource","gatewayIpAddress":"185.191.1.137","msHostList":"178.132.3.175@static","wait":"0","bypassHostMaintenance":"false"}},{"com.cloud.agent.api.StartupStorageCommand":{"totalSize":"(0 bytes) 0","poolInfo":{"uuid":"8a76c6b2-f791-4f6b-a09e-85a581b8189f","host":"172.16.0.1","localPath":"/Storage/images","hostPath":"/Storage/images","poolType":"Filesystem","capacityBytes":"(97.87 GB) 105089261568","availableBytes":"(74.69 GB) 80196403200"},"resourceType":"STORAGE_POOL","hostDetails":{},"type":"Storage","dataCenter":"1","pod":"1","guid":"7f57af20-c5cf-3ec7-95bc-7bf6463cfa84-LibvirtComputingResource","name":"cloud.nl.introserv.net","id":"0","version":"4.18.2.0","resourceName":"LibvirtComputingResource","msHostList":"178.132.3.175@static","wait":"0","bypassHostMaintenance":"false"}}] } 2024-06-13 16:12:31,446 WARN [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-5:null) (logid:) Throwing away a request because it came through as the first command on a connect: Seq 0--1: { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 111, [{"com.cloud.agent.api.ShutdownCommand":{"reason":"sig.kill","removeHost":"false","wait":"0","bypassHostMaintenance":"false"}}] } 2024-06-13 16:12:31,539 DEBUG [c.c.s.StatsCollector] (StatsCollector-5:ctx-7eb2d626) (logid:ac80bd51) ManagementServerCollector is running... 2024-06-13 16:12:31,540 WARN [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-13:null) (logid:) Throwing away a request because it came through as the first command on a connect: Seq 0-3: { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11, [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"_hostVmStateReport":{},"_gatewayAccessible":"true","_vnetAccessible":"true","hostType":"Routing","hostId":"0","wait":"0","bypassHostMaintenance":"false"}}] }

In the agent log:

2024-06-13 16:17:09,751 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Startup Response Received: agent id = 0 2024-06-13 16:19:09,746 INFO [cloud.agent.Agent] (AgentShutdownThread:null) (logid:) Stopping the agent: Reason = sig.kill 2024-06-13 16:19:21,239 INFO [cloud.agent.AgentShell] (main:null) (logid:) Agent started 2024-06-13 16:19:21,243 INFO [cloud.agent.AgentShell] (main:null) (logid:) Implementation Version is 4.18.2.0 2024-06-13 16:19:21,244 INFO [cloud.agent.AgentShell] (main:null) (logid:) agent.properties found at /etc/cloudstack/agent/agent.properties 2024-06-13 16:19:21,283 INFO [cloud.agent.AgentShell] (main:null) (logid:) Defaulting to using properties file for storage 2024-06-13 16:19:21,284 INFO [cloud.agent.AgentShell] (main:null) (logid:) Defaulting to the constant time backoff algorithm 2024-06-13 16:19:21,321 INFO [cloud.utils.LogUtils] (main:null) (logid:) log4j configuration found at /etc/cloudstack/agent/log4j-cloud.xml 2024-06-13 16:19:21,321 INFO [cloud.agent.AgentShell] (main:null) (logid:) Using default Java settings for IPv6 preference for agent connection 2024-06-13 16:19:21,387 INFO [cloud.agent.Agent] (main:null) (logid:) id is 0 2024-06-13 16:19:21,396 ERROR [kvm.resource.LibvirtComputingResource] (main:null) (logid:) uefi properties file not found due to: Unable to find file uefi.properties. 2024-06-13 16:19:21,435 INFO [kvm.resource.LibvirtConnection] (main:null) (logid:) No existing libvirtd connection found. Opening a new one 2024-06-13 16:19:21,652 INFO [kvm.resource.LibvirtComputingResource] (main:null) (logid:) IO uring driver for Qemu: enabled 2024-06-13 16:19:21,757 INFO [kvm.storage.KVMStoragePoolManager] (main:null) (logid:) adding storage adaptor for com.cloud.hypervisor.kvm.storage.LinstorStorageAdaptor 2024-06-13 16:19:21,760 INFO [kvm.storage.KVMStoragePoolManager] (main:null) (logid:) adding storage adaptor for com.cloud.hypervisor.kvm.storage.StorPoolStorageAdaptor 2024-06-13 16:19:21,761 WARN [kvm.storage.KVMStoragePoolManager] (main:null) (logid:) Duplicate StorageAdaptor type PowerFlex, not loading com.cloud.hypervisor.kvm.storage.ScaleIOStorageAdaptor 2024-06-13 16:19:21,761 INFO [kvm.storage.KVMStoragePoolManager] (main:null) (logid:) adding storage adaptor for com.cloud.hypervisor.kvm.storage.IscsiAdmStorageAdaptor 2024-06-13 16:19:21,762 INFO [kvm.resource.LibvirtComputingResource] (main:null) (logid:) No libvirt.vif.driver specified. Defaults to BridgeVifDriver. 2024-06-13 16:19:21,888 INFO [cloud.serializer.GsonHelper] (main:null) (logid:) Default Builder inited. 2024-06-13 16:19:21,888 INFO [kvm.resource.LibvirtComputingResource] (main:null) (logid:) iscsi session clean up is disabled 2024-06-13 16:19:21,890 INFO [kvm.resource.LibvirtComputingResource] (main:null) (logid:) Skipping the memory balloon stats period setting, since there are no VMs (active Libvirt domains) on this host. 2024-06-13 16:19:21,891 INFO [kvm.resource.LibvirtComputingResource] (main:null) (logid:) The [vm.memballoon.stats.period] property is set to '0', this prevents memory statistics from being displayed correctly. Adjust (increase) the value of this parameter to correct this. 2024-06-13 16:19:21,899 INFO [cloud.agent.Agent] (main:null) (logid:) Agent [id = 0 : type = LibvirtComputingResource : zone = 1 : pod = 1 : workers = 5 : host = 178.132.3.175 : port = 8250 2024-06-13 16:19:21,905 INFO [utils.nio.NioClient] (main:null) (logid:) Connecting to 178.132.3.175:8250 2024-06-13 16:19:21,907 INFO [utils.nio.Link] (main:null) (logid:) Conf file found: /etc/cloudstack/agent/agent.properties 2024-06-13 16:19:22,163 INFO [utils.nio.NioClient] (main:null) (logid:) SSL: Handshake done 2024-06-13 16:19:22,163 INFO [utils.nio.NioClient] (main:null) (logid:) Connected to 178.132.3.175:8250 2024-06-13 16:19:22,236 INFO [utils.linux.KVMHostInfo] (Agent-Handler-1:null) (logid:) Fetching CPU speed from command "lscpu". 2024-06-13 16:19:22,279 ERROR [utils.linux.KVMHostInfo] (Agent-Handler-1:null) (logid:) Unable to retrieve the CPU speed from lscpu. java.lang.NullPointerException at java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1838) at java.base/jdk.internal.math.FloatingDecimal.parseFloat(FloatingDecimal.java:122) at java.base/java.lang.Float.parseFloat(Float.java:455) at org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeedFromCommandLscpu(KVMHostInfo.java:124) at org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeed(KVMHostInfo.java:99) at org.apache.cloudstack.utils.linux.KVMHostInfo.getHostInfoFromLibvirt(KVMHostInfo.java:179) at org.apache.cloudstack.utils.linux.KVMHostInfo.(KVMHostInfo.java:65) at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.initialize(LibvirtComputingResource.java:3522) at com.cloud.agent.Agent.sendStartup(Agent.java:444) at com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:1105) at com.cloud.utils.nio.Task.call(Task.java:83) at com.cloud.utils.nio.Task.call(Task.java:29) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829) 2024-06-13 16:19:22,281 INFO [utils.linux.KVMHostInfo] (Agent-Handler-1:null) (logid:) Fetching CPU speed from file [/sys/devices/system/cpu/cpu0/cpufreq/base_frequency]. 2024-06-13 16:19:22,281 ERROR [utils.linux.KVMHostInfo] (Agent-Handler-1:null) (logid:) Unable to retrieve the CPU speed from file [/sys/devices/system/cpu/cpu0/cpufreq/base_frequency] java.io.FileNotFoundException: /sys/devices/system/cpu/cpu0/cpufreq/base_frequency (No such file or directory) at java.base/java.io.FileInputStream.open0(Native Method) at java.base/java.io.FileInputStream.open(FileInputStream.java:219) at java.base/java.io.FileInputStream.(FileInputStream.java:157) at java.base/java.io.FileInputStream.(FileInputStream.java:112) at java.base/java.io.FileReader.(FileReader.java:60) at org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeedFromFile(KVMHostInfo.java:135) at org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeed(KVMHostInfo.java:104) at org.apache.cloudstack.utils.linux.KVMHostInfo.getHostInfoFromLibvirt(KVMHostInfo.java:179) at org.apache.cloudstack.utils.linux.KVMHostInfo.(KVMHostInfo.java:65) at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.initialize(LibvirtComputingResource.java:3522) at com.cloud.agent.Agent.sendStartup(Agent.java:444) at com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:1105) at com.cloud.utils.nio.Task.call(Task.java:83) at com.cloud.utils.nio.Task.call(Task.java:29) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829) 2024-06-13 16:19:22,281 INFO [utils.linux.KVMHostInfo] (Agent-Handler-1:null) (logid:) Fetching CPU speed from "host capabilities" 2024-06-13 16:19:22,293 INFO [utils.linux.KVMHostInfo] (Agent-Handler-1:null) (logid:) Retrieved value [2794749000] from "host capabilities". This corresponds to a CPU speed of [2794] MHz. 2024-06-13 16:19:22,302 INFO [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Host uses control group [cgroup2fs]. 2024-06-13 16:19:22,302 INFO [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Calculating the max shares of the host. 2024-06-13 16:19:22,302 INFO [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) The max shares of the host is [134112]. 2024-06-13 16:19:22,344 ERROR [kvm.resource.LibvirtConnection] (Agent-Handler-1:null) (logid:) Connection with libvirtd is broken: invalid connection pointer in virConnectGetVersion 2024-06-13 16:19:22,360 INFO [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Attempting to create storage pool 8a76c6b2-f791-4f6b-a09e-85a581b8189f (Filesystem) in libvirt 2024-06-13 16:19:22,382 INFO [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Found existing defined storage pool 8a76c6b2-f791-4f6b-a09e-85a581b8189f, using it. 2024-06-13 16:19:22,383 INFO [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Trying to fetch storage pool 8a76c6b2-f791-4f6b-a09e-85a581b8189f from libvirt 2024-06-13 16:19:22,473 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Process agent startup answer, agent id = 0 2024-06-13 16:19:22,473 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Set agent id 0 2024-06-13 16:19:22,486 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Startup Response Received: agent id = 0 2024-06-13 16:21:22,481 INFO [cloud.agent.Agent] (AgentShutdownThread:null) (logid:) Stopping the agent: Reason = sig.kill

weizhouapache commented 3 months ago

hmm, I had the same issue https://github.com/apache/cloudstack/pull/8530#issuecomment-2164723505 @vishesh92 is looking into it

DaanHoogland commented 3 months ago

@weizhouapache , the only known workaround is wait for 3600 seconds? Or is there something else we can do?

weizhouapache commented 3 months ago

@weizhouapache , the only known workaround is wait for 3600 seconds? Or is there something else we can do?

restarting cloudstack-agent did not help (apparently). I remember I have restarted cloudstack-management service which did not help either

maybe restarting mysql/mariadb is an option... (faster than waiting 1 hour)

vishesh92 commented 3 months ago

We had a similar issue in our one of our smoke tests. I created this PR https://github.com/apache/cloudstack/pull/8502 to fix it.

@weizhouapache do you think it could be the same issue?

weizhouapache commented 3 months ago

We had a similar issue in our one of our smoke tests. I created this PR https://github.com/apache/cloudstack/pull/8502 to fix it.

@weizhouapache do you think it could be the same issue?

Could be same or similar My env has included the fix. I will check the logs

0sorkon commented 3 months ago

I may have found the source of the problem. My installation uses linstror as the primary storage. And communication with linstro-controller is configured via ssl. A certificate is generated on the controller:

keytool -keyalg rsa -keysize 2048 -genkey -validity 9999 -keystore /var/lib/linstor/cert/keystore_linstor.jks -alias linstor_controller -dname "CN=storage-controller.local, OU=SecureUnit, O=Mycompany, L=Mars, ST=Stratos, C=IT"

this certificate is written to the config:

keystore = "/var/lib/linstor/cert/keystore_linstor.jks"

on all cloudstack-agents and MS the certificate is imported into /etc/ssl/certs/java/cacerts:

echo -n | openssl s_client -connect LINSTOR-CONTR_IP:3371 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > /tmp/linstor.crt

keytool -import -trustcacerts -keystore /etc/ssl/certs/java/cacerts -storepass changeit -noprompt -alias linstor_controller -file /tmp/linstor.crt

Everything worked fine in version 4.18.1.0. And even works now at the moment when MS version 4.18.2 and other nodes (except for the problematic one) - 4.18.1.0. However, on the cloudstack-agent that I upgraded to 4.18.2 the logs show the following:

2024-06-14 09:11:04,906 INFO [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Attempting to create storage pool 8a76c6b2-f791-4f6b-a09e-85a581b8189f (Filesystem) in libvirt 2024-06-14 09:11:04,926 INFO [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Found existing defined storage pool 8a76c6b2-f791-4f6b-a09e-85a581b8189f, using it. 2024-06-14 09:11:04,926 INFO [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Trying to fetch storage pool 8a76c6b2-f791-4f6b-a09e-85a581b8189f from libvirt 2024-06-14 09:11:04,967 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Process agent startup answer, agent id = 0 2024-06-14 09:11:04,967 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Set agent id 0 2024-06-14 09:11:04,968 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Startup Response Received: agent id = 0 2024-06-14 09:11:05,012 WARN [cloud.agent.Agent] (agentRequest-Handler-3:null) (logid:3c0c1f6f) Caught: javax.ws.rs.ProcessingException: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target at org.glassfish.jersey.client.internal.HttpUrlConnector.apply(HttpUrlConnector.java:284) at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:278) at org.glassfish.jersey.client.JerseyInvocation.lambda$invoke$0(JerseyInvocation.java:753) at org.glassfish.jersey.internal.Errors.process(Errors.java:316) at org.glassfish.jersey.internal.Errors.process(Errors.java:298) at org.glassfish.jersey.internal.Errors.process(Errors.java:229) at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:414) at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:752) at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:419) at org.glassfish.jersey.client.JerseyInvocation$Builder.get(JerseyInvocation.java:319) at com.linbit.linstor.api.ApiClient.invokeAPI(ApiClient.java:703) at com.linbit.linstor.api.DevelopersApi.resourceGroupList(DevelopersApi.java:2740) at org.apache.cloudstack.storage.datastore.util.LinstorUtil.getCapacityBytes(LinstorUtil.java:64) at com.cloud.hypervisor.kvm.storage.LinstorStorageAdaptor.getCapacity(LinstorStorageAdaptor.java:509) at com.cloud.hypervisor.kvm.storage.LinstorStoragePool.getCapacity(LinstorStoragePool.java:96) at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtModifyStoragePoolCommandWrapper.execute(LibvirtModifyStoragePoolCommandWrapper.java:49) at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtModifyStoragePoolCommandWrapper.execute(LibvirtModifyStoragePoolCommandWrapper.java:35) at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtRequestWrapper.execute(LibvirtRequestWrapper.java:78) at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1869) at com.cloud.agent.Agent.processRequest(Agent.java:663) at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:1086) at com.cloud.utils.nio.Task.call(Task.java:83) at com.cloud.utils.nio.Task.call(Task.java:29) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829) Caused by: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131) at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:360) at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:303) at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:298) at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1357) at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1232) at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1175) at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:392) at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:443) at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:421) at java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:183) at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:172) at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1511) at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1421) at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:456) at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:427) at java.base/sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:580) at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:201) at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1614) at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1542) at java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:527) at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:334) at org.glassfish.jersey.client.internal.HttpUrlConnector._apply(HttpUrlConnector.java:390) at org.glassfish.jersey.client.internal.HttpUrlConnector.apply(HttpUrlConnector.java:282) ... 26 more Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target at java.base/sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:439) at java.base/sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:306) at java.base/sun.security.validator.Validator.validate(Validator.java:264) at java.base/sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:313) at java.base/sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:222) at java.base/sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:129) at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1341) ... 45 more Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target at java.base/sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:148) at java.base/sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:129) at java.base/java.security.cert.CertPathBuilder.build(CertPathBuilder.java:297) at java.base/sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:434) ... 51 more

and the node is still stuck in a "Сonnection" state. I tried regenerating the certificate but the problem persists. Maybe something has changed in JAVA and the certificates need to be added to a different certstore?

DaanHoogland commented 3 months ago

@0sorkon , can you verify if this is still an issue in 4.19? It seems @vishesh92 refers a fix that is on 4.19 and up.

yashi4engg commented 3 months ago

@0sorkon - Also check below PR as after upgrade unable to add new host could be due to this as well. https://github.com/apache/cloudstack/pull/8641

0sorkon commented 2 months ago

@0sorkon , can you verify if this is still an issue in 4.19? It seems @vishesh92 refers a fix that is on 4.19 and up.

I can't confirm because for the update I had to edit the line of connection to the controller in the database. now it doesn't use ssl and certificate. Only after that I managed to update to 4.18.2 and 4.19.1.0.

@0sorkon - Also check below PR as after upgrade unable to add new host could be due to this as well. #8641

I don't think this is my scenario as all servers use ubuntu.