Closed DaanHoogland closed 1 day ago
@DaanHoogland can you share the hypervisor type and version, and a link to the CKS ISO?
The CKS version I tried is 1.27.8, but the user reported trying several versions. The hypervisor is VMware; I will verify the others and update the description. I am first checking 4.18.1 (and possibly earlier versions) to see when this was introduced.
and network type, etc
In some setups, the public IP of the CKS cluster must be accessible from the CloudStack management server. If the management server (in a private network) cannot reach the CKS nodes via the public IP to get the status of the CKS cluster, the cluster might end up in the Error state.
@weizhouapache In the tests I ran to recreate the problem, connectivity between the public IP of the CKS cluster and the management servers is enabled, and the problem occurs anyway. Also, if I access one of the nodes via the console and start the ssh service manually, I can establish an ssh connection from the management servers.
OK @luganofer, did it happen every time, or just once?
@weizhouapache At least in my lab environment it happens in every deployment I try and with several different k8s versions (1.28.4, 1.27.8, 1.27.3, 1.26.6)
@luganofer can you also share the hypervisor type and version, and the link to the CKS ISO?
@weizhouapache I am using VMware vSphere 8.0c and all the ISOs were downloaded from the following link: https://download.cloudstack.org/cks/
@luganofer As the nodes / VMs come up, do you see any error logs in the VM console?
Hi @Pearl1594, there are no error logs in the VM console.
Only the following error is observed in the management server logs:
ERROR [c.c.k.c.a.KubernetesClusterActionWorker] (API-Job-Executor-84:ctx-216a9879 job-150679 ctx-7f68a95d) (logid:e493dc8f) Failed to setup Kubernetes cluster : maradona in usable state as unable to access control node VMs of the cluster
From my perspective, the problem is related to nodes that do not initialise correctly (cloud-init?). They receive an IP via DHCP from the VR, but they do not change the hostname and, crucially, do not start the ssh service. As a result, the deployed nodes cannot be reached by the ACS management servers via ssh, and the deployment of the k8s cluster never completes.
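For anyone who wants to confirm this symptom from the management server side, here is a minimal sketch that only checks whether a TCP connection to the ssh port of a node can be opened. The function name `ssh_port_open` and the default port 22 are assumptions for illustration; use whatever address and port your management server actually uses to reach the control node.

```python
import socket
import sys


def ssh_port_open(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This only proves that something is listening on the port. It does not
    authenticate, so it cannot tell whether the ssh keys are set up correctly.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Hypothetical usage: pass the node addresses on the command line.
    for node in sys.argv[1:]:
        state = "reachable" if ssh_port_open(node) else "NOT reachable"
        print(f"{node}: ssh port {state}")
```

If the port is closed right after deployment but opens once the ssh service is started manually from the console, that would be consistent with the behaviour described above.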
@luganofer If you are able to log into the vm (is the password still "password"?) and restart ssh, can you check the cloud-init logs? /var/log/cloud-init-*
Can you also double check the vmware version? 8.0c, 8.0 update 1c or 8.0 update 2c?
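To make the cloud-init log check quicker, a small sketch like the following could be run on the node to list lines that look like errors. The glob `/var/log/cloud-init*` matches the usual `cloud-init.log` and `cloud-init-output.log`; the keyword list and the helper name `suspicious_lines` are assumptions for illustration, not an official cloud-init tool.

```python
import glob
import re

# Keywords chosen for illustration; adjust to taste.
SUSPICIOUS = re.compile(r"ERROR|WARNING|Traceback|failed", re.IGNORECASE)


def suspicious_lines(pattern: str = "/var/log/cloud-init*"):
    """Return (path, line_number, text) for every log line matching SUSPICIOUS."""
    hits = []
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path, errors="replace") as fh:
                for number, line in enumerate(fh, start=1):
                    if SUSPICIOUS.search(line):
                        hits.append((path, number, line.rstrip()))
        except OSError:
            # Skip directories and unreadable files instead of aborting.
            continue
    return hits


if __name__ == "__main__":
    for path, number, text in suspicious_lines():
        print(f"{path}:{number}: {text}")
```

An empty result does not prove cloud-init ran correctly; it may also mean cloud-init never ran at all, so it is worth checking that the log files exist and are not empty.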
Sorry, I forgot to report back: XCP-ng and KVM seem to work; only VMware is broken.
which vmware version did you test? @DaanHoogland It seems to be working in Trillian tests
What test is verifying this @weizhouapache ? (as I recall it was 70u3, but I'll check)
it was 80u1 , @weizhouapache
80u1 (8.0.1.0) is not working. See #7572. Do not run 4.18/4.19 test with it. However, 4.20 seems to be working with 80u1.
we use 8.0b (8.0.0.2) in Trillian tests with vmware-80. It has been run many times. The test results look good.
the reporter uses 8.0c (8.0.0.3, if the version is correct). Maybe we can upgrade trillian vm template from 8.0b to 8.0c and run some tests @DaanHoogland
@DaanHoogland there is a known issue where the systemvm/CKS node is stuck at Starting on VMware 80u1: https://github.com/apache/cloudstack/issues/7572. Will you move this to the 4.20.0.0 milestone and test it later?
@sureshanaparti is working on vmware 80u1/u2/u3 support in 4.20.0.0
if this issue happens with vmware 8.0u1/u2/u3, it should have been addressed by #9625
cc @DaanHoogland @rohityadavcloud @sureshanaparti @JoaoJandre
@weizhouapache I do not have a VMware 8 env to test this.
Could someone validate if the issue persists after #9625? cc @DaanHoogland @rohityadavcloud @sureshanaparti
Tested on both 8.0u2 and 8.0u3; both clusters are marked as running, so I think it is safe to assume this is solved.
ISSUE TYPE
COMPONENT NAME
CLOUDSTACK VERSION
CONFIGURATION
simple installation with CKS enabled
OS / ENVIRONMENT
4.19 with any hypervisor/network model
SUMMARY
When starting a CKS cluster, the control node does not enable ssh, and thus the cluster never comes up.
STEPS TO REPRODUCE
EXPECTED RESULTS
ACTUAL RESULTS