Closed ravening closed 3 years ago
$ curl https://10.11.118.126:6443/version
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
$ curl -k https://10.11.118.126:6443/version
{
  "major": "1",
  "minor": "16",
  "gitVersion": "v1.16.10",
  "gitCommit": "f3add640dbcd4f3c33a7749f38baaac0b3fe810d",
  "gitTreeState": "clean",
  "buildDate": "2020-05-20T13:51:56Z",
  "goVersion": "go1.13.9",
  "compiler": "gc",
  "platform": "linux/amd64"
}
@ravening can you check cluster deployment using a prebuilt community Kubernetes version ISO from https://download.cloudstack.org/cks/? While creating the ISO you are using k8s version 1.16.3, but for weavenet you are passing 1.12.5. Not sure if that is the issue. Also, check whether the MS is able to SSH into your cluster VMs. CKS needs SSH access to communicate with the k8s cluster master VM.
@shwstppr the MS can't SSH to the master VM in my setup. Does the MS need SSH access to all the k8s cluster master VMs? Which version of weavenet should I pass for 1.16.3?
@ravening MS will need SSH access on master node VMs while starting the cluster and scaling it. When upgrading the cluster it will need SSH access on all VMs. This must be the problem. Deployment would have failed after the time set in the cloud.kubernetes.cluster.start.timeout global setting (default 3600 secs, or 1 hr).
It needs to connect to the k8s deployment using kubectl on the master VM.
On weavenet: yes, ideally the same version which you have used for k8s. I've not played with different versions myself.
@shwstppr @ravening this issue seems to be caused by the certificate. CloudStack generates a self-signed certificate for the k8s cluster and verifies the API server (port 6443) via HTTPS (see my comment above). I have tried the following steps, which work: (1) stop the k8s cluster (2) download the k8s config in the UI (3) use base64 to decode the certificate-authority-data in the config (4) save the decoded string in /etc/ssl/certs/k8s.cert (5) start the k8s cluster; it then reaches the Running state.
@weizhouapache but customers can't do all those steps, right? Every time we create a cluster we would need to do all these extra steps, which is cumbersome, and customers won't like doing all these things.
Hi, any update on a fix for this? I have the same problem. I tested all the prebuilt community Kubernetes ISO versions with no success. It is also impossible to download the k8s config when the cluster is not deployed; I get the error "Setup is in progress for Kubernetes cluster ID".
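Steps (3) and (4) of the workaround above can be sketched in Python. This is only an illustration, not CloudStack code: the function name extract_ca_pem is made up for this example, and it assumes the downloaded config is a standard kubeconfig with a single inline certificate-authority-data entry.

```python
import base64
import re


def extract_ca_pem(kubeconfig_text):
    """Return the decoded text stored in certificate-authority-data.

    Hypothetical helper: it assumes the kubeconfig holds one inline,
    base64-encoded certificate-authority-data value.
    """
    match = re.search(r"certificate-authority-data:\s*(\S+)", kubeconfig_text)
    if match is None:
        raise ValueError("certificate-authority-data not found in kubeconfig")
    # The field is base64-encoded PEM, so the decoded bytes are ASCII text.
    return base64.b64decode(match.group(1)).decode("ascii")
```

The returned string is what step (4) saves to /etc/ssl/certs/k8s.cert on the management server.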
@stack1313 where did your coreos template come from?
@shwstppr can you triage this? Thanks.
Hi,
I think I found the problem on my installation. The issue comes from the network offering/existing isolated network.
In the cluster creation wizard, the first time I chose to deploy on an existing isolated network, and with this configuration I hit the issue. To fix it I created a new network offering with a new isolated network, and after that the deployment worked without issue. The problem does not happen when you leave the network empty in the wizard, because the wizard then deploys a new isolated network with the default Kubernetes network offering.
Maybe the problem comes from the network offering options (like default allow/deny traffic, DHCP, DNS, ...).
Thanks,
This sounds like it is more of a documentation bug then.
@PaulAngus I think the original issue reported here is different from what @stack1313 is reporting. Maybe @ravening can confirm
I will test what stack1313 mentioned and update the result.
Same issue on the error during cluster creation. To complete the cluster creation
I'm new here; is there any fix for the current setup at the moment?
The management logs return the following error. Tentatively, I could write a routine script that parses the logs for it and issues a curl -k to install the cert so that cluster creation can complete.
Server returned HTTP response code: 403 for URL: https://ip_of_the_node:6443/version
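A minimal sketch of the log-parsing half of such a script, in Python. The function name and the regular expression are assumptions made for this example; it only extracts the failing /version URLs from management log text and leaves any follow-up action (such as the curl -k call) to the operator.

```python
import re

# Matches the IOException message shown in this thread, e.g.
# "Server returned HTTP response code: 403 for URL: https://<host>:6443/version"
FAILED_VERSION_CALL = re.compile(
    r"Server returned HTTP response code: (\d{3}) for URL: (https://\S+:6443/version)"
)


def find_failed_endpoints(log_text):
    """Return sorted, de-duplicated (status_code, url) pairs found in the log."""
    hits = {(int(m.group(1)), m.group(2)) for m in FAILED_VERSION_CALL.finditer(log_text)}
    return sorted(hits)
```

Feeding it the contents of the management server log yields one entry per API endpoint that returned an HTTP error on /version.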
@shwstppr usually in our production environments, the mgmt server doesn't have SSH access to any of the VMs. So how do I fix the issue in that case?
For the current model of implementation we expect the mgmt server to be able to SSH to/have access to the CKS/k8s nodes; the feature cannot be supported with an agent or other approaches that may not require the mgmt server to SSH to the nodes. I'm removing the 4.15.0.0 milestone as such an approach may require a lot of rework. @ravening, any design docs on a new approach as well as PRs are welcome.
@shwstppr I think this is now reduced to a documentation issue. Can you check that and the docs, please?
@DaanHoogland where is the doc for this issue?
@weizhouapache I see some documentation here, but I'm not sure if it covers all you're looking for: http://docs.cloudstack.apache.org/en/latest/plugins/cloudstack-kubernetes-service.html#kubernetes-clusters (this certainly states the type of networks CKS is supported on)
@rhtyd thanks. I tested CloudStack 4.15.0.0 RC3 and this issue still exists in my testing. I followed the steps on the page you posted.
I tested with the official ISO, the official CoreOS template, and the default network offering for Kubernetes clusters. There is an error caused by the self-signed certificate. See line https://github.com/apache/cloudstack/blob/master/plugins/integrations/kubernetes-service/src/main/java/com/cloud/kubernetes/cluster/utils/KubernetesClusterUtil.java#L223
2020-12-28 20:49:45,589 WARN [c.c.k.c.u.KubernetesClusterUtil] (API-Job-Executor-11:ctx-81fccb14 job-107 ctx-43764632) (logid:9b4ccbea) API endpoint for Kubernetes cluster : test2 not available
java.io.IOException: Server returned HTTP response code: 403 for URL: https://10.135.122.164:6443/version
This seems to be fixed by running "curl -k https://10.135.122.164:6443/version" on the mgmt server.
@weizhouapache We tried to reproduce this in our test env (with and without secured management servers) but can't. Can it be due to the way we provision certificates? I'm not sure how to incorporate the curl -k call in the code. We set certificates for the k8s cluster here: https://github.com/apache/cloudstack/blob/master/plugins/integrations/kubernetes-service/src/main/java/com/cloud/kubernetes/cluster/actionworkers/KubernetesClusterStartWorker.java#L146-L155
Do you see any changes there? cc @ravening @rhtyd @Pearl1594
Also, when you get this error, can you check whether the kube-apiserver-k8s-master pod shows as running at some point in the cluster with kubectl get pods --all-namespaces
@weizhouapache do you suspect that the code is not ignoring SSL errors/warnings, and that this may be causing the failure at https://github.com/apache/cloudstack/blob/master/plugins/integrations/kubernetes-service/src/main/java/com/cloud/kubernetes/cluster/utils/KubernetesClusterUtil.java#L223 ? However, a 403/forbidden error hints that this is not an SSL error. Can you investigate further?
@weizhouapache I tested and found that the isolated/network should allow egress (public) internet to work with ISOs from http://download.cloudstack.org/cks - check and see if you're hitting the same. @davidjumani has fixed the issue here https://github.com/apache/cloudstack/pull/4459 but we haven't pushed newer ISOs (which we'll try to update soon).
Commentary/notes: I followed the docs (http://docs.cloudstack.apache.org/en/latest/plugins/cloudstack-kubernetes-service.html), enabled the global settings and set up the CoreOS template, then created a CKS cluster with k8s v1.16.0 (1 worker node + 1 master node with 2GB RAM and 2 vCPUs) on a KVM advanced zone env with shared storage on a pre-created isolated network. I saw the following while deploying the cluster:
2020-12-31 07:56:45,638 WARN [c.c.k.c.u.KubernetesClusterUtil] (API-Job-Executor-2:ctx-7051112a job-3394 ctx-c601630a) (logid:581f7bce) API endpoint for Kubernetes cluster : cks1-ry not available
javax.net.ssl.SSLHandshakeException: Remote host terminated the handshake
at java.base/sun.security.ssl.SSLSocketImpl.handleEOF(SSLSocketImpl.java:1588)
at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1416)
After the nodes are up and kubeadm is able to initialise them, I see this in logs:
2020-12-31 07:57:15,701 INFO [c.c.k.c.u.KubernetesClusterUtil] (API-Job-Executor-2:ctx-7051112a job-3394 ctx-c601630a) (logid:581f7bce) Kubernetes cluster : cks1-ry API has been successfully provisioned, {
"major": "1",
"minor": "16",
"gitVersion": "v1.16.0",
"gitCommit": "2bd9643cee5b3b3a5ecbd3af49d09018f0773c77",
"gitTreeState": "clean",
"buildDate": "2019-09-18T14:27:17Z",
"goVersion": "go1.12.9",
"compiler": "gc",
"platform": "linux/amd64"
}
And after some time I see:
2020-12-31 07:59:50,170 DEBUG [c.c.k.c.u.KubernetesClusterUtil] (API-Job-Executor-2:ctx-7051112a job-3394 ctx-c601630a) (logid:581f7bce) Checking ready nodes for the Kubernetes cluster : cks1-ry with total 2 provisioned nodes
2020-12-31 07:59:50,543 DEBUG [c.c.k.c.u.KubernetesClusterUtil] (API-Job-Executor-2:ctx-7051112a job-3394 ctx-c601630a) (logid:581f7bce) Kubernetes cluster : cks1-ry has total 2 provisioned nodes while 0 ready now
This continued for a while. I then debugged and found that the nodes were unable to fetch container images. I saw:
$ sudo ./kubectl get nodes
NAME             STATUS     ROLES    AGE   VERSION
cks1-ry-master   NotReady   master   12m   v1.16.0
cks1-ry-node-1   NotReady   <none>   12m   v1.16.0
$ sudo ./kubectl get pods -n kube-system -o wide
NAME                                     READY   STATUS             RESTARTS   AGE   IP           NODE             NOMINATED NODE   READINESS GATES
coredns-5644d7b6d9-7xcxj                 0/1     Pending            0          13m   <none>       <none>           <none>           <none>
coredns-5644d7b6d9-b8twq                 0/1     Pending            0          13m   <none>       <none>           <none>           <none>
etcd-cks1-ry-master                      1/1     Running            0          12m   10.1.1.150   cks1-ry-master   <none>           <none>
kube-apiserver-cks1-ry-master            1/1     Running            0          12m   10.1.1.150   cks1-ry-master   <none>           <none>
kube-controller-manager-cks1-ry-master   1/1     Running            0          12m   10.1.1.150   cks1-ry-master   <none>           <none>
kube-proxy-bwmhj                         1/1     Running            0          13m   10.1.1.35    cks1-ry-node-1   <none>           <none>
kube-proxy-tkjbp                         1/1     Running            0          13m   10.1.1.150   cks1-ry-master   <none>           <none>
kube-scheduler-cks1-ry-master            1/1     Running            0          12m   10.1.1.150   cks1-ry-master   <none>           <none>
weave-net-g6d9l                          0/2     ImagePullBackOff   0          13m   10.1.1.150   cks1-ry-master   <none>           <none>
weave-net-q4ft5                          0/2     ErrImagePull       0          13m   10.1.1.35    cks1-ry-node-1   <none>           <none>
I checked and added egress allow rules and then manually pulled images which fixed that issue:
docker pull docker.io/weaveworks/weave-kube:2.7.0
docker pull docker.io/weaveworks/weave-npc:2.7.0
After this the cluster came up and I was able to do basic tests using kubectl and use k8s dashboard via proxy.
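For anyone debugging the same symptom, the pod listing above can be filtered for image pull failures with a few lines of Python. This is a rough sketch: the function name is invented for this example, and it assumes the default whitespace-separated layout of kubectl get pods, where STATUS is the third column.

```python
IMAGE_PULL_STATES = ("ImagePullBackOff", "ErrImagePull")


def pods_with_image_pull_errors(kubectl_output):
    """Return (pod_name, status) for pods stuck pulling images.

    Expects the text printed by `kubectl get pods`, header line included.
    """
    failing = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # not a pod row
        name, status = fields[0], fields[2]
        if status in IMAGE_PULL_STATES:
            failing.append((name, status))
    return failing
```

On the output shown in this comment it would report the two weave-net pods, which is what pointed to the missing egress rules.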
@ravening @weizhouapache cc @shwstppr I've added a line here - https://github.com/apache/cloudstack-documentation/pull/174/files (with newer CKS ISOs that we'll build, we'll bundle additional dependencies (weavenet etc) so they're not fetching during setup)
@rhtyd thanks a lot Rohit. I will test it again.
Cert check/SSL issue fixed in https://github.com/apache/cloudstack/pull/4639. Please re-open if something else was missed (docs?)
Hello, I don't know whether my error is relevant to this issue or not,
but I'm using a fresh ACS 4.16.1.0 management server and KVM nodes on Ubuntu 20.04.
When trying to create a k8s cluster,
the master node and compute node are running but the cluster stays in Starting forever.
2022-06-05 04:59:55,946 WARN [o.a.c.f.j.i.AsyncJobMonitor] (Timer-0:ctx-8669d7f6) (logid:933baacc) Task (job-326) has been pending for 1083 seconds
2022-06-05 04:59:59,472 WARN [c.c.k.c.u.KubernetesClusterUtil] (API-Job-Executor-7:ctx-c7dbf788 job-326 ctx-2117db03) (logid:ee321178) API endpoint for Kubernetes cluster : k8s011 not available
javax.net.ssl.SSLHandshakeException: Remote host terminated the handshake
Caused by: java.io.EOFException: SSL peer shut down incorrectly
And this is the result on the ACS management server when trying to reach the k8s master node:
$ curl -k https://172.21.21.108:6443/version
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 172.21.21.108:6443
Worked here: I needed to change the endpoint.url global setting from http://localhost:8080/client/api to http://MANAGEMENT_SERVER_IP:8080/client/api
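A quick way to spot the endpoint.url problem described above is to check whether the configured value points at a loopback host. The sketch below is an assumption-laden illustration: the function name and the list of loopback hosts are made up for this example, and the reasoning that cluster VMs cannot reach the management server through localhost is inferred from the fix, not stated in the thread.

```python
from urllib.parse import urlsplit

# Hosts that only resolve on the management server itself (illustrative list).
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def points_to_loopback(endpoint_url):
    """Return True if the URL's host is a loopback name or address."""
    host = urlsplit(endpoint_url).hostname
    return host is not None and host.lower() in LOOPBACK_HOSTS
```

If this returns True for the current endpoint.url value, replacing the host with the management server's reachable IP is the change that worked in the comment above.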
ISSUE TYPE
COMPONENT NAME
CLOUDSTACK VERSION
CONFIGURATION
Advanced zone
OS / ENVIRONMENT
Ubuntu 18.04 OS on KVM hypervisor
SUMMARY
Unable to create new Kubernetes cluster
I was able to register CoreOS and add a new supported Kubernetes version, but I am unable to create a cluster as it's throwing an exception.
STEPS TO REPRODUCE
EXPECTED RESULTS
ACTUAL RESULTS