rubensdevito opened this issue 6 years ago
@rubensdevito
I guess your K8S cluster is running inside a VPC and the K8S API server resolves to an external IP address (see Jeff Barr's article here). To check this, go to Route 53 and see whether your cluster's records look like this:
api.k8s.example.com A 52.xxx.xxx.xxx
api.internal.k8s.example.com A 172.20.xxx.xxxx
etcd-a.internal.k8s.example.com A 172.20.xxx.xxxx
etcd-events-a.internal.k8s.example.com A 172.20.xxx.xxxx
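If you want to check this programmatically, here is a minimal Python sketch that labels A records as public or private using the standard `ipaddress` module. It assumes you have copied the records as plain `name A address` lines. The addresses in `SAMPLE_RECORDS` are invented placeholders, and `classify_a_records` is a hypothetical helper name.

```python
import ipaddress

# Hypothetical sample records in the same "name TYPE value" shape as the
# Route 53 listing above; the concrete addresses are made up.
SAMPLE_RECORDS = """\
api.k8s.example.com A 52.10.20.30
api.internal.k8s.example.com A 172.20.33.44
etcd-a.internal.k8s.example.com A 172.20.33.45
"""


def classify_a_records(text):
    """Return {hostname: "private" | "public"} for each A record line."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[1] != "A":
            continue  # skip blank lines and non-A records
        name, _, value = parts
        ip = ipaddress.ip_address(value)
        result[name] = "private" if ip.is_private else "public"
    return result


if __name__ == "__main__":
    for host, kind in classify_a_records(SAMPLE_RECORDS).items():
        print(f"{host}: {kind}")
```

If `api.<cluster>` comes back public while the `api.internal.<cluster>` record is private, a Lambda function inside the VPC is most likely better off talking to the internal name.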
To work around this issue, point the server field of the kubeconfig template (kube-manifests/config) at the internal API record:
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: $CA
    server: https://api.internal.$ENDPOINT
  name: $ENDPOINT
contexts:
- context:
    cluster: $ENDPOINT
    user: $ENDPOINT
  name: $ENDPOINT
current-context: $ENDPOINT
kind: Config
preferences: {}
users:
- name: $ENDPOINT
  user:
    client-certificate-data: $CLIENT_CERT
    client-key-data: $CLIENT_KEY
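A minimal sketch of filling in that template with Python's `string.Template`, whose `$NAME` placeholders happen to match the ones above. The function names, the sample values, and writing the result to `/tmp` (the writable location inside Lambda) are my assumptions, not necessarily what kube-lambda.py does.

```python
from string import Template

# The kubeconfig template from above, with the server pointing at api.internal.
KUBECONFIG_TEMPLATE = Template("""\
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: $CA
    server: https://api.internal.$ENDPOINT
  name: $ENDPOINT
contexts:
- context:
    cluster: $ENDPOINT
    user: $ENDPOINT
  name: $ENDPOINT
current-context: $ENDPOINT
kind: Config
preferences: {}
users:
- name: $ENDPOINT
  user:
    client-certificate-data: $CLIENT_CERT
    client-key-data: $CLIENT_KEY
""")


def render_kubeconfig(endpoint, ca, client_cert, client_key):
    """Fill in the placeholders; substitute() raises KeyError if one is missing."""
    return KUBECONFIG_TEMPLATE.substitute(
        ENDPOINT=endpoint, CA=ca, CLIENT_CERT=client_cert, CLIENT_KEY=client_key
    )


def write_kubeconfig(text, path="/tmp/kubeconfig"):
    """Hypothetical helper: /tmp is the writable location inside Lambda."""
    with open(path, "w") as fh:
        fh.write(text)
    return path


if __name__ == "__main__":
    # Placeholder values; in practice these would come from Parameter Store.
    print(render_kubeconfig("k8s.example.com", "<base64 CA>", "<base64 cert>", "<base64 key>"))
```

Using `substitute()` rather than `safe_substitute()` is deliberate: a missing value fails loudly instead of leaving a literal `$CA` in the kubeconfig.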
If you work with the original K8S-generated kubeconfig with RBAC (as I do), "password" and "username" fields are needed after the "client-key-data" field. Of course, the Lambda function and the K8S cluster also need to be in the same subnet, and the Lambda function and the master(s) need to be in the same security group.
BTW, although the deployment itself succeeded (the image ID really did change on the K8S side), the function still times out without ever reporting success back to CodePipeline.
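One likely cause, offered as an assumption rather than something confirmed in this thread: a Lambda action in CodePipeline has to call `PutJobSuccessResult` or `PutJobFailureResult` itself, and once the function runs inside a VPC, that API call also needs a route out (for example, a NAT gateway). Without it the call hangs and the action times out even though the deploy worked. A minimal sketch of the reporting side with boto3; `run_job`, `build_result_call`, and the empty `deploy` placeholder are hypothetical names.

```python
def get_job_id(event):
    """CodePipeline puts the job under the "CodePipeline.job" key of the event."""
    return event["CodePipeline.job"]["id"]


def build_result_call(job_id, ok, message=""):
    """Return (client method name, kwargs) for reporting the job outcome."""
    if ok:
        return "put_job_success_result", {"jobId": job_id}
    return "put_job_failure_result", {
        "jobId": job_id,
        "failureDetails": {"type": "JobFailed", "message": message or "deploy failed"},
    }


def run_job(event, codepipeline, deploy):
    """Run the deploy step, then always report back so the action cannot hang."""
    job_id = get_job_id(event)
    try:
        deploy()
        name, kwargs = build_result_call(job_id, True)
    except Exception as exc:  # report any failure instead of silently timing out
        name, kwargs = build_result_call(job_id, False, str(exc))
    getattr(codepipeline, name)(**kwargs)


def lambda_handler(event, context):
    import boto3  # bundled with the AWS Lambda Python runtime

    def deploy():
        # Hypothetical placeholder: patch the Kubernetes deployment here.
        pass

    run_job(event, boto3.client("codepipeline"), deploy)
```

If the image changes but this call never returns, the problem is the function's network path to the CodePipeline API, not the Kubernetes side.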
I finally got it working. This thread helped (see the post by BEm on Jun 29, 2017, 3:09 AM):
Can someone post complete instructions that actually fix this? I've added the Lambda function to the VPC, assigned subnets and security groups to it, and added the AWSLambdaVPCAccessExecutionRole policy to the roles created for it. Nothing helps: the error message doesn't change and deployments still fail.
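For reference, the steps listed above correspond roughly to these two boto3 calls. This is a minimal sketch; the function name, role name, subnet IDs, and security group IDs are hypothetical placeholders. Note that these steps only put the function into the VPC. It still needs a route from those subnets to the API server, an API server security group that accepts 443 from the function's security group, and an API hostname that resolves from inside the VPC.

```python
# Managed policy that lets a Lambda function create the network interfaces
# it needs inside a VPC.
VPC_ACCESS_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"


def build_vpc_config(subnet_ids, security_group_ids):
    """Build the VpcConfig argument for update_function_configuration."""
    subnet_ids = list(subnet_ids)
    security_group_ids = list(security_group_ids)
    if not subnet_ids or not security_group_ids:
        raise ValueError("need at least one subnet and one security group")
    return {"SubnetIds": subnet_ids, "SecurityGroupIds": security_group_ids}


def attach_lambda_to_vpc(lambda_client, iam_client, function_name, role_name,
                         subnet_ids, security_group_ids):
    """Grant VPC access to the execution role, then move the function into the VPC."""
    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=VPC_ACCESS_POLICY_ARN)
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        VpcConfig=build_vpc_config(subnet_ids, security_group_ids),
    )


if __name__ == "__main__":
    import boto3

    # All identifiers below are hypothetical placeholders.
    attach_lambda_to_vpc(
        boto3.client("lambda"),
        boto3.client("iam"),
        function_name="kube-lambda",
        role_name="kube-lambda-role",
        subnet_ids=["subnet-0123456789abcdef0"],
        security_group_ids=["sg-0123456789abcdef0"],
    )
```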
Hi, the problem is that this Lambda function and CloudFormation template are structured so that you need a public DNS record: they assume the cluster is reachable via a Route 53 record. Here are some of my thoughts on updates/changes/options:
- Switch from the Python client to the Go client (this just needs doing; Go is the preferred language in the k8s community)
- Make the config more adaptable to accommodate various auth methods
- Solve the gossip/private endpoint problem
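On the "various auth methods" point, here is a minimal sketch of building the kubeconfig `users` entry from whichever credentials are available. It covers client certificates, basic auth (the "username"/"password" case mentioned earlier in the thread), and bearer tokens. These are standard kubeconfig user fields, but `build_user_entry` and its keyword names are hypothetical.

```python
def build_user_entry(name, *, cert_data=None, key_data=None,
                     username=None, password=None, token=None):
    """Return one kubeconfig `users` list item built from the given credentials."""
    user = {}
    if cert_data and key_data:
        # Client certificate auth (base64-encoded PEM data).
        user["client-certificate-data"] = cert_data
        user["client-key-data"] = key_data
    if username and password:
        # Basic auth, as used by some generated kubeconfigs.
        user["username"] = username
        user["password"] = password
    if token:
        # Bearer token, e.g. from a Kubernetes service account.
        user["token"] = token
    if not user:
        raise ValueError("no complete set of credentials given")
    return {"name": name, "user": user}


if __name__ == "__main__":
    print(build_user_entry("k8s.example.com", token="<service account token>"))
```

Keeping each method additive means a kubeconfig that needs both a client certificate and basic auth can be expressed without special cases.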
Thoughts? Also, would love some help from anyone!
@dustyketchum
I see you're missing:
Below is the actual architecture diagram, although we use GitHub rather than CodeCommit.
@omarlari
Actually, I really don't know whether EKS will change everything, and whether CodeDeploy will consequently gain options to deploy to EKS. In that case, contributors might ask, "Why should I work on something that will soon be updated?"
@minghsieh-prenetics thanks, but our subnets already had internet access, so that wasn't our issue. My earlier message assumed that AWS networking was set up 'properly', with NATs, internet access available in the private VPC subnets, etc., though I didn't state all that explicitly.
I believe the first problem is that the instructions assume you have created a publicly accessible Kubernetes cluster or are using EC2-Classic without a VPC (or perhaps both). In either case, that assumption should be documented explicitly. This CloudFormation template won't work as-is for anyone with a cluster on a private network inside a VPC.
This was my first exposure to Lambda, which made troubleshooting more challenging. I believe the changes I needed to make were, in order:
The CloudFormation template could be updated to handle items 2 and 3 without too much trouble (ask for the VPC, subnet(s), and security group as CloudFormation parameters).
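A minimal sketch of what those parameters might look like, generated from Python as a JSON CloudFormation fragment. The parameter types are standard CloudFormation AWS-specific types; the logical ID `KubeLambda` and the parameter names are hypothetical, and the function's other required properties (Handler, Role, Code, Runtime) are deliberately left out because this is meant to be merged into the existing template.

```python
import json


def vpc_parameters_fragment(function_logical_id="KubeLambda"):
    """Parameters plus the VpcConfig piece of an AWS::Lambda::Function.

    A fragment to merge into an existing template, not a complete template.
    """
    return {
        "Parameters": {
            "VpcId": {
                "Type": "AWS::EC2::VPC::Id",
                # Not used by VpcConfig itself; useful if the template also
                # creates the function's security group.
                "Description": "VPC of the Kubernetes cluster",
            },
            "SubnetIds": {
                "Type": "List<AWS::EC2::Subnet::Id>",
                "Description": "Subnets for the function; they need a route to the API server",
            },
            "SecurityGroupIds": {
                "Type": "List<AWS::EC2::SecurityGroup::Id>",
                "Description": "Security groups for the function; the masters must accept 443 from them",
            },
        },
        "Resources": {
            function_logical_id: {
                "Type": "AWS::Lambda::Function",
                "Properties": {
                    "VpcConfig": {
                        # Ref on a List<...> parameter yields the list itself.
                        "SubnetIds": {"Ref": "SubnetIds"},
                        "SecurityGroupIds": {"Ref": "SecurityGroupIds"},
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(vpc_parameters_fragment(), indent=2))
```

The execution role would still need the AWSLambdaVPCAccessExecutionRole managed policy attached.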
Thanks, Dusty
@minghsieh-prenetics Do you have a reference implementation for your Lambda deployment into Kubernetes?
@StevenACoffman Yes, I do. But it's not much different from this:
https://github.com/aws-samples/aws-kube-codesuite/blob/master/src/kube-lambda.py
Actually, the essential part of this repo is just the kube-lambda.py file; don't let the other dependent files confuse you.
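For anyone reimplementing that core step, here is a minimal sketch of updating a deployment's image with the official `kubernetes` Python package. It uses the `apps/v1` API, because the `extensions/v1beta1` deployments path seen in the CloudWatch log later in this thread was removed in Kubernetes 1.16. This is my own read-modify-replace version, not the code in kube-lambda.py; the function names, container name, and kubeconfig path are assumptions.

```python
def set_container_image(containers, container_name, image):
    """Set .image on the container called container_name; return True if found."""
    for container in containers:
        if container.name == container_name:
            container.image = image
            return True
    return False


def update_deployment_image(name, namespace, container_name, image, kubeconfig_path):
    # Third-party "kubernetes" package; imported here so the helper above
    # works without it.
    from kubernetes import client, config

    config.load_kube_config(config_file=kubeconfig_path)
    apps = client.AppsV1Api()
    deployment = apps.read_namespaced_deployment(name=name, namespace=namespace)
    if not set_container_image(deployment.spec.template.spec.containers, container_name, image):
        raise ValueError(f"container {container_name!r} not found in {namespace}/{name}")
    # The object carries the resourceVersion we read, so a concurrent change
    # makes this call fail with a conflict instead of being overwritten.
    return apps.replace_namespaced_deployment(name=name, namespace=namespace, body=deployment)


if __name__ == "__main__":
    # Placeholders modeled on the log in this thread: <ECR repo URI>:<tag>.
    image = "XXXXXXXXXXX.dkr.ecr.us-west-2.amazonaws.com/k8s-c-repos-1bdxoih448581:d8d49eb0"
    update_deployment_image("codesuite-demo", "default", "codesuite-demo", image, "/tmp/kubeconfig")
```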
Ah, thanks @minghsieh-prenetics! I see that this Lambda is also necessary: https://github.com/aws-samples/aws-kube-codesuite/blob/master/templates/ssm-inject.yaml
However, without the rest of the CloudFormation machinery, I'm not clear on how to get the EKS client certificate and client key. The other two values for Parameter Store are trivial to get:
ENDPOINT=$(aws eks describe-cluster --region us-east-1 --name "$CLUSTERNAME" --query cluster.endpoint --output text)
CA=$(aws eks describe-cluster --region us-east-1 --name "$CLUSTERNAME" --query cluster.certificateAuthority.data --output text)
I launched my EKS cluster via the web console, so I am not sure how to get the other two pieces. Any help would be greatly appreciated!
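The same two values can also be fetched and stored with boto3, which sidesteps the JSON quoting you get from `--query` without `--output text`. A minimal sketch: the cluster name and the `/kube-codesuite` parameter prefix are hypothetical, so match them to whatever names your Lambda actually reads.

```python
def extract_cluster_params(describe_response):
    """Pull the endpoint and CA bundle out of an EKS DescribeCluster response."""
    cluster = describe_response["cluster"]
    return {
        "endpoint": cluster["endpoint"],
        "ca": cluster["certificateAuthority"]["data"],
    }


def store_params(ssm, params, prefix):
    """Write each value under <prefix>/<key> in SSM Parameter Store."""
    for key, value in params.items():
        ssm.put_parameter(
            Name=f"{prefix}/{key}", Value=value, Type="SecureString", Overwrite=True
        )


if __name__ == "__main__":
    import boto3

    # Hypothetical cluster name and parameter prefix.
    eks = boto3.client("eks", region_name="us-east-1")
    ssm = boto3.client("ssm", region_name="us-east-1")
    response = eks.describe_cluster(name="my-cluster")
    store_params(ssm, extract_cluster_params(response), "/kube-codesuite")
```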
@StevenACoffman
Ah, so: manually use an EKS-authenticated kubectl to create a Kubernetes service account, retrieve its credentials, and save them to Parameter Store like this example. It's not one-click or fully automated, but since it's only done once, that should be OK. Thanks!
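One detail worth knowing here: a service account token Secret holds `ca.crt`, `namespace`, and `token` (base64-encoded under `.data`), not a client certificate and key. The kubeconfig user entry therefore uses `token:` instead of `client-certificate-data`/`client-key-data`. On newer clusters (1.24+), such a Secret is no longer created automatically, so you may have to create it explicitly. A minimal sketch that maps the Secret to kubeconfig fields; the function name is hypothetical, and the output contains a live credential, so treat it as sensitive.

```python
import base64
import json
import sys


def kubeconfig_creds_from_secret(secret_data):
    """Map a service account token Secret's .data to kubeconfig fields.

    ca.crt is already base64 in .data, which is what
    certificate-authority-data expects; the token must be decoded.
    """
    return {
        "certificate-authority-data": secret_data["ca.crt"],
        "token": base64.b64decode(secret_data["token"]).decode("utf-8"),
    }


if __name__ == "__main__":
    # Usage: kubectl get secret <secret-name> -o json | python this_script.py
    secret = json.load(sys.stdin)
    print(json.dumps(kubeconfig_creds_from_secret(secret["data"]), indent=2))
```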
@StevenACoffman You saved my day. Thanks
Great! Check these out for more details on the two viable approaches:
When Lambda tries to deploy the changes, it fails. Here's the CloudWatch Logs dump:
START RequestId: f5ff58dd-fc68-11e7-8aaf-910e87942b5f Version: $LATEST
XXXXXXXXXXX.dkr.ecr.us-west-2.amazonaws.com/k8s-c-repos-1bdxoih448581 d8d49eb0 codesuite-demo
2018-01-18 16:02:22,662 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f7567c3a7f0>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /apis/extensions/v1beta1/namespaces/default/deployments/codesuite-demo
[WARNING] 2018-01-18T16:02:22.662Z f5ff58dd-fc68-11e7-8aaf-910e87942b5f Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f7567c3a7f0>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /apis/extensions/v1beta1/namespaces/default/deployments/codesuite-demo
2018-01-18 16:02:22,663 WARNING Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f7567c3afd0>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /apis/extensions/v1beta1/namespaces/default/deployments/codesuite-demo
[WARNING] 2018-01-18T16:02:22.663Z f5ff58dd-fc68-11e7-8aaf-910e87942b5f Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f7567c3afd0>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /apis/extensions/v1beta1/namespaces/default/deployments/codesuite-demo
2018-01-18 16:02:22,665 WARNING Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f7567c3a7b8>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /apis/extensions/v1beta1/namespaces/default/deployments/codesuite-demo
[WARNING] 2018-01-18T16:02:22.665Z f5ff58dd-fc68-11e7-8aaf-910e87942b5f Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f7567c3a7b8>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /apis/extensions/v1beta1/namespaces/default/deployments/codesuite-demo
HTTPSConnectionPool(host='XXXXXXXXXXXXXXXXXX.us-west-2.elb.amazonaws.com', port=443): Max retries exceeded with url: /apis/extensions/v1beta1/namespaces/default/deployments/codesuite-demo (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f7567c3a518>: Failed to establish a new connection: [Errno -2] Name or service not known',))
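`[Errno -2] Name or service not known` is a DNS failure (EAI_NONAME on Linux). The hostname never resolved, so no TCP connection was attempted; a security group problem would typically show up as a connection timeout instead. One way to narrow this down is to run a resolution check from the same VPC subnets as the function, using the (elided) ELB hostname from the error. A minimal sketch; `resolve_addresses` is a hypothetical helper name.

```python
import socket
import sys


def resolve_addresses(hostname, port=443):
    """Return the sorted set of addresses hostname resolves to (TCP only)."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    host = sys.argv[1]  # e.g. the ELB hostname from the error above
    try:
        print(host, "->", ", ".join(resolve_addresses(host)))
    except socket.gaierror as exc:
        # errno -2 (EAI_NONAME) is the "Name or service not known" in the log.
        print(host, "does not resolve:", exc)
        sys.exit(1)
```

If the name resolves from your workstation but not from inside the VPC, look at the VPC's DNS settings; if it resolves nowhere, the endpoint in the kubeconfig is likely wrong.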
Here's some information about my k8s cluster: