Closed: alex-dabija closed this issue 1 year ago.
I created a new AWSClusterRoleIdentity named default2 in grizzly, which points to account 180547736195 (at the moment known as the gauss workload cluster account). The roles are created as well, so testing should be possible.
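Before testing against the new identity, one way to confirm that the role behind it can be assumed, and that the resulting session lives in account 180547736195, is to call STS AssumeRole and read the account ID back from the assumed-role ARN. This is a minimal Python sketch, assuming a boto3-style STS client is passed in; it is not part of CAPA or of any Giant Swarm tooling, and the role name used in the test ARN is hypothetical because the issue does not state it.

```python
EXPECTED_ACCOUNT = "180547736195"  # gauss workload cluster account, from the issue


def account_from_role_arn(role_arn: str) -> str:
    """Return the account ID from an IAM role ARN (arn:aws:iam::<account>:role/<name>)."""
    parts = role_arn.split(":")
    if len(parts) < 6 or parts[2] != "iam" or not parts[5].startswith("role/"):
        raise ValueError(f"not an IAM role ARN: {role_arn}")
    return parts[4]


def assumed_account(sts_client, role_arn: str,
                    session_name: str = "capa-identity-check") -> str:
    """Assume role_arn and return the account ID of the new session.

    sts_client is assumed to behave like boto3.client("sts").
    """
    resp = sts_client.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    # AssumedRoleUser.Arn looks like
    # arn:aws:sts::<account>:assumed-role/<role-name>/<session-name>
    return resp["AssumedRoleUser"]["Arn"].split(":")[4]
```

With real credentials this would be called as `assumed_account(boto3.client("sts"), role_arn)` and compared against `EXPECTED_ACCOUNT`; a mismatch, or an AccessDenied error, points at the role or its trust policy rather than at the AWSClusterRoleIdentity itself.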
I created a new AWSClusterRoleIdentity named default2 in golem, which points to account 180547736195 (at the moment known as the gauss workload cluster account).
I've created a cross-account public cluster named bear. These are the things that I've checked on AWS (using the specified gauss account) and on the cluster itself.
Checklist on AWS
Checklist on Cluster
I tried to create a private cluster on golem, but the aws-network-topology-operator reports that AWS can't find an existing subnet when attaching the transit gateway:
1.6710229132236247e+09 INFO Reconciling {"controller": "cluster", "controllerGroup": "cluster.x-k8s.io", "controllerKind": "Cluster", "cluster": {"name":"alextest36","namespace":"org-giantswarm"}, "namespace": "org-giantswarm", "name": "alextest36", "reconcileID": "6686176c-118c-48b7-9ae2-19166a0f4e64"}
1.6710229133360448e+09 INFO transitgateway-registrar Got TransitGateway {"controller": "cluster", "controllerGroup": "cluster.x-k8s.io", "controllerKind": "Cluster", "cluster": {"name":"alextest36","namespace":"org-giantswarm"}, "namespace": "org-giantswarm", "name": "alextest36", "reconcileID": "6686176c-118c-48b7-9ae2-19166a0f4e64", "transitGatewayID": "tgw-034e681b2d0288423"}
1.6710229134554205e+09 ERROR transitgateway-registrar Failed to create transit gateway attachments {"controller": "cluster", "controllerGroup": "cluster.x-k8s.io", "controllerKind": "Cluster", "cluster": {"name":"alextest36","namespace":"org-giantswarm"}, "namespace": "org-giantswarm", "name": "alextest36", "reconcileID": "6686176c-118c-48b7-9ae2-19166a0f4e64", "transitGatewayID": "tgw-034e681b2d0288423", "vpcID": "vpc-0e9d440dc6831956a", "error": "operation error EC2: CreateTransitGatewayVpcAttachment, https response error StatusCode: 400, RequestID: 93f6dbfb-b2de-454c-9913-7c93aabeb13f, api error InvalidSubnetID.NotFound: The subnet ID 'subnet-0545a88f6cbcbef1d' does not exist"}
1.6710229134735894e+09 INFO Done reconciling {"controller": "cluster", "controllerGroup": "cluster.x-k8s.io", "controllerKind": "Cluster", "cluster": {"name":"alextest36","namespace":"org-giantswarm"}, "namespace": "org-giantswarm", "name": "alextest36", "reconcileID": "6686176c-118c-48b7-9ae2-19166a0f4e64"}
1.6710229134736302e+09 ERROR Reconciler error {"controller": "cluster", "controllerGroup": "cluster.x-k8s.io", "controllerKind": "Cluster", "cluster": {"name":"alextest36","namespace":"org-giantswarm"}, "namespace": "org-giantswarm", "name": "alextest36", "reconcileID": "6686176c-118c-48b7-9ae2-19166a0f4e64", "error": "operation error EC2: CreateTransitGatewayVpcAttachment, https response error StatusCode: 400, RequestID: 93f6dbfb-b2de-454c-9913-7c93aabeb13f, api error InvalidSubnetID.NotFound: The subnet ID 'subnet-0545a88f6cbcbef1d' does not exist"}
The cluster was created with the following config:
---
apiVersion: v1
data:
  values: |
    aws:
      region: eu-west-2
      awsClusterRole: default2
    bastion:
      enabled: false
    proxy:
      enabled: true
      http_proxy: "http://internal-a1c90e5331e124481a14fb7ad80ae8eb-1778512673.eu-west-2.elb.amazonaws.com:4000"
      https_proxy: "http://internal-a1c90e5331e124481a14fb7ad80ae8eb-1778512673.eu-west-2.elb.amazonaws.com:4000"
      no_proxy: "test-domain.com"
    clusterName: alextest36
    controlPlane:
      replicas: 1
    machinePools:
    - instanceType: m5.xlarge
      maxSize: 10
      minSize: 3
      name: machine-pool0
      rootVolumeSizeGB: 300
      availabilityZones:
      - eu-west-2a
      - eu-west-2b
      - eu-west-2c
    network:
      vpcCIDR: 10.20.0.0/16
      topologyMode: GiantSwarmManaged
      availabilityZoneUsageLimit: 3
      vpcMode: private
      apiMode: private
      dnsMode: private
      subnets:
      - cidrBlock: 10.20.0.0/18
      - cidrBlock: 10.20.64.0/18
      - cidrBlock: 10.20.128.0/18
    organization: giantswarm
kind: ConfigMap
metadata:
  creationTimestamp: null
  labels:
    giantswarm.io/cluster: alextest36
  name: alextest36-userconfig
  namespace: org-giantswarm
---
apiVersion: application.giantswarm.io/v1alpha1
kind: App
metadata:
  labels:
    app-operator.giantswarm.io/version: 0.0.0
  name: alextest36
  namespace: org-giantswarm
spec:
  catalog: cluster
  config:
    configMap:
      name: ""
      namespace: ""
    secret:
      name: ""
      namespace: ""
  kubeConfig:
    context:
      name: ""
    inCluster: true
    secret:
      name: ""
      namespace: ""
  name: cluster-aws
  namespace: org-giantswarm
  userConfig:
    configMap:
      name: alextest36-userconfig
      namespace: org-giantswarm
  version: 0.20.2
---
apiVersion: v1
data:
  values: |
    clusterName: alextest36
    organization: giantswarm
kind: ConfigMap
metadata:
  creationTimestamp: null
  labels:
    giantswarm.io/cluster: alextest36
  name: alextest36-default-apps-userconfig
  namespace: org-giantswarm
---
apiVersion: application.giantswarm.io/v1alpha1
kind: App
metadata:
  labels:
    app-operator.giantswarm.io/version: 0.0.0
    giantswarm.io/cluster: alextest36
    giantswarm.io/managed-by: cluster
  name: alextest36-default-apps
  namespace: org-giantswarm
spec:
  catalog: cluster
  config:
    configMap:
      name: alextest36-cluster-values
      namespace: org-giantswarm
    secret:
      name: ""
      namespace: ""
  kubeConfig:
    context:
      name: ""
    inCluster: true
    secret:
      name: ""
      namespace: ""
  name: default-apps-aws
  namespace: org-giantswarm
  userConfig:
    configMap:
      name: alextest36-default-apps-userconfig
      namespace: org-giantswarm
  version: 0.12.3
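The user config splits vpcCIDR 10.20.0.0/16 into three /18 subnets, presumably one per availability zone. A quick check with Python's standard ipaddress module confirms that they are the first three /18 blocks of the VPC (leaving 10.20.192.0/18 unused), that they do not overlap, and that the control-plane node named ip-10-20-101-216 in the pod listing falls into the second one. This is only an illustrative arithmetic check, not something the cluster-aws chart runs.

```python
import ipaddress

VPC_CIDR = ipaddress.ip_network("10.20.0.0/16")
CONFIGURED_SUBNETS = [
    ipaddress.ip_network(c)
    for c in ("10.20.0.0/18", "10.20.64.0/18", "10.20.128.0/18")
]


def check_subnets(vpc, subnets):
    """True if every subnet lies inside the VPC and no two subnets overlap."""
    inside = all(s.subnet_of(vpc) for s in subnets)
    disjoint = all(
        not a.overlaps(b)
        for i, a in enumerate(subnets)
        for b in subnets[i + 1:]
    )
    return inside and disjoint


def subnet_for(ip, subnets):
    """Return the configured subnet containing ip, or None if there is none."""
    addr = ipaddress.ip_address(ip)
    return next((s for s in subnets if addr in s), None)
```

Each /18 holds 2^(32-18) = 16384 addresses, so three of them cover 49152 of the 65536 addresses in the /16.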
The subnet subnet-0545a88f6cbcbef1d does exist.
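The InvalidSubnetID.NotFound error is consistent with the call being made with credentials for an account other than the one that owns the subnet: EC2 returns the same code whether a subnet ID does not exist at all or is simply not visible to the caller's account. One way to confirm this is to look the ID up with an EC2 client for each account. The Python sketch below assumes boto3-style clients and inspects the `response` attribute that botocore's ClientError carries, so it does not import botocore itself.

```python
def subnet_visible(ec2_client, subnet_id):
    """Return True if describe_subnets finds subnet_id, False on
    InvalidSubnetID.NotFound; any other error is re-raised.

    ec2_client is assumed to behave like boto3.client("ec2").
    """
    try:
        resp = ec2_client.describe_subnets(SubnetIds=[subnet_id])
    except Exception as err:  # botocore.exceptions.ClientError in practice
        code = getattr(err, "response", {}).get("Error", {}).get("Code")
        if code == "InvalidSubnetID.NotFound":
            return False
        raise
    return any(s["SubnetId"] == subnet_id for s in resp.get("Subnets", []))
```

Calling it once with a management-cluster client and once with a workload-account client should show which account the subnet actually belongs to.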
All the operations in the operator are done from the perspective of the management cluster. There are only two places where the GetAWSRoleIdentity function is called:
The transit gateway attachment needs to be executed from the workload cluster's perspective.
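Creating the attachment "from the workload cluster's perspective" means calling CreateTransitGatewayVpcAttachment with credentials obtained by assuming the role in the workload account. The following is a minimal Python sketch of that flow using boto3-style calls (`assume_role`, `create_transit_gateway_vpc_attachment`); it is not the operator's code, `make_client` stands in for `boto3.client`, and the role ARN in the test is hypothetical. The IDs are the ones from the log above.

```python
def workload_ec2_client(sts_client, make_client, role_arn, region,
                        session_name="tgw-attachment"):
    """Assume role_arn and build an EC2 client from the temporary credentials.

    make_client is assumed to accept the same arguments as boto3.client.
    """
    creds = sts_client.assume_role(
        RoleArn=role_arn, RoleSessionName=session_name
    )["Credentials"]
    return make_client(
        "ec2",
        region_name=region,
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )


def attach_vpc(ec2_client, tgw_id, vpc_id, subnet_ids):
    """Attach the workload VPC to the transit gateway; return the attachment ID."""
    resp = ec2_client.create_transit_gateway_vpc_attachment(
        TransitGatewayId=tgw_id, VpcId=vpc_id, SubnetIds=list(subnet_ids)
    )
    return resp["TransitGatewayVpcAttachment"]["TransitGatewayAttachmentId"]
```

This only succeeds if the transit gateway, which lives in another account, is visible to the workload account, i.e. it has been shared with it first.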
The transit gateway and the prefix list need to be shared with the workload cluster account before the cluster is created. I manually attached the transit gateway and updated the route tables, and the cluster was able to start:
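Cross-account sharing of a transit gateway and a managed prefix list is done with AWS Resource Access Manager (RAM), and the manual route-table step corresponds to a CreateRoute call that targets the transit gateway. The Python sketch below shows both with boto3-style calls (`create_resource_share`, `create_route`). It assumes the transit gateway and prefix list are owned by the management cluster's account; that account ID, the share name, the prefix list ID and the route table ID in the test are all hypothetical.

```python
def ec2_arn(region, account_id, resource_type, resource_id):
    """Build an EC2 resource ARN, e.g. arn:aws:ec2:<region>:<account>:transit-gateway/<id>."""
    return f"arn:aws:ec2:{region}:{account_id}:{resource_type}/{resource_id}"


def share_with_workload_account(ram_client, share_name, region, owner_account_id,
                                tgw_id, prefix_list_id, wc_account_id):
    """Run with the owning account's credentials; return the resource share ARN."""
    resp = ram_client.create_resource_share(
        name=share_name,
        resourceArns=[
            ec2_arn(region, owner_account_id, "transit-gateway", tgw_id),
            ec2_arn(region, owner_account_id, "prefix-list", prefix_list_id),
        ],
        principals=[wc_account_id],
        # Needed if the accounts are not in the same AWS Organization;
        # the workload account then has to accept the share invitation.
        allowExternalPrincipals=True,
    )
    return resp["resourceShare"]["resourceShareArn"]


def route_via_tgw(ec2_client, route_table_id, prefix_list_id, tgw_id):
    """Run with workload-account credentials: route the prefix list via the TGW."""
    ec2_client.create_route(
        RouteTableId=route_table_id,
        DestinationPrefixListId=prefix_list_id,
        TransitGatewayId=tgw_id,
    )
```

Depending on the transit gateway's auto-accept setting, its owner may also have to accept the new VPC attachment (`accept_transit_gateway_vpc_attachment`).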
#: kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
giantswarm chart-operator-79d7b7567b-4rxd2 1/1 Running 0 8m38s
kube-system aws-pod-identity-webhook-app-66fd9f9b55-6zwk8 1/1 Running 0 9m45s
kube-system aws-pod-identity-webhook-app-66fd9f9b55-jf4wv 1/1 Running 0 9m45s
kube-system aws-pod-identity-webhook-restarter-27850445-hvkd5 0/1 Completed 0 99s
kube-system capi-node-labeler-2sgf9 1/1 Running 0 16m
kube-system capi-node-labeler-452jv 1/1 Running 0 16m
kube-system capi-node-labeler-82f2n 1/1 Running 0 16m
kube-system capi-node-labeler-fcnvb 1/1 Running 0 16m
kube-system cert-exporter-daemonset-7cwnr 1/1 Running 0 12m
kube-system cert-exporter-daemonset-9mvt9 1/1 Running 0 12m
kube-system cert-exporter-daemonset-tgpq2 1/1 Running 0 12m
kube-system cert-exporter-daemonset-whnnb 1/1 Running 0 12m
kube-system cert-exporter-deployment-85c658c656-kvppj 1/1 Running 0 12m
kube-system cert-manager-cainjector-6dc9c79bfd-swh79 1/1 Running 0 10m
kube-system cert-manager-controller-7b7c4c77c4-5hqcp 1/1 Running 0 10m
kube-system cert-manager-webhook-6bf4c564bb-m62b4 1/1 Running 0 10m
kube-system cert-manager-webhook-6bf4c564bb-p6lwr 1/1 Running 0 10m
kube-system cilium-ctwdg 1/1 Running 0 14m
kube-system cilium-jfqsb 1/1 Running 0 14m
kube-system cilium-kl7l4 1/1 Running 0 14m
kube-system cilium-operator-58bcdb44cb-vg2mf 1/1 Running 0 14m
kube-system cilium-operator-58bcdb44cb-wgtlq 1/1 Running 0 14m
kube-system cilium-wkkfr 1/1 Running 0 14m
kube-system coredns-controlplane-564bffc48d-w4hwz 1/1 Running 0 12m
kube-system coredns-workers-8666c764cd-fvfvb 1/1 Running 0 12m
kube-system coredns-workers-8666c764cd-p8r6k 1/1 Running 0 12m
kube-system ebs-csi-controller-67c4cf5496-q5txj 5/5 Running 0 11m
kube-system ebs-csi-node-9l6vr 3/3 Running 0 11m
kube-system ebs-csi-node-hc5sz 3/3 Running 0 11m
kube-system ebs-csi-node-qpjtl 3/3 Running 0 11m
kube-system etcd-ip-10-20-101-216.eu-west-2.compute.internal 1/1 Running 0 17m
kube-system external-dns-76b8fb8586-4dj2m 2/2 Running 2 (7m19s ago) 12m
kube-system hubble-relay-79bfdd4c6c-brbgx 1/1 Running 0 14m
kube-system kiam-agent-f6tsl 1/1 Running 2 (7m33s ago) 7m36s
kube-system kiam-agent-tkm59 1/1 Running 2 (7m33s ago) 7m36s
kube-system kiam-agent-zbzkd 1/1 Running 2 (7m33s ago) 7m36s
kube-system kiam-namespace-annotation-kube-system-5l9b6 0/1 Completed 0 9m53s
kube-system kiam-server-fpvsp 1/1 Running 0 7m37s
kube-system kube-apiserver-ip-10-20-101-216.eu-west-2.compute.internal 1/1 Running 2 (17m ago) 17m
kube-system kube-controller-manager-ip-10-20-101-216.eu-west-2.compute.internal 1/1 Running 1 (17m ago) 17m
kube-system kube-scheduler-ip-10-20-101-216.eu-west-2.compute.internal 1/1 Running 1 (17m ago) 17m
kube-system kube-state-metrics-6b89676dbf-bqsrq 1/1 Running 0 10m
kube-system metrics-server-7cdb8d8cd8-mjfbs 1/1 Running 0 12m
kube-system metrics-server-7cdb8d8cd8-xcfm2 1/1 Running 0 12m
kube-system net-exporter-2vcfh 1/1 Running 0 12m
kube-system net-exporter-6h9nm 1/1 Running 0 12m
kube-system net-exporter-cj4wn 1/1 Running 0 12m
kube-system net-exporter-j2sq5 1/1 Running 0 12m
kube-system node-exporter-v1-3-1-4n7vd 1/1 Running 0 14m
kube-system node-exporter-v1-3-1-lh9p9 1/1 Running 0 14m
kube-system node-exporter-v1-3-1-tbg6s 1/1 Running 0 14m
kube-system node-exporter-v1-3-1-wkvgh 1/1 Running 0 14m
kube-system prometheus-operator-app-operator-f6b45b859-qfqvz 1/1 Running 0 11m
kube-system prometheus-prometheus-agent-0 2/2 Running 0 11m
kube-system vertical-pod-autoscaler-admission-controller-556558c8d9-ffc5d 1/1 Running 0 11m
kube-system vertical-pod-autoscaler-admission-controller-556558c8d9-z6spl 1/1 Running 0 11m
kube-system vertical-pod-autoscaler-recommender-67b5d54d5f-lgvb6 1/1 Running 0 11m
kube-system vertical-pod-autoscaler-updater-7bdddf5b-jgk8t 1/1 Running 0 11m
Blocked until https://github.com/giantswarm/roadmap/issues/1801 is fixed.
Instructions on how to create the role are here: https://github.com/giantswarm/giantswarm-aws-account-prerequisites/tree/master/capa-controller-role (use INSTALLATION_NAME goat).
We should create the role in AWS account 180547736195 and then just create another AWSClusterRoleIdentity, using the default CR in goat as a reference.
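"Using the default CR as a reference" amounts to copying the existing AWSClusterRoleIdentity, giving it a new name, pointing spec.roleARN at the role in account 180547736195 and dropping server-populated metadata. The Python sketch below does this on a dict and prints JSON, which `kubectl apply -f` accepts. The reference object is a hypothetical stand-in for the real default CR in goat: its API version, role name and the name default2 are assumptions, while allowedNamespaces and sourceIdentityRef are real CAPA fields (an empty allowedNamespaces object permits all namespaces).

```python
import copy
import json

# Hypothetical stand-in for `kubectl get awsclusterroleidentity default -o json`.
DEFAULT_REFERENCE = {
    "apiVersion": "infrastructure.cluster.x-k8s.io/v1beta1",
    "kind": "AWSClusterRoleIdentity",
    "metadata": {"name": "default", "uid": "example-uid", "resourceVersion": "1"},
    "spec": {
        "allowedNamespaces": {},
        "roleARN": "arn:aws:iam::111111111111:role/example-capa-controller-role",
        "sourceIdentityRef": {"kind": "AWSClusterControllerIdentity", "name": "default"},
    },
}


def derive_role_identity(reference, name, role_arn):
    """Copy a reference AWSClusterRoleIdentity, rename it and point it at role_arn.

    Only name and labels are kept from metadata so the result applies as a new object.
    """
    new = copy.deepcopy(reference)
    labels = reference.get("metadata", {}).get("labels")
    new["metadata"] = {"name": name}
    if labels:
        new["metadata"]["labels"] = dict(labels)
    new["spec"]["roleARN"] = role_arn
    new.pop("status", None)
    return new


if __name__ == "__main__":
    manifest = derive_role_identity(
        DEFAULT_REFERENCE,
        "default2",  # hypothetical name
        "arn:aws:iam::180547736195:role/example-capa-controller-role",  # hypothetical role
    )
    print(json.dumps(manifest, indent=2))
```

Copying rather than hand-writing keeps the sourceIdentityRef chain identical to the working default identity, so only the target role differs.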
Blocked again on https://github.com/giantswarm/roadmap/issues/1801, as the transit gateway also needs to be shared with the account automatically.
Can this one be moved out of the Blocked column into the sprint backlog, now that https://github.com/giantswarm/roadmap/issues/1801 is closed?
thunder / dev00 was recreated in a separate account.
Task
Test CAPA cluster creation in a different account.
TODOs