The following need to be added to mc-bootstrap, since they cannot be moved to the default as that would affect vintage clusters.
Coming from this comment https://github.com/giantswarm/roadmap/issues/2011#issuecomment-1430034809
I would like us to "reserve" 10.223.0.0/16 in our VPN for CAPZ MCs (+WCs), which is the /16 range that comes before the CAPA goat one. That should give us 16 MCs (+WCs), each with a 10.223.x.0/20 shared range, which I hope we won't have to use, and that we will sooner move to a more scalable solution with private endpoints (or similar).
That being said, for glippy and its WCs, we would use:

* 10.223.0.0/20 as a shared range for the glippy VPN connection (this should accommodate the glippy MC and all glippy WCs)
* 10.223.0.0/24 for the glippy VNet, which would then break down into
  * 10.223.0.0/25 for the workers subnet (50% of the VNet)
  * 10.223.0.128/26 for the control plane subnet (can be even smaller)
  * 10.223.0.192/26 for other subnets that we might need (e.g. a future subnet that will host private endpoints)

Then, the next test CAPZ MC (if/when we need one) would take 10.223.16.0/20 for its VPN shared range, broken down in a similar way to glippy above.
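As a quick sanity check of the breakdown above, the three subnets fit inside the 10.223.0.0/24 VNet and do not overlap; a throwaway sketch using the standard Python `ipaddress` module (nothing from mc-bootstrap):

```sh
python3 - <<'EOF'
import ipaddress

# glippy VNet and the three planned subnets from the comment above
vnet = ipaddress.ip_network("10.223.0.0/24")
subnets = [ipaddress.ip_network(c) for c in
           ("10.223.0.0/25", "10.223.0.128/26", "10.223.0.192/26")]

# every subnet must be contained in the VNet ...
assert all(s.subnet_of(vnet) for s in subnets)
# ... and no two subnets may overlap
assert not any(a.overlaps(b)
               for i, a in enumerate(subnets) for b in subnets[i + 1:])
print("subnets fit inside the VNet and do not overlap")
EOF
```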
Ok, once the PR is merged we will create a new release for cluster-azure and, on glippy recreation, set:
We currently don't support creating extra subnets in the network spec https://github.com/giantswarm/cluster-azure/blob/main/helm/cluster-azure/templates/_azure_cluster.tpl#L16
so for now we'll stick to this and get the bastion on the control plane subnet
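Once glippy is recreated, the subnet layout that cluster-azure actually rendered can be double-checked on the CAPZ `AzureCluster` object; a rough sketch, assuming the object is named `glippy`, sits in `org-giantswarm`, and uses the usual CAPZ v1beta1 field paths:

```sh
kubectl --kubeconfig=./kubeconfigs/kind-mc-initial-glippy.kubeconfig \
  get azurecluster glippy -n org-giantswarm \
  -o jsonpath='{range .spec.networkSpec.subnets[*]}{.name}{"\t"}{.cidrBlocks}{"\n"}{end}'
```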
Slightly updated comment above, added missing calculation (I hit ctrl-enter accidentally :grimacing:)
Waiting for the `Machine` of the bastion host fails:

```
kubectl --kubeconfig=./kubeconfigs/kind-mc-initial-glippy.kubeconfig wait --timeout=20m '--for=jsonpath={.status.phase}=Running' -n org-giantswarm machine.cluster.x-k8s.io/glippy-bastion-86cb948f85-h9lgk
```

The Machine phase stays at `Provisioned`:

```
➜ kubectl --kubeconfig=./kubeconfigs/kind-mc-initial-glippy.kubeconfig get -n org-giantswarm machine.cluster.x-k8s.io/glippy-bastion-86cb948f85-h9lgk -o yaml | yq .status.phase
Provisioned
```
While for the MD machines it is indeed `Running`.
* Similar to ^ the `machineDeployment` for the bastion reports `ScalingUp`:
  ```
  ➜ kubectl --kubeconfig=./kubeconfigs/kind-mc-initial-glippy.kubeconfig get -n org-giantswarm machinedeployment.cluster.x-k8s.io/glippy-bastion -o yaml | yq .status.phase
  ScalingUp
  ```
* this ^ also blocks the Pivot because the bastion is `Provisioned` and not `Running` :(
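For a compact view of all of the above, plain kubectl is enough; the `NODE` column also shows the missing nodeRef, which is presumably why the bastion Machine never leaves `Provisioned` (the bastion never joins the cluster as a node):

```sh
kubectl --kubeconfig=./kubeconfigs/kind-mc-initial-glippy.kubeconfig \
  get machines -n org-giantswarm \
  -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,NODE:.status.nodeRef.name
```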
"Machine" CRs are not ready waiting for 20s
The `clusterctl move` fails:

```
Error: failed to get object graph: failed to check for provisioned infrastructure: cannot start the move operation while "/, Kind=" org-giantswarm/glippy-bastion-86cb948f85-h9lgk is still provisioning the node
```
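For reference, `clusterctl move` refuses to start while any Machine is still provisioning, so once the bastion Machine reports `Running` the pivot can simply be retried, and a dry run is a cheap re-check first; a sketch, where the target kubeconfig path is an assumption:

```sh
clusterctl move --kubeconfig=./kubeconfigs/kind-mc-initial-glippy.kubeconfig \
  -n org-giantswarm \
  --to-kubeconfig=./kubeconfigs/glippy.kubeconfig \
  --dry-run
```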
`azure-ad-pod-identity-app` - `already-exists`

The `cluster-azure` App fails to be applied through the App Operator, since there are extra values coming from the cluster-values CM that are not in the schema:
```yaml
status:
  appVersion: ""
  release:
    lastDeployed: null
    reason: |
      values don't meet the specifications of the schema(s) in the following chart(s):

      cluster-azure:
      - (root): Additional property managementCluster is not allowed
      - (root): Additional property baseDomain is not allowed
      - (root): Additional property provider is not allowed
    status: not-installed
  version: ""
```
```
➜ k get cm -n giantswarm glippy-chart-values -o yaml | egrep -i "baseDomain|managementCluster|provider"
  baseDomain: test.gigantic.io
  managementCluster: glippy
  provider: capz
  providerSpecific:
```
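The same schema rejection can be reproduced locally against the chart, which makes it easy to iterate on `values.schema.json`; a sketch, with the chart path illustrative and the three offending top-level keys taken from the ConfigMap above:

```sh
helm template glippy ./helm/cluster-azure \
  --set managementCluster=glippy \
  --set baseDomain=test.gigantic.io \
  --set provider=capz
```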
* trivy is pending `giantswarm trivy-0 0/1 Pending 0 57m`
* prometheus for glippy is crashlooping `glippy-prometheus prometheus-glippy-0 1/2 CrashLoopBackOff 6 (4m6s ago) 16m` - `OOMing`
* PMO not creating heartbeat
* PMO is failing to create the VPA (for prometheus?), causing prometheus to OOM (a quick check is sketched after this list)
* vpa patch pod failing
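A quick way to check whether PMO managed to create the prometheus VPA at all (the namespace is taken from the crashlooping pod above; the VPA object name is a guess):

```sh
kubectl get vpa -n glippy-prometheus
kubectl describe vpa prometheus-glippy -n glippy-prometheus
```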
```
$ gopass env giantswarm/opsctl/ opsctl ssh glippy glippy-md00-0f94be16-cl6bs -b glippy_auto_branch --level=debug

$ ssh -J giantswarm@$(kubectl get machine glippy-bastion-694f789f4d-vpslr -o json | jq -r '.status.addresses[] | select( .type == "ExternalIP").address') giantswarm@10.233.0.7 hostname
glippy-md00-0f94be16-cl6bs
```
@nprokopic When we create the first Private cluster I am really curious to
```
➜ clusterctl describe cluster fctest1 --echo --grouping=false
NAME READY SEVERITY REASON SINCE MESSAGE
Cluster/fctest1 True 6m26s
├─ClusterInfrastructure - AzureCluster/fctest1 True 8m4s
├─ControlPlane - KubeadmControlPlane/fctest1 True 6m26s
│ └─Machine/fctest1-c8x5h True 6m26s
│ ├─BootstrapConfig - KubeadmConfig/fctest1-nvtmf True 8m3s
│ └─MachineInfrastructure - AzureMachine/fctest1-control-plane-c17c01d5-strjq True 6m27s
└─Workers
├─MachineDeployment/fctest1-bastion True 9m40s
│ └─Machine/fctest1-bastion-64f988dff6-hwq7l True 14s
│ ├─BootstrapConfig - KubeadmConfig/fctest1-bastion-973fd873-d7vvm True 6m34s
│ └─MachineInfrastructure - AzureMachine/fctest1-bastion-836b66f0-nzp7r True 14s
└─MachineDeployment/fctest1-md00 True 5m5s
├─Machine/fctest1-md00-b886b4db4-4lf4z True 5m11s
│ ├─BootstrapConfig - KubeadmConfig/fctest1-md00-5f92899e-dp9qj True 6m34s
│ └─MachineInfrastructure - AzureMachine/fctest1-md00-22b81e6f-5xfmr True 5m11s
├─Machine/fctest1-md00-b886b4db4-77cx6 True 5m10s
│ ├─BootstrapConfig - KubeadmConfig/fctest1-md00-5f92899e-nlwnz True 6m34s
│ └─MachineInfrastructure - AzureMachine/fctest1-md00-22b81e6f-bzr8z True 5m10s
└─Machine/fctest1-md00-b886b4db4-9rrf7 True 5m11s
├─BootstrapConfig - KubeadmConfig/fctest1-md00-5f92899e-7hr78 True 6m34s
└─MachineInfrastructure - AzureMachine/fctest1-md00-22b81e6f-5bjt9 True 5m11s
```

```
NAME INSTALLED VERSION CREATED AT LAST DEPLOYED STATUS
fctest1 0.0.11 7m11s 7m11s deployed
fctest1-app-operator 6.4.2 7m7s 6m48s deployed
fctest1-azure-cloud-controller-manager 1.24.6-gs1 3m31s 2m42s deployed
fctest1-azure-cloud-node-manager 1.24.6-gs1 3m31s 3m22s deployed
fctest1-azuredisk-csi-driver 1.26.2-gs1 3m31s 3m24s deployed
fctest1-cert-exporter 2.3.1 3m31s 2m44s deployed
fctest1-chart-operator 2.33.0 7m6s 3m29s deployed
fctest1-cilium 0.6.1 3m31s 3m20s deployed
fctest1-coredns 1.13.0 3m31s 2m43s deployed
fctest1-default-apps 0.0.11 7m11s 3m31s deployed
fctest1-kube-state-metrics 1.14.2 3m31s 43s deployed
fctest1-metrics-server 2.0.0 3m31s 3m21s deployed
fctest1-net-exporter 1.13.0 3m31s 2m42s deployed
fctest1-node-exporter 1.15.0 3m31s 43s deployed
fctest1-observability-bundle 0.1.9 3m31s 3m29s deployed
fctest1-prometheus-agent 3m29s secret-merge-failed
fctest1-prometheus-operator-app 3.0.0 3m29s 40s deployed
fctest1-prometheus-operator-crd 3.0.0 3m29s 42s deployed
fctest1-vertical-pod-autoscaler 2.5.3 3m31s 2m39s deployed
fctest1-vertical-pod-autoscaler-crd 1.0.1 3m31s 2m43s deployed
```

```
➜ kubectl --kubeconfig /dev/shm/fctest1 get node
NAME STATUS ROLES AGE VERSION
fctest1-control-plane-c17c01d5-strjq Ready control-plane 2m v1.24.10
fctest1-md00-22b81e6f-5bjt9 Ready <none> 54s v1.24.10
fctest1-md00-22b81e6f-5xfmr Ready <none> 55s v1.24.10
fctest1-md00-22b81e6f-bzr8z Ready <none> 58s v1.24.10

➜ kubectl --kubeconfig /dev/shm/fctest1 get pod -A | grep -v Running | grep -v Completed
NAMESPACE NAME READY STATUS RESTARTS AGE
```
The node allocatable is not an integer (like in our case it is `3880`), which is causing the prometheus OOMing (because the VPA for prometheus fails to install), and possibly also the lack of heartbeat, which is wrong compared to what the job expects: https://gigantic.slack.com/archives/C02GDJJ68Q1/p1676555870369069
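To eyeball the allocatable values PMO is reading (this is not PMO's own code path, just plain kubectl):

```sh
kubectl get nodes \
  -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory
```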
Had to fix the VPA CRD App by hand: the `helm-release/name` annotations needed to match the new chart. We need to change the way we install vpa-crd in mc-bootstrap to do a helm install rather than the App install, to avoid these problems.
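For the record, the by-hand fix boils down to pointing the standard Helm ownership metadata on the existing VPA CRDs at the new release; a sketch, where the release name and namespace are assumptions and only the annotation/label keys are the standard Helm ones:

```sh
kubectl annotate crd verticalpodautoscalers.autoscaling.k8s.io \
  meta.helm.sh/release-name=vertical-pod-autoscaler-crd \
  meta.helm.sh/release-namespace=kube-system --overwrite
kubectl label crd verticalpodautoscalers.autoscaling.k8s.io \
  app.kubernetes.io/managed-by=Helm --overwrite
```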
All issues from the glippy recreate are now addressed.
I will close this one once the PMO is released and I have updated CAPZ-APP-COLLECTION.
Re-recreated glippy with the right 10.223 network CIDRs.
This is all done and PMO was released.
Motivation

Related to https://github.com/giantswarm/roadmap/issues/2011

Recreate the Glippy MC so we can proceed with the private WC implementation.

TODO

* `glippy` auto branches for any change that might need to go in Main before recreating glippy
* all Workload clusters on `glippy`
* `glippy` resource group from Azure console
* `glippy` auto branches, lastpass secrets
* default secrets in lastpass to match changes that were done to the `values.yaml` of cluster-azure
* `make create-config` step

Outcome

For Glippy