GoogleCloudPlatform / pubsec-declarative-toolkit

The GCP PubSec Declarative Toolkit is a collection of declarative solutions to help you on your journey to Google Cloud. Solutions are designed using Config Connector and deployed using Config Controller.
Apache License 2.0

New GKE Autopilot KCC cluster has new admission webhook warnings as of 20230928 (update: new cluster OK) #539

Open fmichaelobrien opened 1 year ago

fmichaelobrien commented 1 year ago

Triage

https://cloud.google.com/kubernetes-engine/docs/troubleshooting/troubleshoot-upgrades

details

No issues were seen with the cluster in #445. New issues since 20230927 are tracked in #534.

KCC cluster brought up via https://github.com/ssc-spc-ccoe-cei/gcp-tools/blob/main/scripts/bootstrap/setup-kcc.sh

Screenshot 2023-09-28 at 1 04 51 PM

TODO: check the webhook visibility fix. This is new since my last cluster was brought up 6 weeks ago in #445.

This cluster has an admission webhook installed that is intercepting system critical requests in the last 24 hours. Intercepting these requests can impact availability of the GKE Control Plane. Learn more:

https://cloud.google.com/kubernetes-engine/docs/how-to/optimize-webhooks#unsafe-webhooks

gatekeeper-validating-webhook-configuration Intercepting cluster-scoped system resources  
gatekeeper-validating-webhook-configuration Intercepting resources in the kube-node-lease namespace  
gatekeeper-validating-webhook-configuration Intercepting resources in the kube-system namespace

Screenshot 2023-09-28 at 1 05 22 PM Screenshot 2023-09-28 at 1 06 07 PM

https://cloud.google.com/kubernetes-engine/docs/how-to/optimize-webhooks#no-available-endpoints

Screenshot 2023-09-28 at 1 07 04 PM

Workloads are up:

Screenshot 2023-09-28 at 1 07 23 PM
fmichaelobrien commented 1 year ago

It occurred for me even on a completely clean GKE cluster today, right after setup-kcc.sh and before rendering the core-landing-zone package, and also on my 6-week-old cluster with the full set of LZ packages up. The console recommendation is: "Update webhook to no longer intercept system requests."

fmichaelobrien commented 1 year ago

Trying a Standard GKE cluster, partly to reduce spend (see https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/issues/492).

kubectl is timing out, so kpt won't deploy as expected:

michael@cloudshell:~ (kcc-oi-cluster)$ kubectl get pods --all-namespaces
E0928 19:21:26.018017    1006 memcache.go:265] couldn't get current server API group list: Get "https://172.16.0.130/api?timeout=32s": dial tcp 172.16.0.130:443: i/o timeout

michael@cloudshell:~/kcc-oi/kpt (kcc-oi-cluster)$ kpt live apply core-landing-zone --reconcile-timeout=2m --output=table
W0928 19:25:14.080087    1087 factory.go:66] Failed to query apiserver to check for flow control enablement: %vmaking /livez/ping request: context deadline exceeded
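Both errors above are network-level failures (no TCP connection to the API server at all), not kubectl or kpt bugs. One way to confirm that before retrying kpt is to probe port 443 directly. This is a minimal sketch: `probe_tcp` is a hypothetical helper, not part of setup-kcc.sh or gcp-tools, and it assumes bash (for its `/dev/tcp` redirection) and coreutils `timeout` are installed, which should be the case in Cloud Shell.

```shell
#!/usr/bin/env bash
# probe_tcp HOST [PORT] [SECONDS]
# Hypothetical helper: try a raw TCP connect to HOST:PORT (default 443)
# with a time limit (default 5s) and print one of:
#   reachable   - the TCP connection was accepted
#   timeout     - no answer before the limit (like the i/o timeout above)
#   unreachable - refused or another error (bash missing, bad host, ...)
# The exit status is that of `timeout` (0, 124 on timeout, other non-zero).
probe_tcp() {
  local host=$1 port=${2:-443} secs=${3:-5} rc
  # bash's /dev/tcp/HOST/PORT redirection opens a TCP socket on fd 3.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
  rc=$?
  case $rc in
    0) echo reachable ;;
    124) echo timeout ;;
    *) echo unreachable ;;
  esac
  return "$rc"
}
```

For example, `probe_tcp 172.16.0.130` run from Cloud Shell against a private-only endpoint would be expected to print `timeout`, consistent with the `dial tcp 172.16.0.130:443: i/o timeout` error. If it prints `reachable` and kubectl still fails, the problem is more likely credentials or kubeconfig than networking.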

As a workaround, I will use a Standard cluster and retest. I had avoided Standard because of the occasional 15-30 min timeout.

However, my older KCC cluster on kcc.landing.systems is OK, even with the webhook error:

root_@cloudshell:~ (kcc-kls-cluster3)$ kubectl get pods --all-namespaces
NAMESPACE                         NAME                                                       READY   STATUS    RESTARTS        AGE
cnrm-system                       cnrm-controller-manager-3fo6phebqgg23knqq5qq-0             1/1     Running   0               4d2h
cnrm-system                       cnrm-controller-manager-7c4rehlik7xgxc2utq6a-0             1/1     Running   0               4d1h
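To check a cluster's pods without eyeballing the table, the `kubectl get pods --all-namespaces` output can be filtered for pods that are not Running/Completed, or that are Running but not fully ready. A sketch only: `list_unhealthy_pods` is a hypothetical helper, and it assumes the default column layout shown above (NAMESPACE NAME READY STATUS RESTARTS AGE).

```shell
# list_unhealthy_pods
# Hypothetical filter for `kubectl get pods --all-namespaces` output on stdin.
# Assumes the default columns: NAMESPACE NAME READY STATUS RESTARTS AGE.
# Prints "NAMESPACE/NAME READY STATUS" for pods that are neither Running nor
# Completed, or that are Running with fewer ready containers than total.
# No output means every pod looked healthy.
list_unhealthy_pods() {
  awk 'NR > 1 {
    split($3, r, "/")                  # READY "1/1" -> r[1]=1, r[2]=1
    if (($4 != "Running" && $4 != "Completed") || ($4 == "Running" && r[1] != r[2]))
      print $1 "/" $2, $3, $4
  }'
}
```

Usage: `kubectl get pods --all-namespaces | list_unhealthy_pods`. For the two cnrm-controller-manager pods above it prints nothing.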

Something is off with my clean KCC environment on obrien.industries (I will check the diffs on setup-kcc.sh), because my older 6-week-old kcc.landing.systems cluster, even with the admission errors, lets me edit the webhook YAML with no problem:

root_@cloudshell:~ (kcc-kls-cluster3)$ kubectl edit validatingwebhookconfiguration/gatekeeper-validating-webhook-configuration
Edit cancelled, no changes made.
root_@cloudshell:~ (kcc-kls-cluster3)$
Screenshot 2023-09-28 at 3 36 00 PM

Triaging the connection: the older, working server has a public 35.x address in .kube/config:

    server: https://35.203.120.71
  name: gke_kcc-kls-cluster3_northamerica-northeast1_krmapihost-kcc-kls3

The newer server has a private address:

    server: https://172.16.0.130
  name: gke_kcc-oi-cluster_northamerica-northeast1_krmapihost-kcc-oi

Found the issue: I forgot to add -p for the public endpoint.
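The difference between the two kubeconfig entries is whether `server:` is an RFC 1918 address (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). Cloud Shell does not sit on the cluster's VPC, so a private-only endpoint times out from there. A minimal sketch for classifying the current context's endpoint, assuming IPv4 endpoints; `server_host`, `is_private_ipv4` and `endpoint_kind` are hypothetical names, not part of setup-kcc.sh.

```shell
# Hypothetical helpers for classifying a kubeconfig "server:" URL.

# server_host URL: strip the scheme, any path and any :port (IPv4/hostnames only).
server_host() {
  local h=${1#*://}
  h=${h%%/*}
  h=${h%%:*}
  printf '%s\n' "$h"
}

# is_private_ipv4 ADDR: succeed if ADDR falls in an RFC 1918 range.
# Only the leading octets are checked; ADDR is assumed to be a valid IPv4 address.
is_private_ipv4() {
  case $1 in
    10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;   # 172.16.0.0/12
    *) return 1 ;;
  esac
}

# endpoint_kind URL: print "private" (RFC 1918), "public" (any other IPv4
# literal), or "unknown" (not a dotted IPv4 literal, e.g. a DNS name).
endpoint_kind() {
  local h
  h=$(server_host "$1")
  case $h in
    ''|*[!0-9.]*) echo unknown ;;
    *) if is_private_ipv4 "$h"; then echo private; else echo public; fi ;;
  esac
}
```

Usage: `endpoint_kind "$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')"`. With the addresses above, it would print `private` for the kcc-oi context (172.16.0.130) and `public` for kcc-kls-cluster3 (35.203.120.71).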

I had run ./setup-kcc.sh -af kcc.env (without -p) for this commit:

https://github.com/ssc-spc-ccoe-cei/gcp-tools/commit/941d542e5024144b541136e19700b50cd8eaf895

fmichaelobrien commented 1 year ago

Reran setup-kcc.sh with the -p (public IP) option, using these settings:

export CLUSTER=kcc-oi2
export REGION=northamerica-northeast1
export PROJECT_ID=kcc-oi2-cluster
export LZ_FOLDER_NAME=kcc-lz-20230928b
export NETWORK=kcc-oi2-vpc
export SUBNET=kcc-oi2-sn
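Since a missed setting was the culprit here, it may be worth failing fast before the long create call. A sketch, assuming kcc.env holds exactly the exports shown above; `require_vars` is a hypothetical helper, and whether setup-kcc.sh already validates these variables is not shown in this thread.

```shell
# require_vars NAME...
# Hypothetical pre-flight check: print each listed variable that is unset or
# empty to stderr and return non-zero if any were missing.
require_vars() {
  local name val missing=0
  for name in "$@"; do
    case $name in
      ''|[0-9]*|*[!A-Za-z0-9_]*) printf 'invalid name: %s\n' "$name" >&2; missing=1; continue ;;
    esac
    eval "val=\${$name-}"      # indirect lookup; name validated above
    if [ -z "$val" ]; then
      printf 'missing: %s\n' "$name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `. ./kcc.env && require_vars CLUSTER REGION PROJECT_ID LZ_FOLDER_NAME NETWORK SUBNET && ./setup-kcc.sh -afp kcc.env`. Note this only checks that values exist; it cannot tell that a flag such as -p was left off the command line.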

michael@cloudshell:~/kcc-oi/github/gcp-tools/scripts/bootstrap (kcc-oi)$ ./setup-kcc.sh -afp kcc.env

16:44 - estimated kcc-oi2 cluster up by 17:00
##INFO - Create Config controller

Create request issued for: [kcc-oi2]
Waiting for operation [projects/kcc-oi2-cluster/locations/northamerica-northeast1/operations/operation-1695933801715-606715bd057e8-f452780e-92d1cb2e] to complete...working..

Fixed:

michael@cloudshell:~/kcc-oi/github/gcp-tools/scripts/bootstrap (kcc-oi)$ ./setup-kcc.sh -afp kcc.env
Waiting for operation [projects/kcc-oi2-cluster/locations/northamerica-northeast1/operations/operation-1695933801715-606715bd057e8-f452780e-92d1cb2e] to complete...done.
Created instance [kcc-oi2].
Fetching cluster endpoint and auth data.
kubeconfig entry generated for krmapihost-kcc-oi2.

##INFO - Config controller get credentials

Fetching cluster endpoint and auth data.
kubeconfig entry generated for krmapihost-kcc-oi2.

##WARNING - configure-kcc-access.sh script should be run once connectivity to the cluster is established using bastion host / proxy.
michael@cloudshell:~/kcc-oi/github/gcp-tools/scripts/bootstrap (kcc-oi2-cluster)$ kubectl get nodes
NAME                                                STATUS   ROLES    AGE     VERSION
gk3-krmapihost-kcc-oi2-default-pool-6fc83c0e-ss20   Ready    <none>   9m12s   v1.27.3-gke.100
gk3-krmapihost-kcc-oi2-pool-1-28f0e374-tzw8         Ready    <none>   3m43s   v1.27.3-gke.100
gk3-krmapihost-kcc-oi2-pool-1-ae2f0850-4kmt         Ready    <none>   7m32s   v1.27.3-gke.100
gk3-krmapihost-kcc-oi2-pool-1-c9c2a582-9sdc         Ready    <none>   2m47s   v1.27.3-gke.100
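The staggered AGE values suggest Autopilot adds nodes as workloads get scheduled, so a scripted readiness check may be handier than rerunning `kubectl get nodes` by hand. A sketch, assuming the default `kubectl get nodes` columns shown above; `count_ready_nodes` is a hypothetical helper.

```shell
# count_ready_nodes
# Hypothetical filter for `kubectl get nodes` output on stdin (default columns:
# NAME STATUS ROLES AGE VERSION). Prints "<ready>/<total>", counting a node as
# ready only if its STATUS is exactly "Ready" (so "Ready,SchedulingDisabled"
# and "NotReady" are not counted).
count_ready_nodes() {
  awk 'NR > 1 { total++; if ($2 == "Ready") ready++ }
       END { printf "%d/%d\n", ready, total }'
}
```

Usage: `kubectl get nodes | count_ready_nodes` prints `4/4` for the output above; it could be polled in an `until` loop while waiting on Autopilot scale-up.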

Cluster is up with no admission endpoint warnings (it has both public and private endpoints).

Screenshot 2023-09-28 at 5 03 44 PM Screenshot 2023-09-28 at 5 09 12 PM