crossplane-contrib / provider-jet-gcp

GCP Provider for Crossplane generated using Terrajet
Apache License 2.0
12 stars 21 forks source link

GKE cluster becomes unresponsive when provider installed #39

Open unixdaddy opened 2 years ago

unixdaddy commented 2 years ago

What happened?

From slack discussion

Deployed vm instance using crossplane provider-jet-gcp:v0.2.0-preview on GCP using GKE as my crossplane management cluster. Even with GKE 1.23.1, 400+ CRDS meant the API server response was pretty slow

After creating a GKE 1.23.1 cluster and installing the jet provider the system is totally unresponsive for awhile (time for a :tea: )

kubectl crossplane install provider crossplane/provider-jet-gcp:v0.2.0-preview

after awhile the system becomes responsive, however commands to crossplane resources can be slow

time kubectl get crossplane
I0126 17:13:21.538682    4976 request.go:665] Waited for 1.193141986s due to client-side throttling, not priority and fairness, request: GET:https://XX.XX.XX.XX/apis/notebooks.gcp.jet.crossplane.io/v1alpha1?timeout=32s
real    1m25.191s
user    0m21.987s
sys     0m2.499s

@muvaf has explained why

if you use full name of the kinds, you’ll get immediate results like kubectl get dbinstance.rds.aws.jet.crossplane.io For categories, like kubectl get managed or kubectl get crossplane there isn’t much one can do because all category queries have to go through every API, which increases with that many CRDs. In short, I don’t expect category queries to get faster soon but we’ll see how discovery client can be improved in upstream and you can use full names until then :slightly_smiling_face:

The question is how can the initial unresponsiveness - the install of the 438 CRDs - be mitigated or resolved or accepted.

How can we reproduce it?

Chart Name: crossplane Chart Description: Crossplane is an open source Kubernetes add-on that enables platform teams to assemble infrastructure from multiple vendors, and expose higher level self-service APIs for application teams to consume. Chart Version: 1.6.2 Chart Application Version: 1.6.2

Kube Version: v1.23.1-gke.500

**system is responsive** 
- install kubectl crossplane plugin
- install GCP jet provider version 0.2.0 preview
 **system is unresponsive** and time for 🍵  also pods restarts a few times until things settle down

kubectl crossplane install provider crossplane/provider-jet-gcp:v0.2.0-preview

kubectl get all -n kube-system Unable to connect to the server: net/http: TLS handshake timeout Unable to connect to the server: net/http: TLS handshake timeout Unable to connect to the server: net/http: TLS handshake timeout Unable to connect to the server: net/http: TLS handshake timeout Unable to connect to the server: net/http: TLS handshake timeout Unable to connect to the server: net/http: TLS handshake timeout

time kubectl get pods -n crossplane-system NAME READY STATUS RESTARTS AGE crossplane-cfb4cb44f-8qqlf 1/1 Running 3 (6m8s ago) 19m crossplane-provider-jet-gcp-b5eb4524318a-7dd4b6fb44-pw8tk 1/1 Running 3 (5m54s ago) 9m31s crossplane-rbac-manager-d9d6c54c7-75lbn 1/1 Running 1 (8m33s ago) 19m

real 1m0.098s user 0m0.059s sys 0m0.027s

time kubectl get pods -n crossplane-system The connection to the server XX.XX.XX.XX was refused - did you specify the right host or port?

real 0m37.761s user 0m0.052s sys 0m0.019s


After awhile the system becomes responsive but calls to crossplane resources can be slow if not using full name of the kinds as explained above.

### What environment did it happen in?
Crossplane version:  1.6.2
Provider version: provider-jet-gcp:v0.2.0-preview
Cloud provider: GCP
Kubernetes version: GKE 1.23.1-gke.500
<!--
Include at least the version or commit you were running. Consider
also including your:

* Cloud provider or hardware configuration
* Kubernetes version (use `kubectl version`)
* Kubernetes distribution (e.g. Tectonic, GKE, OpenShift)
* OS (e.g. from /etc/os-release)
* Kernel (e.g. `uname -a`)
-->
muvaf commented 2 years ago

FWIW, regional clusters seem to work fine.

hprotzek commented 2 years ago

I've just installed the 0.2.0-preview on a regional gke 1.22.6 cluster and I had to remove it again, as our helm/helmfile deployments run all into timeouts.

Maybe it would be good to have an include list of crd's that you want to use, instead of installing all 400 by default.

muvaf commented 2 years ago

Hi @hprotzek https://github.com/crossplane/crossplane/pull/2918 is where we document these issues, feel free to take a look at. https://github.com/crossplane/crossplane/issues/2869 is the issue about selective installation that you can give a thumbs up and/or provide more details.

For the cluster becoming unresponsive, I'd suggest opening a ticket to GKE support team.