GoogleCloudPlatform / prometheus-engine

Google Cloud Managed Service for Prometheus libraries and manifests.
https://g.co/cloud/managedprometheus
Apache License 2.0
189 stars 86 forks source link

Managed Service for Prometheus, CrashLoopBackOff on t2a-standard-2 node #1067

Closed jo-gre closed 1 month ago

jo-gre commented 1 month ago

We have enabled Google Managed Prometheus in our new cluster with control plane version 1.29.5-gke.1091002 on regular channel.

This cluster currently runs with a single t2a-standard-2 node and a few pods on there. After checking the box in the GCP console UI to test gmp 4 new pods got spawned in gmp-system:

❯ kubectl get pods -n gmp-system
NAME                              READY   STATUS             RESTARTS        AGE
alertmanager-0                    0/2     Init:0/1           0               92m
collector-zb5sq                   0/2     Init:0/1           0               92m
gmp-operator-547565645b-vgtcp     0/1     CrashLoopBackOff   9 (2m34s ago)   23m
rule-evaluator-69dcb4446c-cbt7k   0/2     Init:0/1           0               92m

While the 3 are stuck in init the gmp-operator crashes with

❯ kubectl logs -n gmp-system -p gmp-operator-547565645b-vgtcp
exec /bin/operator: exec format error
TheSpiritXIII commented 1 month ago

We accidentally broke ARM builds. The fix is available in https://github.com/GoogleCloudPlatform/prometheus-engine/pull/992 and is available today in the GKE rapid release channel on the latest minor version. At some point we'll backport it to old GKE minor versions and other release channels but we don't have a timeline yet.

Sorry for the inconvenience!

bernot-dev commented 1 month ago

Closing this for now. Feel free to follow up if you have additional questions.