CentaurusInfra / arktos

Arktos for large-scale cloud platform
Apache License 2.0

[kube-up][scale-up][mizar]: event-exporter CrashLoopBackOff with error: Failed to initialize sink #1380

Open sonyafenge opened 2 years ago

sonyafenge commented 2 years ago

What happened: Started a scale-up cluster with Mizar using kube-up.sh; event-exporter went into CrashLoopBackOff with the error:

{"log":"F0224 00:29:21.335208       1 main.go:75] Failed to initialize sink: failed to build sink config: not running on GCE, which is not supported for Stackdriver sink\n","stream":"stderr","time":"2022-02-24T00:29:21.33556028Z"}
$ kubectl get pods -AT | grep event-exporter
system   kube-system   event-exporter-v0.2.5-868dff6494-h4hw4               4772125109102601041   0/1     CrashLoopBackOff   26         118m

What you expected to happen: event-exporter starts successfully.

How to reproduce it (as minimally and precisely as possible):

$ export KUBEMARK_NUM_NODES=100 NUM_NODES=2 SCALEOUT_CLUSTER=false SCALEOUT_TP_COUNT=1 SCALEOUT_RP_COUNT=1 RUN_PREFIX=sonyaperf1-022322
$ export MASTER_DISK_SIZE=500GB MASTER_ROOT_DISK_SIZE=500GB KUBE_GCE_ZONE=us-west2-b MASTER_SIZE=n1-highmem-32 NODE_SIZE=n1-highmem-16 NODE_DISK_SIZE=500GB GOPATH=$HOME/go KUBE_GCE_ENABLE_IP_ALIASES=true KUBE_GCE_PRIVATE_CLUSTER=true CREATE_CUSTOM_NETWORK=true KUBE_GCE_INSTANCE_PREFIX=${RUN_PREFIX} KUBE_GCE_NETWORK=${RUN_PREFIX} ENABLE_KCM_LEADER_ELECT=false ENABLE_SCHEDULER_LEADER_ELECT=false ETCD_QUOTA_BACKEND_BYTES=8589934592 SHARE_PARTITIONSERVER=false LOGROTATE_FILES_MAX_COUNT=200 LOGROTATE_MAX_SIZE=200M KUBE_ENABLE_APISERVER_INSECURE_PORT=true KUBE_ENABLE_PROMETHEUS_DEBUG=true KUBE_ENABLE_PPROF_DEBUG=true TEST_CLUSTER_LOG_LEVEL=--v=2 HOLLOW_KUBELET_TEST_LOG_LEVEL=--v=2 GCE_REGION=us-west2-b NETWORK_PROVIDER=mizar
$ ./cluster/kube-up.sh

Anything else we need to know?:

Environment:

Sindica commented 2 years ago

May not be a 130 release blocker.

sonyafenge commented 2 years ago

Debugged this issue and found:

  1. The Ubuntu image is missing the cloud logging agent and the cloud monitoring agent. These need to be installed to support event-exporter.
  2. event-exporter requires metadata.onGCE to be true. Debugging into metadata.onGCE shows that it tries LookupHost(ctx, "metadata.google.internal"); support for internet access in pods is still in progress, see issue https://github.com/CentaurusInfra/arktos/issues/1373