alertmanager CrashLoopBackOff

devopsdymyr commented 2 years ago

kubectl get po -n monitoring
NAME                                                   READY   STATUS             RESTARTS   AGE
alertmanager-demo-prometheus-operator-alertmanager-0   1/2     CrashLoopBackOff   7          11m
demo-grafana-77f9bf8795-hxcrn                          2/2     Running            0          45m
demo-kube-state-metrics-55d6768864-2hjzf               1/1     Running            0          45m
demo-prometheus-node-exporter-b4qdz                    1/1     Running            0          45m
demo-prometheus-node-exporter-crlf7                    1/1     Running            0          45m
demo-prometheus-node-exporter-cwqwr                    1/1     Running            0          45m
demo-prometheus-node-exporter-m7jqs                    1/1     Running            0          45m
demo-prometheus-node-exporter-qfxp4                    1/1     Running            0          45m
demo-prometheus-operator-operator-64f79fd9b-v5jzg      2/2     Running            0          45m
prometheus-demo-prometheus-operator-prometheus-0       3/3     Running            1          45m
prometheus-msteams-7b54bb96d9-h8x4g                    0/1     CrashLoopBackOff   5          5m29s

finishedAt: "2022-03-29T04:40:57Z"
message: |
  "1 error occurred:\n\t* Failed to resolve alertmanager-demo-prometheus-operator-alertmanager-0.alertmanager-operated.monitoring.svc:9094: lookup alertmanager-demo-prometheus-operator-alertmanager-0.alertmanager-operated.monitoring.svc on 10.100.0.10:53: no such host\n\n"
  level=info ts=2022-03-29T04:40:57.192Z caller=cluster.go:623 component=cluster msg="Waiting for gossip to settle..." interval=2s
  level=debug ts=2022-03-29T04:40:57.219Z caller=main.go:355 externalURL=http://demo-prometheus-operator-alertmanager.monitoring:9093
  level=info ts=2022-03-29T04:40:57.219Z caller=coordinator.go:119 component=configuration msg="Loading configuration file" file=/etc/alertmanager/config/alertmanager.yaml
  level=error ts=2022-03-29T04:40:57.219Z caller=coordinator.go:124 component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/config/alertmanager.yaml err="yaml: unmarshal errors:\n line 6: field webhook_config not found in type config.plain\n line 10: field webhook_config not found in type config.plain"
  level=debug ts=2022-03-29T04:40:57.219Z caller=cluster.go:539 component=cluster msg="leaving cluster"
  level=debug ts=2022-03-29T04:40:57.219Z caller=delegate.go:236 component=cluster received=NotifyLeave node=01FZ9ZM62Z3Y59DD6BD3R0Z877 addr=192.168.191.115:9094
  level=info ts=2022-03-29T04:40:57.219Z caller=cluster.go:632 component=cluster msg="gossip not settled but continuing anyway" polls=0 elapsed=27.117949ms
  level=debug ts=2022-03-29T04:40:57.219Z caller=cluster.go:492 component=cluster msg="peer left" peer=01FZ9ZM62Z3Y59DD6BD3R0Z877
  level=debug ts=2022-03-29T04:40:57.219Z caller=nflog.go:336 component=nflog msg="Running maintenance"
  level=debug ts=2022-03-29T04:40:57.219Z caller=silence.go:350 component=silences msg="Running maintenance"
  level=debug ts=2022-03-29T04:40:57.221Z caller=nflog.go:338 component=nflog msg="Maintenance done" duration=1.248646ms size=0
  level=debug ts=2022-03-29T04:40:57.221Z caller=silence.go:352 component=silences msg="Maintenance done" duration=1.986643ms size=0
reason: Error
startedAt: "2022-03-29T04:40:57Z"
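
Note on the log above: the only level=error entry is the configuration load failure, which reports that the field webhook_config (lines 6 and 10 of alertmanager.yaml) is unknown. In Alertmanager's receiver schema the key is the plural webhook_configs and takes a list, so a singular key would be rejected in this way. The original alertmanager.yaml is not shown in this thread, so the following is only a minimal sketch of what a corrected receiver might look like; the receiver name and the webhook URL are placeholders, not values taken from the reporter's setup.

```yaml
# Hypothetical alertmanager.yaml fragment.
# The receiver name and URL below are assumptions for illustration only.
route:
  receiver: teams                # assumed receiver name
receivers:
  - name: teams
    webhook_configs:             # plural; the singular webhook_config is not a known field
      - url: http://prometheus-msteams:2000/alertmanager   # placeholder service URL
        send_resolved: true
```

If the configuration is managed through a Helm chart or an operator-generated Secret, the same key would need correcting in the values that produce that file rather than in the mounted file itself.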

zzhao2010 commented 2 years ago

Is your k8s infrastructure healthy?

devopsdymyr commented 2 years ago

@zzhao2010 Yes, everything is working fine.

devopsdymyr commented 2 years ago

@zzhao2010 Can you please update the Prometheus and Teams integration video?
