elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

[Metricbeat/Kibana/Stats] `GET /api/status` crashes the agent when 503 status code is received #33838

Closed afharo closed 4 months ago

afharo commented 1 year ago

Metricbeat's Kibana module calls `GET /api/status` in two places:

  1. The status metricset: https://github.com/elastic/beats/blob/a1a6bd8cd569182aeb0082e252ecbabad338a797/metricbeat/module/kibana/status/status.go#L41
  2. The stats metricset: https://github.com/elastic/beats/blob/a1a6bd8cd569182aeb0082e252ecbabad338a797/metricbeat/module/kibana/stats/stats.go#L90-L93

When any of Kibana's core services is unavailable, `GET /api/status` returns 503, but the response body keeps the same structure as the 200 response.
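To illustrate the point above, here is a minimal Go sketch of a client that accepts both 200 and 503 and decodes the body in either case. It is not the Metricbeat code: `statusBody` and `decodeStatus` are hypothetical names, and the `status.overall.level` field is an assumed, trimmed-down view of the payload used only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// statusBody is a hypothetical, trimmed-down view of the /api/status payload.
// The field names are an assumption made for this sketch.
type statusBody struct {
	Status struct {
		Overall struct {
			Level string `json:"level"`
		} `json:"overall"`
	} `json:"status"`
}

// decodeStatus treats 200 and 503 as responses that carry a usable body
// and rejects every other status code.
func decodeStatus(code int, body []byte) (string, error) {
	if code != http.StatusOK && code != http.StatusServiceUnavailable {
		return "", fmt.Errorf("unexpected HTTP status %d", code)
	}
	var sb statusBody
	if err := json.Unmarshal(body, &sb); err != nil {
		return "", fmt.Errorf("decoding status body (HTTP %d): %w", code, err)
	}
	if sb.Status.Overall.Level == "" {
		return "", fmt.Errorf("status body (HTTP %d) has no overall level", code)
	}
	return sb.Status.Overall.Level, nil
}

func main() {
	// A 503 response whose body still follows the usual structure.
	body := []byte(`{"status":{"overall":{"level":"unavailable"}}}`)
	level, err := decodeStatus(http.StatusServiceUnavailable, body)
	fmt.Println(level, err)
}
```

With this shape, a 503 becomes a reportable state ("unavailable") rather than a fatal error, while malformed bodies and other status codes still surface as errors.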

Running ECK locally, I noticed that the Kibana k8s service takes longer to start when the monitoring.metrics options are set, because the metricbeat container crash-loops (with backoff retries) until Kibana has fully started and `GET /api/status` returns 200.

{"log.level":"info","@timestamp":"2022-11-28T12:35:12.661Z","log.origin":{"file.name":"instance/beat.go","file.line":432},"message":"metricbeat stopped.","service.name":"metricbeat","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-11-28T12:35:12.661Z","log.origin":{"file.name":"instance/beat.go","file.line":1062},"message":"Exiting: 4 errors: HTTP error 503 in : 503 Service Unavailable; HTTP error 503 in : 503 Service Unavailable; HTTP error 503 in : 503 Service Unavailable; HTTP error 503 in : 503 Service Unavailable","service.name":"metricbeat","ecs.version":"1.6.0"}

For details about how to set up a monitoring architecture in k8s, follow the steps explained in https://github.com/elastic/kibana/issues/145558#issuecomment-1323509356

IMO, metricbeat should not crash on a 503 and, on top of that, should still process the response body for the Kibana metricsets.

smith commented 8 months ago

We're not planning to do this at this time.

smith commented 7 months ago

Reopening and putting on Stack Monitoring board.

klacabane commented 7 months ago

Some Kibana metricsets were calling HTTP APIs in their constructor instead of in the metricset lifecycle methods, which could terminate the process when the target was not available. This was fixed by https://github.com/elastic/beats/pull/35396, available in 8.7.0.
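The difference between the two patterns can be sketched in Go as follows. This is a simplified model, not the Metricbeat API or the code from the PR: `metricSet`, `newMetricSet`, `fetch`, `probe`, and `errUnavailable` are all hypothetical names. The constructor only stores configuration, and the remote call happens in the periodic fetch, where a failure is an ordinary error that can be reported and retried on the next period.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnavailable imitates the error seen while the target is still starting.
var errUnavailable = errors.New("HTTP error 503")

// probe stands in for an HTTP call to the monitored service (hypothetical).
type probe func() (string, error)

// metricSet is a simplified stand-in, not the Metricbeat MetricSet type.
type metricSet struct {
	probe probe
}

// newMetricSet only validates and stores its input. It performs no I/O,
// so it cannot fail just because the target is down.
func newMetricSet(p probe) (*metricSet, error) {
	if p == nil {
		return nil, errors.New("probe is required")
	}
	return &metricSet{probe: p}, nil
}

// fetch is meant to run once per collection period. A failed call is
// returned to the caller instead of terminating the process.
func (m *metricSet) fetch() (string, error) {
	v, err := m.probe()
	if err != nil {
		return "", fmt.Errorf("fetching status: %w", err)
	}
	return v, nil
}

func main() {
	calls := 0
	// Fails twice, then succeeds, imitating a service that is still starting.
	p := func() (string, error) {
		calls++
		if calls < 3 {
			return "", errUnavailable
		}
		return "ready", nil
	}

	ms, err := newMetricSet(p)
	if err != nil {
		panic(err)
	}
	for i := 1; i <= 3; i++ {
		v, err := ms.fetch()
		fmt.Printf("period %d: value=%q err=%v\n", i, v, err)
	}
}
```

Had the probe been called inside `newMetricSet`, the first failure would have surfaced as a construction error, which in this model corresponds to the startup failure described in the issue.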

I tried to reproduce this by targeting a non-existent host, which triggers the same code path, and the process stays up and running despite the HTTP error. @afharo, any chance you still have your setup handy to verify that the behavior is fixed in >=8.7.0?

smith commented 4 months ago

It looks like this was fixed in the PR referenced in the comment above. Please reopen if this is still happening.