elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats
Other
12.13k stars 4.91k forks source link

[Stack Monitoring] [Kibana] Avoid querying /api/status on every cycle #31118

Open matschaffer opened 2 years ago

matschaffer commented 2 years ago

@afharo opened https://github.com/elastic/beats/pull/29911 to move the url generation for the kibana module into the init phase.

As-written the PR might cause problems in the event kibana is upgraded without upgrading (or at least restarting) metricbeat.

@afharo proposed this instead:

  1. During the first init, retrieve GET /api/status to build the final GET /api/stats URL
  2. Use the final URL for every subsequent call
  3. Check that kibana.version coming from GET /api/stats didn't change. If it changes, go back to 1.

Acceptance criteria:

matschaffer commented 2 years ago

@afharo if you can fill in any details about the problem your PR is intended to solve and how to test for the presence/absence of it, that'd be great.

afharo commented 2 years ago

Hi @matschaffer! Thank you for opening this issue!

The reason I created the PR is briefly explained in the section "Why is it important?":

Basically, Metricbeat is failing to report in service.address the actual URL that we hit because the query parameter exclude_usage=true is removed due to it being applied in a defer statement.

image

This bug drove us in a different direction when troubleshooting an issue in Kibana.

Also, we add this query parameter is based on the version of Kibana. Should we set it once and stop checking if it can be used on every interval?

The blocker we got in this PR is based on this comment: @sayden raised the concern that my solution might append duplicates of the query parameter on every interval:

  1. The first request will be &exclude_usage=true
  2. The second will be &exclude_usage=true&exclude_usage=true
  3. The third one &exclude_usage=true&exclude_usage=true&exclude_usage=true
  4. ...

I tried to test it locally, but I failed because there is a library that is not supported in M1 Macs. Any help with testing and finding an alternative solution is very much appreciated.

matschaffer commented 2 years ago

Should we set it once and stop checking if it can be used on every interval?

This wouldn't be an issue for cloud (since we relaunch a new metricbeat for each kibana instance), but wondering what this might imply during kibana upgrades...

afharo commented 2 years ago

@matschaffer that's a very good point!

OTOH, it feels like an overkill hitting both the GET /api/status and the GET /api/stats APIs every 10s. Especially when the GET /api/status API is currently only used to retrieve the version of Kibana to figure out if it can use the &exclude_usage=true parameter. I wonder if we could use the version returned in the GET /api/stats to detect any upgrade so we "reset" the init mechanism?

Something like:

  1. During the first init, retrieve GET /api/status to build the final GET /api/stats URL
  2. Use the final URL for every subsequent call
  3. Check that kibana.version coming from GET /api/stats didn't change. If it changes, go back to 1.

What do you think?

matschaffer commented 2 years ago

That sounds like a pretty solid plan to me 👍🏻

matschaffer commented 2 years ago

I updated the description to reflect the discussion so far. Please edit if I misquoted anything.

afharo commented 2 years ago

It looks great to me! I'll go ahead and close my PR in favour of this issue. Thank you!

botelastic[bot] commented 1 year ago

Hi! We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1. Thank you for your contribution!

afharo commented 1 year ago

👍