elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[CCS] Add include_remotes=true to cluster-stats calls #192129

Closed thomasneirynck closed 1 month ago

thomasneirynck commented 2 months ago

In order to collect telemetry, Kibana should add include_remotes=true to its _cluster/stats calls.

To accommodate possible larger response times, Kibana should also increase the timeout to 60s.

So e.g. GET /_cluster/stats?timeout=60s&include_remotes=true

cc @quux00

elasticmachine commented 2 months ago

Pinging @elastic/kibana-presentation (Team:Presentation)

thomasneirynck commented 2 months ago

I will assign this to myself for now. This is tangentially related to earlier CCS work (hence the presentation-team label), but is mainly about collecting telemetry (https://github.com/elastic/kibana/blob/b6287708f687d4e3288851052c0c6ae4ade8ce60/src/plugins/telemetry/server/telemetry_collection/get_cluster_stats.ts#L19)

There may be a few downstream things to verify that CCS-UX keeps working.

rudolf commented 2 months ago

I wonder if we should use timeout=60s or if we should rather set the timeout in the client options like:

const body = await esClient.info({ filter_path: 'cluster_uuid' }, { requestTimeout: 60000 });

The query parameter will tell Elasticsearch to stop trying if it takes longer than 60s. But Kibana's default 30s ES request timeout will still apply, so after 30s Kibana (via the elasticsearch-js client) will close the socket. Usually, if you don't specify a query param timeout to ES, it will just keep on trying as long as you're willing to wait, but I'm not particularly familiar with this API.
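A minimal sketch of the two places a timeout can be set on this call. The StatsClient interface below is a hypothetical, trimmed-down stand-in for the elasticsearch-js client surface (not its real typings), and fetchClusterStats is an illustrative name; the 60s value is the one proposed in the issue description.

```typescript
// Hypothetical, trimmed-down stand-in for the ES client; not the real typings.
interface TransportOptions {
  requestTimeout?: number; // milliseconds, enforced client-side (Kibana closes the socket)
}

interface ClusterStatsParams {
  include_remotes?: boolean;
  timeout?: string; // enforced by Elasticsearch, e.g. '60s'
}

interface StatsClient {
  cluster: {
    stats(params: ClusterStatsParams, options?: TransportOptions): Promise<unknown>;
  };
}

const CLUSTER_STATS_TIMEOUT_MS = 60000;

// Sends both timeouts with the same value so neither side gives up before the other.
function fetchClusterStats(client: StatsClient): Promise<unknown> {
  return client.cluster.stats(
    // Query parameters: GET /_cluster/stats?timeout=60s&include_remotes=true
    { include_remotes: true, timeout: `${CLUSTER_STATS_TIMEOUT_MS / 1000}s` },
    // Client option: overrides Kibana's default ES request timeout for this call only
    { requestTimeout: CLUSTER_STATS_TIMEOUT_MS }
  );
}
```

Passing only the query parameter would leave the client-side default in charge of when the socket closes, which is the concern raised above.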

thomasneirynck commented 2 months ago

@rudolf not sure if I 100% understand. What is the requestTimeout param, and how is it different from timeout?

naj-h commented 1 month ago

hey @thomasneirynck @rudolf! Are we on track to merge this in 8.16?

rudolf commented 1 month ago

timeout: How long Elasticsearch will try before giving up and returning a timeout error.

requestTimeout: How long Kibana will wait before giving up and closing the socket. If this happens, plugin code receives a timeout error and Elasticsearch stops processing the request.

They're essentially the same; it's just a question of who is keeping track of the time and initiating the timeout.

The important thing is that if timeout=60000ms and requestTimeout: 30000, then Kibana will still only wait 30s before closing the connection and throwing a timeout error. So timeout can be misleading if not used carefully. The default requestTimeout can also be changed by users through kibana.yml, so for something like telemetry we probably want to override the default and say "I know better than the user how long we should wait for this reply".
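To make the interaction concrete, here is a small sketch of which side gives up first when the two values differ. firstTimeout and TimeoutOutcome are hypothetical names for illustration only, and the model is a simplification: it ignores network latency and treats a tie as Elasticsearch timing out.

```typescript
type TimeoutOutcome = {
  initiator: 'kibana' | 'elasticsearch';
  afterMs: number;
};

// Returns which side hits its limit first for a request that never completes.
// Simplified model: no network latency; a tie is attributed to Elasticsearch.
function firstTimeout(esTimeoutMs: number, requestTimeoutMs: number): TimeoutOutcome {
  return requestTimeoutMs < esTimeoutMs
    ? { initiator: 'kibana', afterMs: requestTimeoutMs }
    : { initiator: 'elasticsearch', afterMs: esTimeoutMs };
}

// timeout=60000ms with requestTimeout: 30000, as in the case described above
const outcome = firstTimeout(60000, 30000);
console.log(outcome); // { initiator: 'kibana', afterMs: 30000 }
```

In this model the query parameter only matters when it is the smaller of the two values.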

Additionally, there are some discussions about potentially deprecating timeout, and to avoid confusion when these two values differ, I think we should rather stick to simply setting requestTimeout.

thomasneirynck commented 1 month ago

I put up https://github.com/elastic/kibana/pull/195793 as a draft to close this.

@rudolf, let's discuss if this is the right way to go.

@naj-h unless there is some major fallout, I expect this to merge before the 8.16 release.