elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana
Other
19.7k stars 8.12k forks source link

[Infra] Create new formulas for RX and TX metrics #188641

Closed crespocarlos closed 1 month ago

crespocarlos commented 1 month ago

Summary

Related issue

The current RX and TX metrics used in the Hosts View, Inventory UI and Inventory alert rule is incorrect. (more details https://github.com/elastic/kibana/issues/184099)

RX: average(host.network.ingress.bytes) * 8 / (max(metricset.period, kql='host.network.ingress.bytes: *') / 1000) TX: average(host.network.egress.bytes) * 8 / (max(metricset.period, kql='host.network.egress.bytes: *') / 1000)

The calculated rate is wrong because it uses metricset.period. This setting can be configured with any interval in metricbeat. Therefore, the metric is not necessarily a rate per second.

We need to create a new formula to compute the these metrics.

Lens

Aggregation

We need to keep backward compatibility with the existing metrics in the Inventory UI and Inventory alert rule

Label for old metrics: Inbound Traffic (Legacy) / Outbound Traffice(Legacy) Label for new metrics: Inbound Traffic / Outbound Traffic

Implementation hints

Acceptance Criteria

elasticmachine commented 1 month ago

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

crespocarlos commented 1 month ago

@roshan-elastic we need to define the labels for RX and TX metrics in the inventory and alerts flyout.

Should we update the tooltips too?

roshan-elastic commented 1 month ago

Hey @crespocarlos - I've updated the source docs for this.

Would you mind updating the formulae in the doc for the new ones (I've commented you directly)?

smith commented 1 month ago

Might there be a better name than "Legacy" for this. "Non-normalized/Normalized" is wrong but something?

crespocarlos commented 1 month ago

Might there be a better name than "Legacy" for this. "Non-normalized/Normalized" is wrong but something?

Perhaps "Inbound/Outbount Traffic (per metricset.period)" and "Inbound/Outbount Traffic (per second)"?

roshan-elastic commented 1 month ago

Yeah fair point on this. I like the suggestions - is there any language we could use which leaves out Elastic terminology?

jennypavlova commented 1 month ago

Hey @crespocarlos,

I checked the issue description - so we want to replace the lens formulas with only the average with sum part and remove the next part, so we have :

export const rx: LensBaseLayer = {
  label: i18n.translate('xpack.metricsData.assetDetails.formulas.rx', {
    defaultMessage: 'Network Inbound (RX)',
  }),
  value: 'sum(host.network.ingress.bytes) * 8',
  format: 'bits',
  decimals: 1,
};

export const tx: LensBaseLayer = {
  label: i18n.translate('xpack.metricsData.assetDetails.formulas.tx', {
    defaultMessage: 'Network Outbound (TX)',
  }),
  value: 'sum(host.network.egress.bytes) * 8',
  format: 'bits',
  decimals: 1,
};

and for the aggregation in the snapshot folder, we keep the old one (rename to {rx/tx}_legacy or something) and we add the new one from the description and use it in the host (and use both in inventory and alerts), right?

crespocarlos commented 1 month ago

Hi @jennypavlova

and for the aggregation in the snapshot folder, we keep the old one (rename to {rx/tx}_legacy or something) and we add the new one from the description and use it in the host (and use both in inventory and alerts), right?

We keep the old ones name untouched, otherwise it will affect existing alerts. The hosts view will only use the new one. Inventory and Alerts will use both.

For the old ones, the only thing that changes is their label

jennypavlova commented 1 month ago

Hey @crespocarlos, Thanks for clarifying! I am trying the new aggregation now and I am getting 0s when I use it so I am trying to debug that Some changes I made:

link to the wip rx file for reference.

However, this query returns results so I am wondering if we can use it instead or if we need the filter in this case. It looks more straightforward and I am trying to understand why we are changing it and what is wrong with the one from the description. I will investigate more and try to debug it (please let me know if you have an idea why it's not working or if I am missing something there)