NetApp / harvest

Open-metrics endpoint for ONTAP and StorageGRID
https://netapp.github.io/harvest/latest
Apache License 2.0

Volume / Volume Deep Dive Dashboard Latency Panel Description #3012

Closed: tsohst closed this issue 3 weeks ago

tsohst commented 3 weeks ago

Problem

All latency panels in the Volume / Volume Deep Dive Grafana dashboard have the same description text. Is there any resource that explains what these charts mean? [Screenshot attached]

rahulguptajss commented 3 weeks ago

@tsohst Similar panels at the workload level are available in the Workload dashboard under Row Latency Breakdown and have descriptions. Could you check if those look good? If they do, we will update similar descriptions in these panels as well.

[Screenshot attached]

tsohst commented 3 weeks ago

Hello @rahulguptajss, these descriptions look better to me and provide more context.

I'm just confused about why they are in % and not in ms. What does the % refer to?

rahulguptajss commented 3 weeks ago

The Workload dashboard shows these as percentages. Each value indicates what percentage of the total workload latency is consumed by the relevant subsystem. Do you think it would be better to have the percentage in the volume dashboard as well?
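
To illustrate the percentage view described above, here is a minimal sketch of how a per-subsystem latency breakdown can be turned into shares of the total. The function name, the subsystem names, and the sample values are hypothetical and chosen only for illustration; this is not the query the Harvest dashboards actually run.

```python
def latency_breakdown_percent(latency_by_subsystem):
    """Return each subsystem's share of the total latency, in percent.

    latency_by_subsystem maps a subsystem name to its latency
    (any unit, as long as all values use the same one).
    """
    total = sum(latency_by_subsystem.values())
    if total == 0:
        # No latency recorded: report 0% everywhere instead of dividing by zero.
        return {name: 0.0 for name in latency_by_subsystem}
    return {
        name: 100.0 * value / total
        for name, value in latency_by_subsystem.items()
    }


# Hypothetical sample: latency in microseconds per subsystem.
sample = {"network": 100, "data": 300, "disk": 600}
print(latency_breakdown_percent(sample))
# prints {'network': 10.0, 'data': 30.0, 'disk': 60.0}
```

Read this way, a panel showing 60% for one subsystem means that subsystem accounts for 60% of the workload's total latency, regardless of how many milliseconds that total is.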

tsohst commented 3 weeks ago

Hm.. I'm looking for a quick and easy dashboard that shows latency / performance issues in one place. When someone comes around asking whether there are any performance issues currently, I could just refer to that. Something like a flowchart or drilldown, Cluster/Network/Disk [...] Volume/LUN maybe, to see where the issue is or whether there is a bottleneck somewhere in the cluster.

Yesterday I was also asked whether we can see latency based on client connections. This was for a SAN environment with multiple VMware ESXi hosts.

rahulguptajss commented 3 weeks ago

We do have performance panels for each object in their respective dashboards. We have also tried to create some high-level overviews through the cDoT and Datacenter dashboards in Harvest. We could potentially add more object performance panels there or create a new dashboard that consolidates all object-relevant performance panels in one place. Would that help?

We have issues related to top clients open here: Issue #2591 and Issue #2118. Could you please +1 these issues if they match your requirements, or open a new one if they do not?

tsohst commented 3 weeks ago

Thank you for your help!

rahulguptajss commented 2 weeks ago

@tsohst Descriptions have been added to the dashboards. You can either install the nightly build or simply import the dashboard from here.

Descriptions have also been added to the metric documentation, which you can find here.