Closed julienlim closed 6 years ago
@sankarshanmukhopadhyay @brainfunked @r0h4n @nthomas-redhat @Tendrl/qe @Tendrl/tendrl_frontend @japplewhite @rghatvis@redhat.com @mcarrano
This dashboard proposal is ready for review. Note: API impact, module impact, etc. have to be filled out by someone else -- maybe @cloudbehl, @anmolbabu, or @anivargi.
Suggested Labels (for folks who have permissions to label the spec):
Row 1, Panel 4 (Disks): There is no platform support for disk status as such, so this won't be supported for now.
Panel 6 (Growth Rate) and Panel 7 (Time Remaining (Weeks)): Does it really make sense to display these at the brick level? The MVP only talks about projections at the volume level.
Row 2, Panel 9 (IO Size): Not MVP.
@nthomas-redhat I've updated and marked Panel 4 (disks) and Panel 9 (IO size) as FUTURE.
For Panels 6 and 7 (Growth Rate and Time Remaining), it's a valid question whether it makes sense to project this for a brick, and I would say yes. Per chatting with Alok, there are several users who have single bricks on a single node within the Gluster cluster. While you can see this by host and volume, seeing it by brick can also be valuable. It is admittedly partly redundant with the host view, except that the host view also includes the boot disk and other disks not used for Gluster volumes.
In addition, if we can calculate growth and time remaining easily for the others, replicating it here is trivial. So I think it should remain in the dashboard.
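To illustrate how little is needed to project growth per brick, here is a minimal sketch. It assumes we already have periodic (timestamp in seconds, used bytes) samples for the brick and its total capacity. The function name, the sample format, and the choice of a least-squares line fit are all assumptions for illustration, not how Tendrl actually computes these panels.

```python
from typing import Optional, Sequence, Tuple

SECONDS_PER_WEEK = 7 * 24 * 3600


def growth_and_weeks_remaining(
    samples: Sequence[Tuple[float, float]],
    capacity_bytes: float,
) -> Tuple[float, Optional[float]]:
    """Hypothetical helper: estimate growth (bytes/week) and weeks until full.

    samples holds (unix_timestamp_seconds, used_bytes) pairs for one brick.
    Returns (growth_bytes_per_week, weeks_remaining); weeks_remaining is
    None when usage is flat or shrinking, since no fill date can be projected.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")

    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("timestamps must not all be identical")

    # Slope of the least-squares line through the samples, in bytes/second.
    cov = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    growth_per_week = (cov / var_t) * SECONDS_PER_WEEK

    # Free space is taken from the most recent sample.
    latest_used = max(samples, key=lambda s: s[0])[1]
    free_bytes = max(capacity_bytes - latest_used, 0)

    if growth_per_week <= 0:
        return growth_per_week, None
    return growth_per_week, free_bytes / growth_per_week
```

The same function would work unchanged for a host or a volume, since it only depends on usage samples and a capacity figure, which is the point about replication being cheap.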
@sankarshanmukhopadhyay @brainfunked @r0h4n @nthomas-redhat @Tendrl/qe @Tendrl/tendrl_frontend @japplewhite @rghatvis@redhat.com @mcarrano
Here's a rough mockup of the proposed brick dashboard:
Noting that some additional panels were added (see the Bricks dashboard). It is unclear what the "Utilization" panel is showing.
Closing this one; please open a new issue with the relevant context if anything is missing.
Dashboard Spec - Brick Dashboard
Display a default dashboard for a Gluster brick present in Tendrl that provides at-a-glance information about a single Gluster brick, including health and status information, key performance indicators (e.g. IOPS, throughput, etc.), and alerts that can draw the attention of the Tendrl user (e.g. a Gluster Administrator) to potential issues in the brick and its underlying disk(s).
Problem description
A Gluster Administrator wants to be able to answer the following questions by looking at the brick dashboard:
Use Cases
Use cases in the form of user stories:
As a Gluster Administrator, I want to view at-a-glance information about my Gluster brick that includes health and status information, key performance indicators (e.g. IOPS, throughput, latency, etc.), and alerts that can draw my attention to potential issues in the brick and underlying disks.
As a Gluster Administrator, I want to look at performance by brick so that I can diagnose poor performance on one brick caused by a RAID 6 disk failure, rebuild, or degradation.
Proposed change
Provide a pre-canned, default brick dashboard in Grafana (initially launchable from the Tendrl UI, and eventually embedded in it) that shows the following metrics, each rendered as text or as a chart/graph depending on the type of metric:
The Dashboard is composed of individual Panels (dashboard widgets) arranged on a number of Rows.
Note: The cluster, host, and brick should be visible at all times, and the user should be able to switch to another host + brick combination.
Row 1
Panel 1: Health
Panel 2: Connections Trend
[FUTURE] Panel 4: Disks
Panel 4: Capacity Utilization
Panel 5: Capacity Available
Panel 6: Growth Rate
Panel 7: Time Remaining (Weeks)
Row 2
Panel 8: IOPS Trend
[FUTURE] Panel 9: IO Size
Panel 10: Inodes Utilization
Panel 11: Inodes Available
Panel 12: LVM thin pool metadata %
Panel 13: LVM thin pool data usage %
Note: The layout of the rows, and of the panels within them, may need to change based on implementation and actual visualization, especially when certain metrics need to be aligned with each other, whether vertically or horizontally.
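As a rough illustration of where the capacity and inode panels could get their numbers, here is a minimal sketch that reads them from the filesystem backing a brick using Python's standard `os.statvfs` (Unix only). The function names, the dictionary keys, and the idea of sampling the brick's mount path directly on the node are assumptions; the spec does not say how these metrics are collected, and the LVM thin pool panels are not covered here.

```python
import os
from typing import Dict, Optional


def percent_used(total: int, free: int) -> Optional[float]:
    """Return the used share of a resource in percent, or None if total is 0.

    Some filesystems report 0 total inodes, so a None result means
    "not applicable" rather than "empty".
    """
    if total <= 0:
        return None
    return 100.0 * (total - free) / total


def brick_usage(brick_path: str) -> Dict[str, Optional[float]]:
    """Hypothetical collector for the capacity and inode panels of one brick."""
    st = os.statvfs(brick_path)
    return {
        # Block counts are in units of f_frsize bytes.
        "capacity_utilization_pct": percent_used(st.f_blocks, st.f_bfree),
        "capacity_available_bytes": st.f_bavail * st.f_frsize,
        "inode_utilization_pct": percent_used(st.f_files, st.f_ffree),
        "inodes_available": st.f_favail,
    }
```

Note that `f_bavail` and `f_favail` count what is available to unprivileged users, which can be lower than the raw free counts when the filesystem reserves space for root.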
Alternatives
Create similar dashboard using PatternFly (www.patternfly.org) or d3.js components to show similar information within the Tendrl UI.
Data model impact:
TBD
Impacted Modules:
TBD
Tendrl API impact:
TBD
Notifications/Monitoring impact:
TBD
Tendrl/common impact:
TBD
Tendrl/node_agent impact:
TBD
Sds integration impact:
TBD
Security impact:
TBD
Other end user impact:
Users will mostly interact with this feature via the Grafana UI. Access via the Grafana API and the Tendrl API is possible, but the user would need to make API calls to obtain similar information.
Performance impact:
TBD
Other deployer impact:
Plug-ins required by Grafana will need to be packaged and installed with tendrl-ansible.
This (default) brick dashboard will need to be automatically generated whenever a cluster is imported to be managed by Tendrl.
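To make the generation step more concrete, here is a minimal sketch of building a default dashboard definition for one brick at import time. The function name, the row contents (the non-FUTURE panels listed in this spec), and the JSON shape are assumptions: the structure is only loosely modeled on a row-based dashboard document, and the exact field names Grafana expects should be checked against the Grafana version that gets packaged. Datasource queries and the cluster/host/brick selectors are deliberately left out.

```python
import json
from typing import Any, Dict, List

# Panel titles per row, taken from this spec (FUTURE panels omitted).
ROWS: List[List[str]] = [
    [
        "Health",
        "Connections Trend",
        "Capacity Utilization",
        "Capacity Available",
        "Growth Rate",
        "Time Remaining (Weeks)",
    ],
    [
        "IOPS Trend",
        "Inodes Utilization",
        "Inodes Available",
        "LVM thin pool metadata %",
        "LVM thin pool data usage %",
    ],
]


def build_brick_dashboard(cluster: str, host: str, brick_path: str) -> Dict[str, Any]:
    """Hypothetical generator for a default brick dashboard definition."""
    next_id = 1
    rows = []
    for row_number, titles in enumerate(ROWS, start=1):
        panels = []
        for title in titles:
            # Panel ids are assigned sequentially so they stay unique.
            panels.append({"id": next_id, "title": title})
            next_id += 1
        rows.append({"title": f"Row {row_number}", "panels": panels})
    return {
        "title": f"Brick Dashboard - {host}:{brick_path}",
        "tags": ["tendrl", "brick", cluster],
        "rows": rows,
    }


if __name__ == "__main__":
    # Example values are placeholders, not real cluster data.
    print(json.dumps(build_brick_dashboard("cluster1", "host1", "/bricks/b1"), indent=2))
```

Keeping the panel list as data rather than hand-written JSON would make it easier to move a panel between rows or drop one, which the layout note in this spec anticipates.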
Developer impact:
TBD
Implementation:
TBD
Assignee(s):
Primary assignee: @cloudbehl
Other contributors: @anmolbabu, @anivargi, @julienlim, @japplewhite
Work Items:
TBD
Estimate:
TBD
Dependencies:
TBD
Testing:
Test whether the health, status, and metrics displayed for a given brick are correct, and that the information stays up-to-date as failures or other changes are observed on that brick.
Documentation impact:
Documentation should explain what is being displayed wherever it is not immediately obvious from looking at the dashboard. This may include, but is not limited to, what each metric refers to, its measurement unit, and how to apply it when troubleshooting problems, e.g. healing / split brain issues, loss of quorum, etc.
References and Related GitHub Links: