Tendrl / specifications

Dashboard Spec - Brick Dashboard #230

Status: Closed (julienlim closed 6 years ago)

julienlim commented 7 years ago

Dashboard Spec - Brick Dashboard

Display a default dashboard for a Gluster brick present in Tendrl. The dashboard provides at-a-glance information about a single Gluster brick, including health and status information, key performance indicators (e.g. IOPS, throughput, etc.), and alerts that draw the Tendrl user's (e.g. Gluster Administrator's) attention to potential issues in the brick and its underlying disk(s).

Problem description

A Gluster Administrator wants to be able to answer the following questions by looking at the brick dashboard:

Use Cases

Use cases in the form of user stories:

Proposed change

Provide a pre-canned, default brick dashboard in Grafana (initially launchable from the Tendrl UI, and eventually embedded in the Tendrl UI) that shows the following metrics, each rendered either as text or as a chart/graph depending on the type of metric being displayed:

The Dashboard is composed of individual Panels (dashboard widgets) arranged on a number of Rows.

Note: The cluster, host, and brick should be visible at all times, and the user should be able to switch to another host + brick combination.

Row 1

Panel 1: Health

Panel 2: Connections Trend

[FUTURE] Panel 4: Disks

Panel 4: Capacity Utilization

Panel 5: Capacity Available

Panel 6: Growth Rate

Panel 7: Time Remaining (Weeks)

Row 2

Panel 8: IOPS Trend

[FUTURE] Panel 9: IO Size

Panel 10: Inodes Utilization

Panel 11: Inodes Available

Panel 12: LVM thin pool metadata %

Panel 13: LVM thin pool data usage %

Note: The layout of the rows, and of the panels within each row, may need to change based on implementation and actual visualization, especially when certain metrics need to be aligned together, whether vertically or horizontally.
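As an illustration of what the capacity and inode panels (Panels 4, 5, 10, and 11) would report, here is a minimal Python sketch that reads the same figures from the filesystem holding a brick. It is not how Tendrl collects these metrics; the `BrickUsage` type, the `percent_used` and `brick_usage` helpers, and the brick path are all hypothetical names chosen for this example. Only the standard library's `os.statvfs` call is used.

```python
import os
from dataclasses import dataclass


@dataclass
class BrickUsage:
    """Hypothetical container for the values shown in Panels 4, 5, 10, and 11."""
    capacity_used_pct: float
    capacity_available_bytes: int
    inodes_used_pct: float
    inodes_available: int


def percent_used(total: int, free: int) -> float:
    """Return the used percentage, or 0.0 when the total is unknown (zero)."""
    if total <= 0:
        return 0.0
    return 100.0 * (total - free) / total


def brick_usage(brick_path: str) -> BrickUsage:
    """Read statistics for the filesystem that holds brick_path."""
    st = os.statvfs(brick_path)
    return BrickUsage(
        capacity_used_pct=percent_used(st.f_blocks, st.f_bfree),
        # f_bavail counts blocks available to unprivileged users.
        capacity_available_bytes=st.f_bavail * st.f_frsize,
        # Some filesystems report zero total inodes; percent_used handles that.
        inodes_used_pct=percent_used(st.f_files, st.f_ffree),
        inodes_available=st.f_favail,
    )
```

Calling `brick_usage("/bricks/brick1")` (an assumed mount point) would return one snapshot; the trend panels would need such snapshots stored over time.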

Alternatives

Create similar dashboard using PatternFly (www.patternfly.org) or d3.js components to show similar information within the Tendrl UI.

Data model impact:

TBD

Impacted Modules:

TBD

Tendrl API impact:

TBD

Notifications/Monitoring impact:

TBD

Tendrl/common impact:

TBD

Tendrl/node_agent impact:

TBD

Sds integration impact:

TBD

Security impact:

TBD

Other end user impact:

Users will mostly interact with this feature via the Grafana UI. Access via the Grafana API and Tendrl API is also possible, but would require API calls to retrieve similar information.

Performance impact:

TBD

Other deployer impact:

Developer impact:

TBD

Implementation:

TBD

Assignee(s):

Primary assignee: @cloudbehl

Other contributors: @anmolbabu, @anivargi, @julienlim, @japplewhite

Work Items:

TBD

Estimate:

TBD

Dependencies:

TBD

Testing:

Test whether the health, status, and metrics displayed for a given brick are correct, and that the information stays up-to-date as failures or other changes are observed on that brick.

Documentation impact:

Documentation should explain what is being displayed wherever it is not immediately obvious from looking at the dashboard. This may include, but is not limited to, what each metric refers to, its measurement unit, and how to apply it when troubleshooting problems, e.g. healing / split brain issues, loss of quorum, etc.

References and Related GitHub Links:

julienlim commented 7 years ago

@sankarshanmukhopadhyay @brainfunked @r0h4n @nthomas-redhat @Tendrl/qe @Tendrl/tendrl_frontend @japplewhite @rghatvis@redhat.com @mcarrano

This dashboard proposal is ready for review. Note: API impact, module impact, etc. have to be filled out by someone else -- maybe @cloudbehl, @anmolbabu, or @anivargi.

Suggested Labels (for folks who have permissions to label the spec):

nthomas-redhat commented 7 years ago

Row 1, Panel 4: Disks. No platform support for disk status as such. This won't be supported now.

Row 1, Panel 6: Growth Rate and Panel 7: Time Remaining (Weeks). Does this really make any sense to display at the brick level? MVP just talks about the projections at volume level only.

Row 2, Panel 9: IO Size. Not MVP.

julienlim commented 7 years ago

@nthomas-redhat I've updated and marked Panel 4 (disks) and Panel 9 (IO size) as FUTURE.

For Panel 6 & 7 (Growth Rate and Time Remaining), it's a valid question whether or not it makes sense to project this for a brick, and I would say yes. Per chatting with Alok, there are several users who have single bricks on a single node within the Gluster cluster. While you can see this by host and volume, seeing it by brick can also be valuable (though admittedly redundant if you're doing it by host, except with the host, it includes the boot disk and other disks not used for Gluster volumes).

In addition, if we calculate growth and time remaining easily for the others, replicating here is trivial. So I think it should remain in the dashboard.
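To make the Growth Rate and Time Remaining discussion concrete, here is a minimal Python sketch of one way such a projection could be computed for a single brick. It assumes a simple least-squares linear fit over (timestamp, used bytes) samples; the spec does not state which method Tendrl uses, and the function names and sample format are hypothetical.

```python
from typing import Optional, Sequence, Tuple

SECONDS_PER_WEEK = 7 * 24 * 3600

# Each sample is (unix_timestamp_seconds, used_bytes); this format is an assumption.
Sample = Tuple[float, float]


def growth_rate(samples: Sequence[Sample]) -> float:
    """Least-squares slope of used bytes over time, in bytes per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    num = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one timestamp")
    return num / den


def weeks_remaining(samples: Sequence[Sample], capacity_bytes: float) -> Optional[float]:
    """Estimate weeks until the brick is full, or None if usage is not growing."""
    rate = growth_rate(samples)
    if rate <= 0:
        return None  # flat or shrinking usage: no meaningful projection
    latest_used = samples[-1][1]
    free = max(capacity_bytes - latest_used, 0)
    return free / rate / SECONDS_PER_WEEK
```

For example, a brick that grows by 100 bytes per week and has 200 bytes free would be projected to fill in roughly two weeks. The same functions would apply unchanged to host or volume totals, which is the sense in which replicating the calculation at brick level is cheap.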

julienlim commented 7 years ago

@sankarshanmukhopadhyay @brainfunked @r0h4n @nthomas-redhat @Tendrl/qe @Tendrl/tendrl_frontend @japplewhite @rghatvis@redhat.com @mcarrano

Here's a rough mockup of the proposed brick dashboard:

[Image: grafana dashboard - brick]

julienlim commented 6 years ago

Noting some additional panels added -- see Bricks dashboard: Unclear what "Utilization" panel is showing.

r0h4n commented 6 years ago

Closing this one. Please open a new issue with relevant context if anything is missing.