Tendrl / specifications

Tendrl specs go here
GNU Lesser General Public License v3.0

Monitoring: Dashboard data-points #145

Closed anmolbabu closed 7 years ago

anmolbabu commented 7 years ago

Mock-up links:

Gluster dashboard Ceph dashboard

Observations:

Missing data-points:

1 Ceph dashboard:

2 Gluster dashboard:

Approach to be taken for fetching data points in dashboard

1. Each card makes a separate api call to fetch its specific information.

For example: for the counters, the UI needs to invoke the respective listing APIs, parse the lists and compute the counts. Likewise, for utilization, the UI needs to fetch the per-entity utilization from the listing API responses and total them to get the overall utilization. To get the top 5 most used entities (pools and RBDs in the Ceph dashboard, File Shares in the Gluster dashboard), the listing API needs to support sorting of the response.

Advantage:

The different cards in the UI can then have independent refresh intervals, so that less frequently changing data is refreshed less often.

2. performance-monitoring exposes a single API endpoint that provides all dashboard-specific data (only point-in-time stats and counters) in one query.

Advantage:

The UI then needs to make only 2 queries: one for the utilization time series data, and the other to the API exposed by performance-monitoring, which provides everything else.

Disadvantage:

Monitoring is an optional stack. If it is not installed, the dashboard is either blank or needs its own way of fetching whatever is possible without monitoring, which means approach 1 above still needs to be implemented.
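To make the client-side work in approach 1 concrete, here is a minimal sketch of the aggregation a card would have to do on a listing response. The field names (`name`, `status`, `used`, `total`) and the function `summarize` are assumptions made for illustration; they are not taken from the actual Tendrl listing APIs.

```python
from collections import Counter


def summarize(entities, top_n=5):
    """Aggregate a listing response (a list of dicts) into dashboard card data.

    Assumed, hypothetical shape of each entity:
    {"name": str, "status": str, "used": number, "total": number}
    """
    # Counters: number of entities per status.
    status_counts = Counter(e["status"] for e in entities)

    # Overall utilization: total the per-entity values.
    used = sum(e["used"] for e in entities)
    total = sum(e["total"] for e in entities)
    percent_used = round(100.0 * used / total, 1) if total else 0.0

    # Top N most used entities, ranked by their own utilization ratio.
    # With server-side sorting this step would move into the listing API.
    def ratio(e):
        return e["used"] / e["total"] if e["total"] else 0.0

    most_used = sorted(entities, key=ratio, reverse=True)[:top_n]

    return {
        "status_counts": dict(status_counts),
        "used": used,
        "total": total,
        "percent_used": percent_used,
        "most_used": [e["name"] for e in most_used],
    }


pools = [
    {"name": "p1", "status": "ok", "used": 50, "total": 100},
    {"name": "p2", "status": "ok", "used": 90, "total": 100},
    {"name": "p3", "status": "down", "used": 10, "total": 200},
]
summary = summarize(pools, top_n=2)
```

In approach 2 the same aggregation would run once on the server side instead of in every UI client.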

anmolbabu commented 7 years ago

@Tendrl/tendrl-core Please provide your suggestions

brainfunked commented 7 years ago

@anmolbabu Wouldn't the disadvantage in the second approach apply to the first as well? The real downside of the second approach would be the lack of ability to fetch different data sets at different intervals.

In any case, I don't think it would be a bad idea to implement unified API calls, per object, that return all the monitoring data available for a specific cluster, host or a cluster object. Does this sound feasible?

anmolbabu commented 7 years ago

@brainfunked The disadvantage of the 2nd approach doesn't apply to the first, because all the 2nd approach does is aggregate data that is already in etcd whether or not the monitoring stack is present. With approach 1, the only thing missing without the monitoring stack is the time series data (the overall utilization trend graph); everything else is in etcd and only needs to be aggregated. I see the difference between the 2 approaches as the split of responsibility between the UI and monitoring. So yes, approach 2 is feasible, but if the monitoring stack is not present, approach 2 effectively either becomes approach 1 or the dashboard is blank. So please suggest which module should implement this unified API: performance-monitoring, as in approach 2, or tendrl-api (which I think might be the best place, as it would then serve both cases, whether monitoring is enabled or not).
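A rough sketch of what such a unified call could look like if it lived outside the monitoring stack: the point-in-time data is always aggregated from what is already stored, and the time series is attached only when a monitoring backend is available. Everything here is hypothetical: `build_dashboard`, the `snapshot` dict standing in for data read from etcd, and the `fetch_trend` callable standing in for a query to the time series store.

```python
def build_dashboard(snapshot, fetch_trend=None):
    """Build one dashboard payload for a cluster.

    snapshot:    plain dict standing in for cluster data already read from
                 etcd (hypothetical layout: {"pools": [{"used", "total"}]}).
    fetch_trend: optional callable returning utilization time series;
                 None means the monitoring stack is not installed.
    """
    pools = snapshot.get("pools", [])
    dashboard = {
        # Point-in-time stats and counters: available with or without monitoring.
        "pool_count": len(pools),
        "used": sum(p["used"] for p in pools),
        "total": sum(p["total"] for p in pools),
        # Time series: only present when monitoring is installed.
        "utilization_trend": None,
    }
    if fetch_trend is not None:
        dashboard["utilization_trend"] = fetch_trend()
    return dashboard


snapshot = {"pools": [{"used": 50, "total": 100}, {"used": 10, "total": 200}]}

without_monitoring = build_dashboard(snapshot)
with_monitoring = build_dashboard(snapshot, fetch_trend=lambda: [(0, 18.0), (60, 20.0)])
```

With this split, a missing monitoring stack costs the dashboard only the trend graph rather than leaving it blank.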

anmolbabu commented 7 years ago

Ceph Cluster Dashboard data-points:

Mock up link:

Ceph cluster dashboard

The data points, with their currently available sources or the work needed to implement them, are as follows:

Note:

Apart from the points mentioned under "Work Involved", all of this data then needs to be exposed via the API.

anmolbabu commented 7 years ago

Gluster Cluster Dashboard data-points:

Mock up link:

Gluster cluster dashboard

The data points, with their currently available sources or the work needed to implement them, are as follows:

Note:

Apart from the points mentioned under "Work Involved", all of this data then needs to be exposed via the API.

anmolbabu commented 7 years ago

@brainfunked @Tendrl/tendrl-core Please provide your inputs/suggestions on this

anmolbabu commented 7 years ago

There is a problem with configuring the collectd plugins: configuring them on all nodes is overkill, and in Ceph's case the commands for getting stats need to be executed only on mons. So an ideal approach would be to select one node from the group of eligible nodes (mons in Ceph's case, all nodes in the case of a Gluster cluster), so that instead of all or some nodes pushing the same data to the time series DB (Graphite), only one node updates Graphite. But the problem here is what happens if the currently configured node goes down: how do we configure some other node in that case?
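One possible way to frame the selection and failover question, purely as an illustration and not a decision made in this thread: keep the current collector while it is alive, and otherwise fall back deterministically to the first live eligible node. The function `pick_collector` and the node names are hypothetical, and how liveness is detected is left open.

```python
def pick_collector(eligible_nodes, alive, current=None):
    """Choose the single node that should push cluster stats to Graphite.

    eligible_nodes: nodes allowed to collect (mons for Ceph, all nodes for Gluster).
    alive:          set of nodes currently known to be up.
    current:        the node configured so far, if any.
    """
    # Keep the current collector while it is still eligible and alive,
    # so the plugin is not reconfigured needlessly.
    if current in eligible_nodes and current in alive:
        return current
    # Otherwise pick deterministically, so every caller agrees on the result.
    candidates = sorted(n for n in eligible_nodes if n in alive)
    return candidates[0] if candidates else None


mons = ["mon-b", "mon-a", "mon-c"]

first = pick_collector(mons, alive={"mon-a", "mon-b", "mon-c"})
after_failure = pick_collector(mons, alive={"mon-b", "mon-c"}, current=first)
after_recovery = pick_collector(mons, alive={"mon-a", "mon-b", "mon-c"}, current=after_failure)
```

Whatever component watches node status would re-run the selection on each change and move the plugin configuration only when the result differs from the current node.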

anmolbabu commented 7 years ago

CEPH MAIN DASHBOARD Data-points

Mock up link:

Ceph main dashboard

The data points, with their currently available sources or the work needed to implement them, are as follows:

anmolbabu commented 7 years ago

GLUSTER MAIN DASHBOARD Data-points

Mock up link:

Gluster main dashboard

The data points, with their currently available sources or the work needed to implement them, are as follows:

anmolbabu commented 7 years ago

Gluster HOST Dashboard data-points:

Mock up link:

Gluster Host dashboard

The data points, with their currently available sources or the work needed to implement them, are as follows:

anmolbabu commented 7 years ago

Ceph HOST Dashboard data-points:

Mock up link:

Ceph Host dashboard

The data points, with their currently available sources or the work needed to implement them, are as follows:

Note:

Apart from the points mentioned under "Work Involved", all of this data then needs to be exposed via the API.