Closed anmolbabu closed 7 years ago
@Tendrl/tendrl-core Please provide your suggestions
@anmolbabu Wouldn't the disadvantage in the second approach apply to the first as well? The real downside of the second approach would be the lack of ability to fetch different data sets at different intervals.
In any case, I don't think it would be a bad idea to implement unified API calls, per object, that return all the monitoring data available for a specific cluster, host or a cluster object. Does this sound feasible?
@brainfunked The disadvantage of the second approach doesn't apply to the first, because the second approach only aggregates data that is already in etcd, whether or not the monitoring stack is present. Without the monitoring stack, and under approach 1, the only thing missing is the time series data (the overall utilization trend graph); everything else is in etcd and only needs to be aggregated. I see the difference between the two approaches as how responsibility is shared between the UI and monitoring. So yes, approach 2 is feasible, but if the monitoring stack is not present, approach 2 effectively either becomes approach 1 or the dashboard is blank. So please suggest which module should implement this unified API: performance-monitoring, as in approach 2, or tendrl-api (which I think might be the best place, as it would then serve both cases, whether monitoring is enabled or not).
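To make the "serves both cases" idea concrete, here is a minimal sketch of a unified per-cluster call that always returns the point-in-time data and attaches the trend only when monitoring is available. The function name `build_dashboard`, the shape of `cluster_objects` and the `fetch_trend` callback are all assumptions for illustration, not existing Tendrl APIs.

```python
from typing import Callable, Optional


def build_dashboard(
    cluster_objects: dict,
    fetch_trend: Optional[Callable[[], list]] = None,
) -> dict:
    """Aggregate point-in-time data; attach the trend only if monitoring exists.

    cluster_objects is a hypothetical stand-in for what would be read from etcd.
    fetch_trend is a hypothetical stand-in for a time-series query; None means
    the monitoring stack is not installed.
    """
    hosts = cluster_objects.get("hosts", [])
    payload = {
        "host_count": len(hosts),
        "hosts_down": sum(1 for h in hosts if h.get("status") == "down"),
        "alert_count": len(cluster_objects.get("alerts", [])),
        # Stays None when monitoring is absent, so the UI can hide the graph
        # instead of showing a blank dashboard.
        "utilization_trend": None,
    }
    if fetch_trend is not None:
        payload["utilization_trend"] = fetch_trend()
    return payload
```

With this shape the counters never depend on the optional stack; only the trend field does.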
Data-Points with their currently available sources or work that needs to be done to implement it are as under:
This card contains the following details
Details: This card contains the number of hosts, the number of hosts that are down, and the number of alerts on this node.
Details: This provides the throughputs of the cluster/storage network, the replication network and the client access heartbeat network. We need more information on this before we can evaluate the sources of information for it. Note: There is a slight mismatch between this card as in https://redhat.invisionapp.com/share/589XIRJBW#/screens/213318455 and its details as in https://redhat.invisionapp.com/share/589XIRJBW#/screens/214068233
Apart from the points mentioned under "Work Involved", all of this data then needs to be exposed via the API.
@brainfunked @Tendrl/tendrl-core Please provide your inputs/suggestions on this
Configuring the collectd plugins poses a problem. Configuring them on all nodes is overkill, and in Ceph's case the commands for getting stats need to be executed only on the mons. So an ideal approach would be to select one node from the group of eligible nodes (the mons in Ceph's case, all nodes in a Gluster cluster), so that instead of all or some nodes pushing the same data to the time series DB (graphite), only one node updates graphite. The open question is what happens if the currently configured node goes down: how do we configure some other node in that case?
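One possible answer to the failover question is a lease with a TTL: every eligible node periodically tries to acquire or renew the lease, and only the current holder pushes to graphite. If the holder goes down, the lease expires and another node takes over. The sketch below simulates this in memory; `LeaseStore` and the node names are hypothetical. A real deployment would need the acquire step to be atomic in a shared store (for example a key with a TTL plus compare-and-swap in etcd), which this sketch does not implement.

```python
import time
from typing import Callable, Optional


class LeaseStore:
    """In-memory stand-in for a single key with a TTL (hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._holder: Optional[str] = None
        self._expires_at = 0.0

    def acquire_or_renew(self, node_id: str, ttl: float) -> bool:
        """Return True if node_id holds the lease after this call."""
        now = self._clock()
        expired = self._holder is None or now >= self._expires_at
        if expired or self._holder == node_id:
            self._holder = node_id
            self._expires_at = now + ttl
            return True
        return False

    def holder(self) -> Optional[str]:
        """Return the current holder, or None if the lease has expired."""
        if self._holder is not None and self._clock() < self._expires_at:
            return self._holder
        return None
```

Each eligible node would call `acquire_or_renew` at an interval shorter than the TTL and collect stats only when it returns `True`.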
Data-Points with their currently available sources or work that needs to be done to implement it are as under:
This card contains the following details:
This card contains:
Details: This provides the throughputs of the cluster network and the public network. These are not currently maintained as part of the backend. We need more information on this before we can evaluate the sources of information for it. Note: There is a slight mismatch between this card as in https://redhat.invisionapp.com/share/589XIRJBW#/screens/213318455 and its details as in https://redhat.invisionapp.com/share/589XIRJBW#/screens/214068233
Apart from the points mentioned under "Work Involved", all of this data then needs to be exposed via the API.
Mock-up links:
Gluster dashboard Ceph dashboard
Observations:
Missing data-points:
1 Ceph dashboard:
2 Gluster dashboard:
Approach to be taken for fetching data points in dashboard
1. Each card makes a separate API call to fetch its specific information.
For example: for the counters, the UI needs to invoke the respective listing APIs, parse the lists and form the counters. Likewise, for utilizations, the UI needs to fetch the utilizations from the listing API responses and total them to get the overall utilization. For the top 5 most used entities (pools and RBDs in the Ceph dashboard, File Shares in the Gluster dashboard), the listing API needs to support sorting of the response.
Advantage:
The different cards in the UI can then have independent refresh intervals, so that less frequently changing data is refreshed less often.
2. performance-monitoring exposes a single API endpoint that provides all dashboard-specific data (only point-in-time stats and counters) in one query.
Advantage:
The UI then needs to make only two queries: one for the utilization time series data, and one to the API exposed by performance-monitoring, which provides everything else.
Disadvantage:
Monitoring is an optional stack. If it is not installed, the dashboard is either blank or needs its own way of fetching whatever is possible without monitoring, which means approach 1 above still needs to be implemented.
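The aggregation described for approach 1 (counters, overall utilization, top 5 most used entities) is the same work whether it runs in the UI or behind a single API. A minimal sketch, assuming a listing response where each entry carries `name`, `used_bytes` and `total_bytes`; these field names and the function `summarize_pools` are hypothetical, not the actual listing API schema.

```python
def summarize_pools(pools: list, top_n: int = 5) -> dict:
    """Derive a counter, overall utilization and the most used entries."""
    total = sum(p["total_bytes"] for p in pools)
    used = sum(p["used_bytes"] for p in pools)

    def percent_used(p: dict) -> float:
        # Guard against zero-sized entries to avoid division by zero.
        if not p["total_bytes"]:
            return 0.0
        return 100.0 * p["used_bytes"] / p["total_bytes"]

    # Sorting happens here; with server-side sorting support in the listing
    # API, this step would move to the backend.
    ranked = sorted(pools, key=percent_used, reverse=True)
    return {
        "count": len(pools),
        "overall_percent_used": 100.0 * used / total if total else 0.0,
        "most_used": [p["name"] for p in ranked[:top_n]],
    }
```

The overall figure is computed from summed bytes rather than by averaging per-entry percentages, so that large and small entries are weighted correctly.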