Document strategy for gluster-prometheus containers in kube/openshift

JohnStrunk commented 6 years ago

Problem:

There are a number of isolated discussions regarding how this code will be used in kube:

#3 - General: how to deploy this repo
#17 - How to deploy
#22 - What to do about cluster-wide metrics vs. node-level metrics

Aside from a request for general how-to documentation, we need to figure out how to provide both node-level and cluster-level metrics. If every node (pod) exported the cluster-level metrics, each of those metrics would be reported once per node, i.e. duplicated.

Discussion

The configuration appears to be heading toward letting each instance enable or disable individual metrics (#24). This seems like a reasonable way to fix the above issue. It would entail running a per-node collector (a DaemonSet) for node-level metrics and a single Deployment per cluster for cluster-level metrics.
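As an illustration of the static split, here is a minimal Go sketch, assuming each metric can be tagged as node-scoped or cluster-scoped. The Scope type, the registry and the metric names below are hypothetical placeholders, not gluster-prometheus's actual configuration format. The DaemonSet pods would enable only the node scope, and the single Deployment would enable only the cluster scope, so no metric is exported twice.

```go
package main

import (
	"fmt"
	"sort"
)

// Scope marks whether a metric describes one node or the whole cluster.
// Hypothetical type; gluster-prometheus does not necessarily define it.
type Scope int

const (
	NodeScope Scope = iota
	ClusterScope
)

// Metric is a hypothetical registry entry for one collector.
type Metric struct {
	Name  string
	Scope Scope
}

// registry lists illustrative collectors; the names are placeholders.
var registry = []Metric{
	{"brick_usage", NodeScope},
	{"process_stats", NodeScope},
	{"volume_status", ClusterScope},
	{"peer_count", ClusterScope},
}

// EnabledFor returns the sorted collector names an instance should run,
// given which scopes are switched on in that instance's configuration.
func EnabledFor(node, cluster bool) []string {
	var out []string
	for _, m := range registry {
		if (m.Scope == NodeScope && node) || (m.Scope == ClusterScope && cluster) {
			out = append(out, m.Name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Per-node DaemonSet pod: node-level metrics only.
	fmt.Println("daemonset:", EnabledFor(true, false))
	// Single cluster-wide Deployment: cluster-level metrics only.
	fmt.Println("deployment:", EnabledFor(false, true))
}
```

The design choice is that deduplication is decided entirely by static configuration: nothing at runtime needs to coordinate between instances.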

An alternative would be to have the node-level collectors participate in leader election, so that a single instance (the leader) exports the cluster-level metrics in addition to its own node-level metrics. This strikes me as considerably more complicated than the static approach above.
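For comparison, a minimal Go sketch of the leader-election variant: every collector always exports its own node metrics and adds the cluster metrics only while it believes it is the leader. The Collector type and the lowestIDLeader rule are hypothetical; the "smallest peer ID wins" rule over a fixed peer list is a placeholder standing in for a real election mechanism and ignores peer liveness and failover, which is exactly where the extra complexity would live.

```go
package main

import "fmt"

// Collector is a hypothetical node-level exporter instance.
type Collector struct {
	PeerID   string
	IsLeader func() bool // injected; stands in for a real election mechanism
}

// Collect returns the metric groups this instance exports on one scrape:
// always its own node metrics, plus cluster metrics if it is the leader.
func (c Collector) Collect() []string {
	groups := []string{"node:" + c.PeerID}
	if c.IsLeader() {
		groups = append(groups, "cluster")
	}
	return groups
}

// lowestIDLeader is a placeholder election rule: the peer whose ID sorts
// first among the known peers is the leader. It does not track liveness.
func lowestIDLeader(self string, peers []string) func() bool {
	return func() bool {
		for _, p := range peers {
			if p < self {
				return false
			}
		}
		return true
	}
}

func main() {
	peers := []string{"peer-b", "peer-a", "peer-c"}
	clusterExporters := 0
	for _, id := range peers {
		c := Collector{PeerID: id, IsLeader: lowestIDLeader(id, peers)}
		groups := c.Collect()
		if len(groups) == 2 {
			clusterExporters++
		}
		fmt.Println(id, groups)
	}
	fmt.Println("instances exporting cluster metrics:", clusterExporters)
}
```

With a fixed peer list exactly one instance exports cluster metrics; a real deployment would also have to handle the leader disappearing, and avoid two instances briefly both believing they lead.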

Request

Choose one of the above (or another) approaches and document it as the plan for eventual deployment. It need not be implemented immediately, but the choice here affects other projects, namely https://github.com/gluster/anthill.

pranithk commented 6 years ago

@JohnStrunk We are going ahead with the leader-election based approach; the leader is chosen as discussed at https://github.com/gluster/glusterd2/issues/1050

JohnStrunk commented 6 years ago

Excellent. Can you please put together a PR that documents how to deploy gluster-prometheus so that it interacts properly with GD2's leader election? It is still unclear to me which gluster-ansible configuration settings are needed for this to work automatically.

pranithk commented 6 years ago

@JohnStrunk Definitely. There is one missing piece before we can work out the specifics of deployment: whether this needs to support GD1. If GD1 support is not needed, we think it is better to put the metrics implementation in GD2 itself; otherwise it needs to be in a separate container. I will send out a PR as soon as that decision is made (which we hope will happen by next Tuesday).

simu commented 5 years ago

Hi!

We're looking at deploying gluster-prometheus on OpenShift clusters which are running containerized GD1. Our current plan is to deploy gluster-prometheus in a DaemonSet with the same node selectors as the GD1 DaemonSet. However, deploying gluster-prometheus in its own DaemonSet would require support for gluster --remote-host=<GD1 host> ... in gluster-exporter's GD1 backend.

Additionally, looking at the bundled metrics implementations, some of these metrics do not use the gluster CLI binary to extract information from GD1 and thus would not work out of the box for the approach we're currently pursuing.

So I was wondering whether there is a follow-up to this issue (and to the related discussions scattered across #17, #45, #46, and #48) with further discussion or plans describing the official stance on supporting containerized GD1, and/or the official strategy for deploying gluster-prometheus with containerized Gluster.
