Dabz / ccloudexporter

Prometheus exporter for the Confluent Cloud Metrics API
https://docs.confluent.io/current/cloud/metrics-api.html

Not able to access more than one cluster using docker-compose option #73

Closed · bsunkad closed this issue 3 years ago

bsunkad commented 3 years ago

Hi

Using this exporter, I tried to connect to Confluent Cloud with the docker-compose method. The exporter is able to connect and pull metrics only when I pass a single cluster in CCLOUD_CLUSTER. I am having the following issues and need help resolving them:

  1. When passing a single cluster through CCLOUD_CLUSTER, I am not able to get the topic/partition information.
  2. How do I make the docker-compose method pull information for more than one cluster, along with topics, partitions, etc.?

I would appreciate it if anyone is able to help me with this.

Dabz commented 3 years ago

Hi @bsunkad!

To build a docker-compose setup that fetches data from multiple clusters, you will need to provide a custom configuration file; the CCLOUD_CLUSTER environment variable only accepts a single cluster (though this could be improved in the future). Please find below an example docker-compose file that mounts a custom configuration file:

version: '3.1'
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    volumes:
      - ./prometheus.yml:/prometheus.yml
    command:
      - '--config.file=/prometheus.yml'
    ports:
      - 9090:9090
    restart: always

  ccloud_exporter:
    image: dabz/ccloudexporter
    container_name: ccloud_exporter
    command:
      - '--config=/config.yaml'
    environment:
      CCLOUD_API_KEY: ${CCLOUD_API_KEY}
      CCLOUD_API_SECRET: ${CCLOUD_API_SECRET}
      CCLOUD_CLUSTER: ${CCLOUD_CLUSTER}
      CCLOUD_CONNECTOR: ${CCLOUD_CONNECTOR}
      CCLOUD_KSQL: ${CCLOUD_KSQL}
    volumes:
      - ./config.simple.yaml:/config.yaml # custom configuration file describing the clusters to scrape

You can find examples of configuration files at: https://github.com/Dabz/ccloudexporter/tree/master/config
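
For reference, a configuration covering multiple clusters could look roughly like the sketch below. It follows the layout of the examples in that directory; the lkc- IDs are placeholders and the metric/label lists are only a subset, so adapt them to your environment:

rules:
  - clusters:
      - lkc-aaaaa   # placeholder: ID of your first cluster
      - lkc-bbbbb   # placeholder: ID of your second cluster
    metrics:
      - io.confluent.kafka.server/received_bytes
      - io.confluent.kafka.server/sent_bytes
      - io.confluent.kafka.server/sent_records
      - io.confluent.kafka.server/retained_bytes
      - io.confluent.kafka.server/partition_count
    labels:
      - kafka.id
      - topic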

Let me know if you need further assistance!

bsunkad commented 3 years ago

Thanks, Dabz, for the complete set of steps.

Now I have a different problem. Prometheus is receiving the metrics, but the default Grafana "Confluent Cloud" dashboard (the one provided in this repo) is not populated; no values are visible in the panels. Upon verifying the variables and metrics, the metric names start with ccloud_ (e.g. ccloud_metric_retained_bytes, ccloud_metric_partition_count, etc.).

Only the Metrics API latency and Scrape duration dashboard sections are populated with values.

Am I missing anything? If not, how do I get the default dashboard to populate with data? I appreciate your help in advance.

Dabz commented 3 years ago

If only the Metrics API latency is populated, it probably means that the queries are not succeeding on the Metrics API. Can you check the logs of ccloudexporter? It should give you a good idea of what the problem is.

bsunkad commented 3 years ago

Hi Dabz, based on the logs, I am seeing the following errors in the ccloudexporter container:

{
  "Endpoint": "https://api.telemetry.confluent.cloud//v2/metrics/cloud/query",
  "StatusCode": 403,
  "body": "{\"errors\":[{\"status\":\"403\",\"detail\":\"Query must filter by at least one of your authorized resources\"}]}",
  "level": "error",
  "msg": "Received invalid response",
  "time": "2021-04-29T12:15:58Z"
}
{
  "error": "Received status code 403 instead of 200 for POST on https://api.telemetry.confluent.cloud//v2/metrics/cloud/query ({\"errors\":[{\"status\":\"403\",\"detail\":\"Query must filter by at least one of your authorized resources\"}]})",
  "level": "error",
  "msg": "Query did not succeed",
  "optimizedQuery": {
    "aggregations": [
      { "agg": "SUM", "metric": "io.confluent.kafka.server/sent_records" }
    ],
    "filter": {
      "op": "AND",
      "filters": [
        {
          "op": "OR",
          "filters": [
            { "field": "resource.kafka.id", "op": "EQ", "value": "STG-EASTUS2-CLUS-0" },
            { "field": "resource.kafka.id", "op": "EQ", "value": "PRD-EASTUS2-CLUS-0" },
            { "field": "resource.kafka.id", "op": "EQ", "value": "dev-eastus2-clus-1"

Dabz commented 3 years ago

{ "field": "resource.kafka.id", "op": "EQ", "value": "STG-EASTUS2-CLUS-0" } This is not a proper cluster_id. Cluster ID should looked like something lkc-xxxx

bsunkad commented 3 years ago

Hi Dabz, excellent observation; you made my day. Thank you. It was a little confusing to see the cluster name. In the Confluent Cloud UI, if I drill into the cluster settings, I can see the cluster ID in the lkc-xxxx format. Now the dashboard is populated in most of the panels. I will observe and work on fixing the remaining panels.

Best regards, Bhaskar .S

Dabz commented 3 years ago

I am glad that this issue has been fixed ;) Have a good day!