canonical / grafana-agent-operator

This charmed operator automates the operational procedures of running Grafana Agent, an open-source telemetry collector.
https://charmhub.io/grafana-agent
Apache License 2.0

Error in config file. #65

Closed Abuelodelanada closed 3 months ago

Abuelodelanada commented 4 months ago

Bug Description

The metrics section of the generated config file is wrong.

It contains 2 scrape jobs, but there should be only one:

metrics:
  configs:
  - name: agent_scraper
    remote_write: []
    scrape_configs:
    - job_name: config-server_0_default
      static_configs:
      - labels:
          cluster: config-server
          juju_application: config-server
          juju_model: mia
          juju_model_uuid: 8964d0a2-9951-4ece-82be-e14f4d57a512
          replication_set: config-server
    - job_name: config-server_1_default
      metrics_path: /metrics
      static_configs:
      - labels:
          juju_application: config-server
          juju_model: mia
          juju_model_uuid: 8964d0a2-9951-4ece-82be-e14f4d57a512
        targets:
        - localhost:9216

To Reproduce

  1. Deploy COS-Lite in a k8s controller: juju deploy cos-lite --channel=edge --trust --overlay ./offers-overlay.yaml
  2. git clone https://github.com/canonical/mongodb-operator.git
  3. git switch add-labels
  4. charmcraft pack
  5. juju add-model mia
  6. juju deploy ./*charm --config role="config-server" config-server
  7. juju deploy ./*charm --config role="shard" shard-one -n2
  8. juju deploy ./*charm --config role="shard" shard-two
  9. juju deploy grafana-agent
  10. juju integrate shard-one:sharding config-server:config-server
  11. juju integrate shard-two:sharding config-server:config-server
  12. juju integrate grafana-agent config-server
  13. juju consume k8s-controller:admin/cos.grafana-dashboards
  14. juju consume k8s-controller:admin/cos.loki-logging
  15. juju consume k8s-controller:admin/cos.prometheus-receive-remote-write
  16. juju integrate grafana-agent prometheus-receive-remote-write
  17. juju integrate grafana-agent loki-logging
  18. juju integrate grafana-agent grafana-dashboards
  19. Check that there are 2 jobs instead of only 1: juju ssh grafana-agent/0 cat /etc/grafana-agent.yaml

Environment

Relevant log output

.

Additional context

MongoDB deployed:

image

If we check the mongodb_up metric in Prometheus, we get:

image

Note that the cluster and replication_set labels are missing, although they should be present.

This issue was found thanks to @MiaAltieri. Read more here: https://matrix.to/#/!nHXpRkcSNJHlHdUGbQ:ubuntu.com/$bIiKmfWl8LRE9dAoSorK3W0mHeq1E9D7yD3s6-vcsw8?via=ubuntu.com&via=matrix.org

IbraAoad commented 3 months ago

Hey @Abuelodelanada, the current behavior in cos agent is that it creates a separate job for each entry in _metrics_endpoints and _scrape_configs and attaches the Juju topology labels to each of them. In the issue above, it seems that COSAgentProvider is being instantiated with both endpoints and scrape_configs, which I think is why we're getting 2 separate jobs. I think a solution would be to declare the full job, along with its labels, in scrape_configs and drop the endpoints param. An example of this can be found in the postgres-operator here
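To make the behavior described above concrete, here is a minimal sketch in plain Python. It is not the code of the cos agent library: build_jobs, TOPOLOGY, the {"path", "port"} endpoint shape, the localhost target and the <app>_<index>_default naming are all assumptions, chosen only so that the output mirrors the job names and labels shown in the bug description.

```python
from typing import Any, Dict, List

# Assumed stand-in for the Juju topology labels (values taken from the config in this issue).
TOPOLOGY = {
    "juju_application": "config-server",
    "juju_model": "mia",
    "juju_model_uuid": "8964d0a2-9951-4ece-82be-e14f4d57a512",
}


def build_jobs(
    app: str,
    topology: Dict[str, str],
    metrics_endpoints: List[Dict[str, Any]],
    scrape_configs: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Hypothetical model: one job per entry, whether it comes from
    scrape_configs or from metrics_endpoints."""
    raw = [dict(cfg) for cfg in scrape_configs]
    for endpoint in metrics_endpoints:
        # Each endpoint becomes its own job with no custom labels.
        raw.append(
            {
                "metrics_path": endpoint["path"],
                "static_configs": [{"targets": [f"localhost:{endpoint['port']}"]}],
            }
        )

    jobs = []
    for index, cfg in enumerate(raw):
        job: Dict[str, Any] = {"job_name": f"{app}_{index}_default"}
        if "metrics_path" in cfg:
            job["metrics_path"] = cfg["metrics_path"]
        job["static_configs"] = []
        for static in cfg.get("static_configs", []):
            # Topology labels are merged into whatever labels the entry already has.
            merged: Dict[str, Any] = {"labels": {**static.get("labels", {}), **topology}}
            if "targets" in static:
                merged["targets"] = list(static["targets"])
            job["static_configs"].append(merged)
        jobs.append(job)
    return jobs


custom_labels = {"cluster": "config-server", "replication_set": "config-server"}

# Both parameters set: the labels land on one job and the target on another.
both = build_jobs(
    "config-server",
    TOPOLOGY,
    metrics_endpoints=[{"path": "/metrics", "port": 9216}],
    scrape_configs=[{"static_configs": [{"labels": custom_labels}]}],
)

# Only scrape_configs set, with the full job declared in one place.
single = build_jobs(
    "config-server",
    TOPOLOGY,
    metrics_endpoints=[],
    scrape_configs=[
        {
            "metrics_path": "/metrics",
            "static_configs": [{"targets": ["localhost:9216"], "labels": custom_labels}],
        }
    ],
)

print(len(both), len(single))  # prints "2 1"
```

In this model the job that actually scrapes localhost:9216 never receives cluster or replication_set, which matches the missing labels reported above.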

On a side note, I had a chat today about this with @dstathis and was thinking maybe we can open up an enhancement issue for setting additional global labels through CosAgent, what do you think?

IbraAoad commented 3 months ago

@Abuelodelanada I tried the code below in the mongodb-operator

        # relation events for Prometheus metrics are handled in the MetricsEndpointProvider
        self._grafana_agent = COSAgentProvider(
            self,
            metrics_rules_dir=Config.Monitoring.METRICS_RULES_DIR,
            logs_rules_dir=Config.Monitoring.LOGS_RULES_DIR,
            log_slots=Config.Monitoring.LOG_SLOTS,
            scrape_configs=self._mongo_scrape_config,
        )

    def _mongo_scrape_config(self) -> List[Dict]:
        """Generates scrape config for the mongo metrics endpoint."""
        return [{
            "metrics_path": "/metrics",
            "static_configs": [{
                "targets": [f"{self._unit_ip(self.unit)}:{Config.Monitoring.MONGODB_EXPORTER_PORT}"],
                "labels": {"cluster": "config-server", "replication_set": "config-server"}
            }]
        }]

which yielded the result below

metrics:
  configs:
  - name: agent_scraper
    remote_write:
    - tls_config:
        insecure_skip_verify: false
      url: http://192.168.100.69/cos-prometheus-0/api/v1/write
    scrape_configs:
    - job_name: config-server_0_default
      metrics_path: /metrics
      static_configs:
      - labels:
          cluster: config-server
          replication_set: config-server
          juju_application: config-server
          juju_model: mia
          juju_model_uuid: 979456eb-2436-4906-8d4e-8973bb42a4f9
          juju_unit: config-server/2
        targets:
        - 10.135.135.114:9216
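
A quick way to confirm a result like the one above is to check that there is exactly one job and that it carries the expected labels. The sketch below does this on a Python dict that mirrors the YAML; jobs_missing_labels and rendered are hypothetical names, and in practice the dict would come from parsing /etc/grafana-agent.yaml with a YAML library.

```python
from typing import Any, Dict, List, Set


def jobs_missing_labels(agent_config: Dict[str, Any], required: Set[str]) -> Dict[str, List[str]]:
    """Return {job_name: [missing labels]} for every scrape job lacking a required label."""
    problems: Dict[str, List[str]] = {}
    for cfg in agent_config.get("metrics", {}).get("configs", []):
        for job in cfg.get("scrape_configs", []):
            for static in job.get("static_configs", []):
                missing = sorted(required - set(static.get("labels", {})))
                if missing:
                    problems.setdefault(job["job_name"], []).extend(missing)
    return problems


# Hand-written dict mirroring the scrape_configs section shown above.
rendered = {
    "metrics": {
        "configs": [
            {
                "name": "agent_scraper",
                "scrape_configs": [
                    {
                        "job_name": "config-server_0_default",
                        "metrics_path": "/metrics",
                        "static_configs": [
                            {
                                "labels": {
                                    "cluster": "config-server",
                                    "replication_set": "config-server",
                                    "juju_application": "config-server",
                                    "juju_model": "mia",
                                    "juju_model_uuid": "979456eb-2436-4906-8d4e-8973bb42a4f9",
                                    "juju_unit": "config-server/2",
                                },
                                "targets": ["10.135.135.114:9216"],
                            }
                        ],
                    }
                ],
            }
        ]
    }
}

job_count = len(rendered["metrics"]["configs"][0]["scrape_configs"])
print(job_count, jobs_missing_labels(rendered, {"cluster", "replication_set"}))  # prints "1 {}"
```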

Abuelodelanada commented 3 months ago

Hi @MiaAltieri

Regarding the chat we had some weeks ago, could you please try instantiating COSAgentProvider as @IbraAoad described in the comment above?

IbraAoad commented 3 months ago

Closing this one, as /etc/grafana-agent.yaml is generated correctly when COSAgentProvider is instantiated properly.

MiaAltieri commented 2 months ago

Thanks for your patience, I followed up with Ibrahim Awwad from COS and confirmed this solution works. Thank you for your work here :)