cloudfoundry / prometheus-boshrelease

bosh release for prometheus ecosystem
Apache License 2.0

firehose_exporter and cf_exporter not collecting data.. #190

Closed: fgraichen closed this issue 6 years ago

fgraichen commented 6 years ago

Both show as started in the logs, but I am not seeing any data flow through to Grafana.

The stdout logs remain at zero length and I am seeing no errors in the other logs, just the start messages and the "listening" message on the firehose.

To be honest, I am not clear which Grafana dashboards are fed by the firehose versus cf_exporter.

Thanks in advance

benjaminguttmann-avtq commented 6 years ago

Hi @fgraichen, the default log level is not that helpful. To see more details, you have to increase the log level to debug. More or less all dashboards whose names start with CF or Apps (rather than BOSH) are fed by the firehose and cf_exporter. Am I right in assuming that your bosh exporter is working as expected? Did you use the provided ops files from the prometheus repository to set up Prometheus, or are you using your own manifest? You can try to curl the metrics endpoint of each exporter and see whether you get an error. You can find the targets in the Prometheus UI under Status > Targets.
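As an alternative to curl, the following minimal Python sketch fetches each exporter's /metrics endpoint. For each one it reports either the error or the number of samples returned. The ports (9186 for firehose_exporter, 9193 for cf_exporter) match the scrape configuration used later in this thread. The host and the helper names are placeholders chosen for illustration.

```python
import urllib.error
import urllib.request

# Ports as used in this thread's scrape configuration (assumed unchanged in your manifest).
EXPORTER_PORTS = {"firehose_exporter": 9186, "cf_exporter": 9193}


def count_samples(exposition: str) -> int:
    """Count sample lines in Prometheus text format, skipping '# HELP'/'# TYPE' lines and blanks."""
    return sum(
        1 for line in exposition.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )


def check_exporter(host: str, port: int, timeout: float = 10.0) -> str:
    """GET http://host:port/metrics, the same request Prometheus issues on every scrape."""
    url = f"http://{host}:{port}/metrics"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:  # the exporter answered, but with an error status
        return f"{url}: HTTP {err.code}"
    except OSError as err:  # URLError, connection refused, timeouts
        return f"{url}: unreachable ({err})"
    return f"{url}: {count_samples(body)} samples"
```

To use it, run `for name, port in EXPORTER_PORTS.items(): print(name, check_exporter("10.0.0.5", port))`, replacing the hypothetical `10.0.0.5` with the exporter VM's IP. A sample count of zero, or an unreachable result, points at the exporter. A healthy count combined with a missing entry under Status > Targets points at the Prometheus scrape configuration.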

fgraichen commented 6 years ago

You are correct. BOSH is working fine. How do I increase the log levels? If I understand correctly, the cf_exporter hits the standard CF APIs to get information. What causes that to “fire”? I guess the same question applies to the firehose. When I look at the monitor-cf.yml ops file, it has an entry for a metron deployment, but I don’t see a metron VM running except on an older CF deployment running PCF Metrics. Is this the metron that comes with PCF Metrics? Is this the same for PCF 2.1 instances? Do I need to be running PCF Metrics?

fgraichen commented 6 years ago

I am not seeing either the firehose or the cf_exporter defined as targets when I log into the Prometheus console. I just see the node and bosh exporters. Is it possible that, because these were originally configured incorrectly, they were prevented from being set as targets in Prometheus?

fgraichen commented 6 years ago

I hit the cf_exporter instance running on the Prometheus VM at /metrics and it returned metric data, so it seems that for some reason the Prometheus targets are just not set.

benjaminguttmann-avtq commented 6 years ago

@fgraichen: Okay. Let me answer your questions one by one.

1) Increasing the log level can be done via properties in the manifest. You can find the relevant properties here: https://github.com/bosh-prometheus/prometheus-boshrelease/blob/8fb0807693a7101c42285fc7e83bac8f8a48b862/jobs/cf_exporter/spec#L29-L30 and https://github.com/bosh-prometheus/prometheus-boshrelease/blob/8fb0807693a7101c42285fc7e83bac8f8a48b862/jobs/firehose_exporter/spec#L39-L40

2) There should be one metron agent on nearly every VM. There are no separate metron VMs. Please make sure to configure the same metron_name in Prometheus as the one configured in your environment (for PCF it is hardcoded to CF).

3) The exporters "fire" whenever something sends a GET request to their /metrics endpoint. That is configured via the Prometheus scrape configuration, which also sets the scrape interval and timeout.

4) It sounds like the CF Exporter is not on the same VM as Prometheus. Is that right? By default, Prometheus uses service discovery to find the different exporters, but this only works for exporters placed on the Prometheus VM. If that is not the case for you, you have to configure the scraping in your Prometheus yourself. This can be done here: https://github.com/bosh-prometheus/prometheus-boshrelease/blob/a686e5c30f284fafe3eed17a2975cd6fcb4d5ba1/jobs/prometheus/spec#L41-L42 The configuration should look something like this:

        scrape_configs:
        - job_name: prometheus
          static_configs:
          - targets:
            - localhost:9090
        - job_name: firehose_exporter
          static_configs:
          - targets:
            - <replace_with_firehose_exporter_vm_ip>:9186
        - job_name: cf_exporter
          # interval/timeout are per-job keys inside scrape_configs; timeout must not exceed interval
          scrape_interval: 5m
          scrape_timeout: 4m
          static_configs:
          - targets:
            - <replace_with_cf_exporter_vm_ip>:9193

I hope this will help you.
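Because scrape_configs is a list of jobs, scrape_interval and scrape_timeout belong inside a job entry (or in the global section of prometheus.yml). Prometheus also rejects a job whose timeout is longer than its interval. The following is a minimal Python sketch of that check. It assumes a simplified duration parser that handles only the ms/s/m/h/d/w units, and the Prometheus global defaults of 1m interval and 10s timeout. The helper names are hypothetical.

```python
import re

# Milliseconds per unit for a simplified Prometheus duration parser (the "y" unit is omitted).
_UNITS_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000, "w": 604_800_000}
_PART = re.compile(r"(\d+)(ms|s|m|h|d|w)")


def parse_duration(text: str) -> int:
    """Parse durations such as '4m', '90s' or '1h30m' into milliseconds."""
    parts = _PART.findall(text)
    # Reject anything the pattern did not fully consume, e.g. '5x' or '5mx'.
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(int(num) * _UNITS_MS[unit] for num, unit in parts)


def check_job(job: dict, global_interval: str = "1m", global_timeout: str = "10s") -> None:
    """Raise ValueError if a scrape job's timeout would exceed its interval."""
    interval = parse_duration(job.get("scrape_interval", global_interval))
    if "scrape_timeout" in job:
        timeout = parse_duration(job["scrape_timeout"])
    else:
        # Unset job timeout: fall back to the global timeout, capped at the job interval.
        timeout = min(parse_duration(global_timeout), interval)
    if timeout > interval:
        raise ValueError(
            f"{job['job_name']}: scrape_timeout ({timeout} ms) exceeds scrape_interval ({interval} ms)"
        )
```

For example, `check_job({"job_name": "cf_exporter", "scrape_interval": "5m", "scrape_timeout": "4m"})` passes. Swapping the two values raises an error.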

fgraichen commented 6 years ago

Great feedback! The cf_exporter is on the Prometheus VM, and I see it in the same place as the bosh_exporter, so I will look at updating the scrape config.

fgraichen commented 6 years ago

When I SSH into the Prometheus VM, edit the prometheus.yml file by hand, and use the technique above to add the scrape_configs, I then see all of the data from both cf_exporter and the firehose.

The problem is that when I run the BOSH deployment, it generates the following scrape_configs. I am not familiar with using JSON in this config setting, so I am not sure how to debug it or how to override it.

I am not sure what ${1} would resolve to for the host. When I look at the Prometheus targets with this configuration, I see only the bosh and node targets.

        # A list of scrape configurations.
        scrape_configs: [
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"prometheus","relabel_configs":[{"action":"keep","regex":"prometheus","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9090","source_labels":["__address__"],"target_label":"__address__"}]},
          {"job_name":"bosh","scrape_interval":"2m","scrape_timeout":"1m","static_configs":[{"targets":["localhost:9190"]}]},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"bosh_tsdb","relabel_configs":[{"action":"keep","regex":"bosh_tsdb_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9194","source_labels":["__address__"],"target_label":"__address__"}]},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"cadvisor","relabel_configs":[{"action":"keep","regex":"cadvisor","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:8080","source_labels":["__address__"],"target_label":"__address__"}]},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"cf","relabel_configs":[{"action":"keep","regex":"cf_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9193","source_labels":["__address__"],"target_label":"__address__"}],"scrape_interval":"4m","scrape_timeout":"2m"},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"collectd","relabel_configs":[{"action":"keep","regex":"collectd_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9103","source_labels":["__address__"],"target_label":"__address__"}]},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"consul","relabel_configs":[{"action":"keep","regex":"consul_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9107","source_labels":["__address__"],"target_label":"__address__"}]},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"elasticsearch","relabel_configs":[{"action":"keep","regex":"elasticsearch_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9114","source_labels":["__address__"],"target_label":"__address__"}]},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"concourse","relabel_configs":[{"action":"keep","regex":"atc","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","source_labels":["__meta_bosh_deployment"],"target_label":"bosh_deployment"},{"regex":"(.*)","replacement":"${1}:9391","source_labels":["__address__"],"target_label":"__address__"}]},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"firehose","relabel_configs":[{"action":"keep","regex":"firehose_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9186","source_labels":["__address__"],"target_label":"__address__"}]},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"github","relabel_configs":[{"action":"keep","regex":"github_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9171","source_labels":["__address__"],"target_label":"__address__"}]},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"grafana","relabel_configs":[{"action":"keep","regex":"grafana","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:3000","source_labels":["__address__"],"target_label":"__address__"}]},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"graphite","relabel_configs":[{"action":"keep","regex":"graphite_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9108","source_labels":["__address__"],"target_label":"__address__"}]},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"haproxy","relabel_configs":[{"action":"keep","regex":"haproxy_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9101","source_labels":["__address__"],"target_label":"__address__"}]},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"influxdb","relabel_configs":[{"action":"keep","regex":"influxdb_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9122","source_labels":["__address__"],"target_label":"__address__"}]},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"kubernetes","relabel_configs":[{"action":"keep","regex":"kube_state_metrics_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9188","source_labels":["__address__"],"target_label":"__address__"}]},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"memcached","relabel_configs":[{"action":"keep","regex":"memcached_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9150","source_labels":["__address__"],"target_label":"__address__"}]},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"mongodb","relabel_configs":[{"action":"keep","regex":"mongodb_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9001","source_labels":["__address__"],"target_label":"__address__"}]},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"mysql","relabel_configs":[{"action":"keep","regex":"mysqld_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9104","source_labels":["__address__"],"target_label":"__address__"}]},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"nats","relabel_configs":[{"action":"keep","regex":"nats_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9118","source_labels":["__address__"],"target_label":"__address__"}]},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"node","relabel_configs":[{"action":"keep","regex":"node_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9100","source_labels":["__address__"],"target_label":"__address__"}]},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"postgres","relabel_configs":[{"action":"keep","regex":"postgres_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9187","source_labels":["__address__"],"target_label":"__address__"}]},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"honor_labels":true,"job_name":"pushgateway","relabel_configs":[{"action":"keep","regex":"pushgateway","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9091","source_labels":["__address__"],"target_label":"__address__"}]},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"rabbitmq","relabel_configs":[{"action":"keep","regex":"rabbitmq_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9125","source_labels":["__address__"],"target_label":"__address__"}]},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"redis","relabel_configs":[{"action":"keep","regex":"redis_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9121","source_labels":["__address__"],"target_label":"__address__"}]},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"shield","relabel_configs":[{"action":"keep","regex":"shield_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9179","source_labels":["__address__"],"target_label":"__address__"}],"scrape_interval":"4m","scrape_timeout":"2m"},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"stackdriver","relabel_configs":[{"action":"keep","regex":"stackdriver_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9255","source_labels":["__address__"],"target_label":"__address__"}]},
          {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"statsd","relabel_configs":[{"action":"keep","regex":"statsd_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9102","source_labels":["__address__"],"target_label":"__address__"}]}
          ]

        # Alerting specifies settings related to the Alertmanager.
        alerting: {}
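In these generated rules, `${1}` refers to the first capture group of `regex`, applied to the value of `source_labels`. Prometheus anchors relabel regexes at both ends. As a result, `(.*)` captures the whole `__address__` (the VM IP taken from the bosh_exporter's target file), and `${1}:9193` rewrites it to `<ip>:9193`. The `keep` rule drops every target whose `__meta_bosh_job_process_name` is not `cf_exporter`. The following minimal Python emulation of the cf job's two rules supports only the `keep` and `replace` actions and only `${n}` references. The IPs are made up.

```python
import re

# The cf job's rules from the generated config (label names as Prometheus sees them).
CF_RULES = [
    {"action": "keep", "regex": "cf_exporter", "source_labels": ["__meta_bosh_job_process_name"]},
    {"regex": "(.*)", "replacement": "${1}:9193",
     "source_labels": ["__address__"], "target_label": "__address__"},
]


def relabel(labels: dict, rules: list):
    """Apply 'keep'/'replace' relabel rules; return None if the target is dropped."""
    labels = dict(labels)  # do not mutate the caller's target
    for rule in rules:
        # Multiple source labels are joined with ';' (Prometheus' default separator).
        value = ";".join(labels.get(name, "") for name in rule.get("source_labels", []))
        match = re.fullmatch(rule.get("regex", "(.*)"), value)  # anchored, like Prometheus
        action = rule.get("action", "replace")
        if action == "keep":
            if match is None:
                return None
        elif action == "replace" and match is not None:
            # Expand ${n} into the n-th capture group.
            template = rule.get("replacement", "${1}")
            labels[rule["target_label"]] = re.sub(
                r"\$\{(\d+)\}", lambda ref: match.group(int(ref.group(1))) or "", template
            )
    return labels
```

With a target such as `{"__address__": "10.0.16.7", "__meta_bosh_job_process_name": "cf_exporter"}`, the rules produce the address `10.0.16.7:9193`. A `node_exporter` target is dropped by the cf job. The empty cf and firehose target lists therefore suggest that no such entries are present in the discovery file.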

frodenas commented 6 years ago

Check the contents of /var/vcap/store/bosh_exporter/bosh_target_groups.json on the Prometheus VM. If the file does not exist or does not show any info, then the problem might be with the bosh_exporter.

fgraichen commented 6 years ago

I tried to override this in the spec file but was not sure what the syntax should be. Nothing I tried seemed to override it.

        prometheus.scrape_configs:
          description: "Array of scrape configurations"

I also tried to override it in the prometheus.yml under config/templates:

        # A list of scrape configurations.
        scrape_configs: <%= p('prometheus.scrape_configs', []).to_json %>

Given that the spec entry does not have anything in it, I assume that this is something Prometheus builds dynamically?

frodenas commented 6 years ago

That property is filled in when you add the monitor-bosh.yml ops file.

It basically looks for the /var/vcap/store/bosh_exporter/bosh_target_groups.json file and applies a regexp to dynamically find exporters. Check the contents of that file: are there any exporters?
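To answer that question quickly, load the file and group its targets by `__meta_bosh_job_process_name`, the label that the generated `keep` rules match on. The following minimal Python sketch assumes the file follows Prometheus' file_sd format, a JSON list of `{"targets": [...], "labels": {...}}` objects. The helper names are hypothetical.

```python
import json
from collections import defaultdict

TARGET_GROUPS = "/var/vcap/store/bosh_exporter/bosh_target_groups.json"


def load_groups(path: str = TARGET_GROUPS) -> list:
    """Read the target groups written by the bosh_exporter's service discovery."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def targets_by_process(groups: list) -> dict:
    """Map each __meta_bosh_job_process_name to the sorted list of its target addresses."""
    found = defaultdict(set)
    for group in groups:
        name = group.get("labels", {}).get("__meta_bosh_job_process_name", "<none>")
        found[name].update(group.get("targets", []))
    return {name: sorted(addrs) for name, addrs in found.items()}


def missing_exporters(groups: list, wanted=("cf_exporter", "firehose_exporter")) -> list:
    """Return the exporters from `wanted` that service discovery has not found."""
    present = targets_by_process(groups)
    return [name for name in wanted if name not in present]
```

On the Prometheus VM, `print(missing_exporters(load_groups()))` prints an empty list when both exporters were discovered. If either name is listed, the bosh_exporter is not seeing the deployment that contains that exporter.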

fgraichen commented 6 years ago

I think the issue is that there are no entries in /var/vcap/store/bosh_exporter/bosh_target_groups.json for either firehose or cf_exporter

frodenas commented 6 years ago

What BOSH director are you monitoring? Are there any errors in the /var/vcap/sys/log/bosh_exporter/* logs?

fgraichen commented 6 years ago

There are no errors in that log, and I am getting BOSH statistics. Is it possible that, because I am using both monitor-bosh AND monitor-cf in the same deployment, the way the scrape config gets created causes the BOSH entries to wipe out the cf and firehose scrape statements? I have verified that both exporters collect metrics, and if I hard-code the scrape config they work as expected. I will remove monitor-bosh this morning and see if this theory holds.

frodenas commented 6 years ago

No, all exporters can coexist in the same deployment.

How many BOSH directors do you have? If you have more than 1, are you monitoring the one where you have deployed your exporters?

benjaminguttmann-avtq commented 6 years ago

@frodenas: Okay, I think I still don't get how the service discovery works in detail. Say I have three bosh exporters (that means three directors). One of these directors deployed the exporters (firehose, cf, and bosh exporter all on one VM, plus two VMs with one bosh exporter each). Is it possible to use service discovery here, and am I right that service discovery prevents the exporter from creating BOSH tasks all the time?

fgraichen commented 6 years ago

Here is the final combination that worked for me. I took the advice above from four days ago and added the scrape configs directly to the main prometheus.yml. I didn't realize right away that this was where the array in the spec statement was getting its values. With these specific scrape statements in the prometheus.yml and monitor-bosh.yml then applied on top, I ended up with all of the pieces in one scrape config. Maybe the README needs to spell this out more clearly? Next, I want to monitor an additional Cloud Foundry environment from one Prometheus. I now know how to add the Prometheus statements to the scrape config, but I am not sure how to get one Prometheus deployment to run multiple firehose, bosh, and cf exporters. Even if I have to do that manually at the Prometheus level, this is still a very worthwhile BOSH deployment script!
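The following Python sketch models why this combination works. Jobs written directly into `prometheus.scrape_configs` in the manifest, and jobs appended by an ops file such as monitor-bosh.yml, end up in the same array. The template line `p('prometheus.scrape_configs', []).to_json` then renders that whole array as JSON, and an unset property renders as `[]`. This is a minimal emulation that assumes append-to-array ops semantics. The helper names and IPs are hypothetical, and this is not how the BOSH CLI is implemented.

```python
import copy
import json


def append_jobs(properties: dict, new_jobs: list) -> dict:
    """Emulate an ops file that appends entries to the scrape_configs array."""
    merged = copy.deepcopy(properties)  # leave the original manifest properties untouched
    merged.setdefault("scrape_configs", []).extend(copy.deepcopy(new_jobs))
    return merged


def render_scrape_configs(properties: dict) -> str:
    """Emulate: scrape_configs: <%= p('prometheus.scrape_configs', []).to_json %>"""
    jobs = properties.get("scrape_configs", [])
    return "scrape_configs: " + json.dumps(jobs, separators=(",", ":"))
```

For example, a manifest that already defines a static `cf_exporter` job, with monitor-bosh.yml's jobs appended afterwards, renders as one array containing both. This matches the single scrape config described above.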

frodenas commented 6 years ago

@benjaminguttmann-avtq we're conflating different issues here.

@fgraichen the actual manifest files in this repo are targeted at a single BOSH and CF installation. If you want to monitor multiple BOSHes or CFs, then you will need to create your own manifest files.
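In a hand-written manifest, one option for several CF installations is a separate pair of static scrape jobs per environment. Each pair carries an extra target label so that queries can tell the environments apart. The following minimal Python sketch generates those entries for `prometheus.scrape_configs`. It assumes the exporter ports used in this thread (9186 and 9193). The environment names, IPs, and the `cf_env` label are hypothetical choices.

```python
# Exporter ports as used elsewhere in this thread.
EXPORTER_PORTS = {"firehose_exporter": 9186, "cf_exporter": 9193}


def jobs_for_environments(envs: dict) -> list:
    """Build one static scrape job per (environment, exporter).

    `envs` maps a made-up environment name to the IP of the VM running its exporters.
    """
    jobs = []
    for env, ip in sorted(envs.items()):
        for exporter, port in EXPORTER_PORTS.items():
            jobs.append({
                "job_name": f"{exporter}_{env}",  # Prometheus requires unique job names
                "static_configs": [{
                    "targets": [f"{ip}:{port}"],
                    "labels": {"cf_env": env},  # hypothetical label to separate environments
                }],
            })
    return jobs
```

Dumping `jobs_for_environments({"prod": "10.0.16.7", "staging": "10.1.16.7"})` as JSON or YAML gives entries you can paste into the manifest. Note that the bundled dashboards are built for a single installation and may need their queries adjusted to filter on such a label.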

fgraichen commented 6 years ago

Tremendous thanks for your help and patience. This is a great BOSH deployment. I understand that I was trying to bend it to a purpose other than the one it was originally designed for (and I understand your point that it would have just worked had I only involved one director).

I now have a great starting point to take this down the path that matches our deployment methodology. You can close this issue.

Thanks again.