Closed fgraichen closed 6 years ago
Hi @fgraichen, the default log level is not that helpful. To see more details you have to increase the log level to debug. More or less all dashboards whose names start with CF or Apps rather than Bosh are fed via the firehose and cf_exporter. Am I right in assuming that your bosh exporter is working as expected? Did you use the ops files provided in the prometheus repository to set up Prometheus, or are you using your own manifest? You can try to curl the metrics endpoint of each exporter and see whether you get an error. You can find the targets in the Prometheus UI > Status > Targets.
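A minimal Python sketch of the "curl the metrics endpoint" check suggested above. The helper names `fetch_metrics` and `count_samples` are made up for illustration, and the URL in the commented usage is an assumption (9193 is the cf_exporter port that appears later in this thread); adjust host and port to your setup.

```python
import urllib.request


def fetch_metrics(url, timeout=10):
    """Return the text body served by an exporter's /metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def count_samples(metrics_text):
    """Count sample lines, skipping blank lines and '#' comment lines (HELP/TYPE)."""
    return sum(
        1
        for line in metrics_text.splitlines()
        if line.strip() and not line.startswith("#")
    )


# Example usage (hypothetical address, run on the Prometheus VM):
# body = fetch_metrics("http://localhost:9193/metrics")
# print(count_samples(body), "samples returned")
```

If the request raises an error or the count is zero, the problem is likely in the exporter itself; if samples come back, the exporter is fine and the scrape configuration is the next thing to check.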
You are correct. BOSH is working fine. How do I increase the log levels? If I understand correctly, the cf exporter hits the standard CF APIs to get information. What causes that to "fire"? I guess the same question applies to the firehose. When I look at the monitor_cf yml it has an entry for a metron deployment, but I don't see a metron VM running except on an older CF deployment running PCF Metrics. Is this the metron that comes with PCF Metrics? Is this the same for 2.1 instances of PCF? Do I need to be running PCF Metrics?
I am not seeing the firehose or the cf_exporter defined as targets when I log into the Prometheus console. I just see the node and bosh exporters. Is it possible that, because these were originally configured incorrectly, they were prevented from being set in Prometheus?
I hit the running cf_exporter instance on the Prometheus VM at /metrics and it returned metric data, so it seems that for some reason the Prometheus targets are not set.
@fgraichen : Okay. Let me answer your questions 1 by 1.
1) Increasing the log level can be done via properties in the manifest. You can find the properties here: https://github.com/bosh-prometheus/prometheus-boshrelease/blob/8fb0807693a7101c42285fc7e83bac8f8a48b862/jobs/cf_exporter/spec#L29-L30 and https://github.com/bosh-prometheus/prometheus-boshrelease/blob/8fb0807693a7101c42285fc7e83bac8f8a48b862/jobs/firehose_exporter/spec#L39-L40
2) There should be a metron agent on nearly every VM itself; there are no additional metron VMs. Please make sure to configure the same metron_name in Prometheus as configured in your environment (for PCF it is hardcoded to CF).
3) The exporters "fire" when something issues a GET against the /metrics endpoint. That is configured via the scrape configuration of Prometheus, which also sets the scrape interval and timeout.
4) It sounds like the CF exporter is not on the same VM as Prometheus. Is that right? By default Prometheus uses service discovery to find the different exporters, but this only works for exporters placed on the Prometheus VM. If that is not the case for you, you have to configure the scraping in Prometheus yourself. This can be done here: https://github.com/bosh-prometheus/prometheus-boshrelease/blob/a686e5c30f284fafe3eed17a2975cd6fcb4d5ba1/jobs/prometheus/spec#L41-L42 The configuration should look something like this:
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets:
          - localhost:9090
  - job_name: firehose_exporter
    # scrape_interval and scrape_timeout are per-job settings
    scrape_interval: 5m
    scrape_timeout: 4m
    static_configs:
      - targets:
          - <replace_with_firehose_exporter_vm_ip>:9186
  - job_name: cf_exporter
    scrape_interval: 5m
    scrape_timeout: 4m
    static_configs:
      - targets:
          - <replace_with_cf_exporter_vm_ip>:9193
I hope this will help you.
Great feedback! The cf_exporter is on the Prometheus vm and I see it in the same place as the bosh-exporter so I will look to update the scrape config.
When I ssh into the Prometheus VM, hack the prometheus.yml file, and use the technique above to add the scrape_configs, I then see all of the data from both cf_exporter and firehose.
The problem is that when I run the bosh deployment it creates the following for the scrape_configs. I am not familiar with using json in this config setting so I am not sure how to debug it nor how to override it.
I am not sure what ${1} would resolve to for the host? When I look at the prometheus targets with this configuration I am seeing bosh and node targets only.
scrape_configs: [
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"prometheus","relabel_configs":[{"action":"keep","regex":"prometheus","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9090","source_labels":["__address__"],"target_label":"__address__"}]},
  {"job_name":"bosh","scrape_interval":"2m","scrape_timeout":"1m","static_configs":[{"targets":["localhost:9190"]}]},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"bosh_tsdb","relabel_configs":[{"action":"keep","regex":"bosh_tsdb_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9194","source_labels":["__address__"],"target_label":"__address__"}]},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"cadvisor","relabel_configs":[{"action":"keep","regex":"cadvisor","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:8080","source_labels":["__address__"],"target_label":"__address__"}]},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"cf","relabel_configs":[{"action":"keep","regex":"cf_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9193","source_labels":["__address__"],"target_label":"__address__"}],"scrape_interval":"4m","scrape_timeout":"2m"},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"collectd","relabel_configs":[{"action":"keep","regex":"collectd_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9103","source_labels":["__address__"],"target_label":"__address__"}]},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"consul","relabel_configs":[{"action":"keep","regex":"consul_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9107","source_labels":["__address__"],"target_label":"__address__"}]},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"elasticsearch","relabel_configs":[{"action":"keep","regex":"elasticsearch_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9114","source_labels":["__address__"],"target_label":"__address__"}]},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"concourse","relabel_configs":[{"action":"keep","regex":"atc","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","source_labels":["__meta_bosh_deployment"],"target_label":"bosh_deployment"},{"regex":"(.*)","replacement":"${1}:9391","source_labels":["__address__"],"target_label":"__address__"}]},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"firehose","relabel_configs":[{"action":"keep","regex":"firehose_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9186","source_labels":["__address__"],"target_label":"__address__"}]},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"github","relabel_configs":[{"action":"keep","regex":"github_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9171","source_labels":["__address__"],"target_label":"__address__"}]},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"grafana","relabel_configs":[{"action":"keep","regex":"grafana","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:3000","source_labels":["__address__"],"target_label":"__address__"}]},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"graphite","relabel_configs":[{"action":"keep","regex":"graphite_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9108","source_labels":["__address__"],"target_label":"__address__"}]},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"haproxy","relabel_configs":[{"action":"keep","regex":"haproxy_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9101","source_labels":["__address__"],"target_label":"__address__"}]},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"influxdb","relabel_configs":[{"action":"keep","regex":"influxdb_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9122","source_labels":["__address__"],"target_label":"__address__"}]},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"kubernetes","relabel_configs":[{"action":"keep","regex":"kube_state_metrics_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9188","source_labels":["__address__"],"target_label":"__address__"}]},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"memcached","relabel_configs":[{"action":"keep","regex":"memcached_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9150","source_labels":["__address__"],"target_label":"__address__"}]},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"mongodb","relabel_configs":[{"action":"keep","regex":"mongodb_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9001","source_labels":["__address__"],"target_label":"__address__"}]},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"mysql","relabel_configs":[{"action":"keep","regex":"mysqld_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9104","source_labels":["__address__"],"target_label":"__address__"}]},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"nats","relabel_configs":[{"action":"keep","regex":"nats_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9118","source_labels":["__address__"],"target_label":"__address__"}]},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"node","relabel_configs":[{"action":"keep","regex":"node_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9100","source_labels":["__address__"],"target_label":"__address__"}]},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"postgres","relabel_configs":[{"action":"keep","regex":"postgres_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9187","source_labels":["__address__"],"target_label":"__address__"}]},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"honor_labels":true,"job_name":"pushgateway","relabel_configs":[{"action":"keep","regex":"pushgateway","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9091","source_labels":["__address__"],"target_label":"__address__"}]},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"rabbitmq","relabel_configs":[{"action":"keep","regex":"rabbitmq_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9125","source_labels":["__address__"],"target_label":"__address__"}]},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"redis","relabel_configs":[{"action":"keep","regex":"redis_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9121","source_labels":["__address__"],"target_label":"__address__"}]},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"shield","relabel_configs":[{"action":"keep","regex":"shield_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9179","source_labels":["__address__"],"target_label":"__address__"}],"scrape_interval":"4m","scrape_timeout":"2m"},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"stackdriver","relabel_configs":[{"action":"keep","regex":"stackdriver_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9255","source_labels":["__address__"],"target_label":"__address__"}]},
  {"file_sd_configs":[{"files":["/var/vcap/store/bosh_exporter/bosh_target_groups.json"]}],"job_name":"statsd","relabel_configs":[{"action":"keep","regex":"statsd_exporter","source_labels":["__meta_bosh_job_process_name"]},{"regex":"(.*)","replacement":"${1}:9102","source_labels":["__address__"],"target_label":"__address__"}]}
]
alerting: {}
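On the `${1}` question: in a Prometheus relabel rule, `${1}` refers to the first capture group of the rule's regex, matched against the value of the source label. The sketch below imitates that behaviour in Python for illustration only; it is not Prometheus code. It assumes the rule reads and writes Prometheus's built-in `__address__` label with the regex `(.*)`, and the function name `apply_replace_rule` and the IP address are made up.

```python
import re


def apply_replace_rule(labels, source_label, regex, replacement, target_label):
    """Imitate a Prometheus 'replace' relabel rule on a dict of labels."""
    value = labels.get(source_label, "")
    # Prometheus anchors relabel regexes at both ends, hence fullmatch.
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)  # rule does not apply; labels stay unchanged
    # Translate Prometheus-style ${1} references into Python's \g<1> form.
    template = re.sub(r"\$\{(\w+)\}", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


# Hypothetical target discovered from bosh_target_groups.json:
labels = {"__address__": "10.0.0.5"}
print(apply_replace_rule(labels, "__address__", "(.*)", "${1}:9193", "__address__"))
# -> {'__address__': '10.0.0.5:9193'}
```

So for the `cf` job, `${1}` would resolve to the address of each discovered VM that passed the preceding `keep` rule, with the exporter port appended. If no VM in the target-groups file passes the `keep` rule, the job ends up with no targets at all.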
Check the contents of /var/vcap/store/bosh_exporter/bosh_target_groups.json on the Prometheus VM. If the file does not exist or does not show any info, then the problem might be in the bosh_exporter.
I tried to override this in the spec file but was not sure what the syntax should be. Nothing I tried seemed to override it.
prometheus.scrape_configs:
  description: "Array of scrape configurations"
I also tried to override it in the prometheus.yml under config/templates:
scrape_configs: <%= p('prometheus.scrape_configs', []).to_json %>
Given that the spec entry does not have anything in it, I assume this is something that Prometheus is building dynamically?
That property is filled in when you add the monitor-bosh.yml ops file.
It basically looks for a /var/vcap/store/bosh_exporter/bosh_target_groups.json file and applies a regexp to dynamically find exporters. Check the contents of that file: are there any exporters?
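One way to do that check without reading the JSON by eye is sketched below. It assumes the file follows Prometheus's file-based service discovery format (a JSON list of groups, each with `targets` and `labels`) and that each group carries the `__meta_bosh_job_process_name` label that the scrape config's `keep` rules filter on. The function names are made up for illustration.

```python
import json


def discovered_process_names(target_groups):
    """Collect the distinct job process names found in the target groups."""
    names = set()
    for group in target_groups:
        name = group.get("labels", {}).get("__meta_bosh_job_process_name")
        if name:
            names.add(name)
    return names


def missing_exporters(target_groups, wanted=("cf_exporter", "firehose_exporter")):
    """Return the wanted exporters that do not appear in the target groups."""
    found = discovered_process_names(target_groups)
    return [name for name in wanted if name not in found]


# Example usage on the Prometheus VM:
# with open("/var/vcap/store/bosh_exporter/bosh_target_groups.json") as fh:
#     print(missing_exporters(json.load(fh)))
```

An exporter that shows up in the returned list has no entry in the file, so the matching `keep` rule drops everything and Prometheus never gets a target for that job.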
I think the issue is that there are no entries in /var/vcap/store/bosh_exporter/bosh_target_groups.json for either firehose or cf_exporter
What bosh director are you monitoring? Are there any errors in the /var/vcap/sys/log/bosh_exporter/* logs?
There are no errors in that log and I am getting bosh statistics. Is it possible that, because I am doing both monitor bosh AND monitor cf in the same deployment, the way the scrape config gets created means that bosh is wiping out the cf and firehose scrape statements? I have verified that both collect metrics, and if I hard-code the scrape configs they work as expected. I will remove monitor bosh this morning and see if this theory holds.
No, all exporters can coexist in the same deployment.
How many BOSH directors do you have? If you have more than 1, are you monitoring the one where you have deployed your exporters?
@frodenas: Okay, I think I still don't get how the service discovery stuff works in detail. Suppose I have three bosh exporters (that is, three directors), and one of these directors deployed the exporters (firehose, cf, and bosh exporter all on one VM, plus 2 VMs with one bosh exporter each). Is it possible to use service discovery here, and am I right that service discovery prevents the exporter from creating bosh tasks all the time?
Here is the final combination that worked for me. I took the advice given here 4 days ago to add the scrape configs, and added them directly to the main prometheus.yml. I didn't put it together right away that that was where the array in the spec statement was getting its values. With these specific scrape statements in the prometheus.yml, and then executing the monitor-bosh.yml, I ended up with all of the pieces in one scrape config. Maybe the readme needs this information spelled out more clearly? Next I want to add the ability to monitor an additional Cloud Foundry environment from one Prometheus. I now know how to add the Prometheus statements to the scrape config, but I am not sure how I would get one Prometheus deployment to handle multiple firehose, bosh and cf exporters. Even if I have to do that manually at the Prometheus level, this is still a very worthwhile BOSH deployment script!
@benjaminguttmann-avtq we're conflating different issues here.
- scrape_configs section
- bosh_exporter is writing a file with the list of VMs from all deployments that it controls.
- bosh_exporter file needs to be located at the Prometheus instance so Prometheus can read it
- bosh_exporter job in the same VM
- If bosh_exporter is configured to target a bosh director that has NOT deployed cf_exporter, ..., then the file will NOT contain those exporters. Hence I asked several times "How many BOSH directors do you have?" and "What bosh director are you monitoring?"
- bosh_exporter is targeting a bosh director that does NOT control the cf_exporter and firehose_exporter; therefore, those targets are not configured dynamically in Prometheus.
@fgraichen the actual manifest files in this repo are targeted at a single BOSH and CF installation. If you want to monitor multiple BOSHes or CFs, then you will need to create your own manifest files.
Tremendous thanks for your help and patience. This is a great bosh deployment. I understand that I was trying to bend it to a different purpose than the one it was originally designed for (and I understand your point that it would have just worked had I only involved one director).
I now have a great starting point to take this down the path that matches our deployment methodology. You can close this issue.
Thanks again.
Both show as started in the logs but I am not seeing any data flow through to Grafana.
The stdout logs remain at zero length and I am seeing no errors in the other logs, just the start messages and the "listening" message on the firehose.
To be honest I am not clear which Grafana dashboards are fed via the firehose versus cf_exporter.
Thanks in advance