Closed by benjaminguttmann-avtq 7 years ago
Hi,
1.9 should work fine I think. Check the logs of all the exporters and cf_exporter in particular (/var/vcap/sys/log/_exporter/ on the Prometheus VM).
Hi @mkuratczyk ,
I changed the log_level to debug; afterwards I can see the following error messages:
time="2017-08-17T08:37:59Z" level=error msg="Error while listing services: Error requesting services: Unable to decode body: invalid character '<' looking for beginning of value" source="services_collector.go:142"
time="2017-08-17T13:23:59Z" level=error msg="Error while listing security groups: Error requesting sec groups: Unable to decode body: json: cannot unmarshal number into Go value of type cfclient.CloudFoundryError" source="security_groups_collector.go:142"
Any idea what is wrong? I appreciate your help.
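A note on the first error: Go's JSON decoder reports "invalid character '<' looking for beginning of value" when the body it was handed starts with '<', which usually means markup (an HTML or XML page) came back instead of JSON. The thread does not establish what the API actually returned here, so the following is only a minimal sketch for telling the two cases apart. The function name `describe_body` and the sample bodies are hypothetical and not part of cf_exporter or go-cfclient.

```python
import json


def describe_body(body: str) -> str:
    """Classify a response body as 'markup', 'json', or 'invalid'.

    Hypothetical helper for illustration only; not part of any library.
    """
    stripped = body.lstrip()
    # A leading '<' is what makes a JSON decoder complain about
    # "invalid character '<'": the body is HTML/XML, not JSON.
    if stripped.startswith("<"):
        return "markup"
    try:
        json.loads(stripped)
    except json.JSONDecodeError:
        return "invalid"
    return "json"


if __name__ == "__main__":
    # Made-up bodies, only to show the three outcomes.
    print(describe_body("<html><body>502 Bad Gateway</body></html>"))
    print(describe_body('{"total_results": 1, "resources": []}'))
    print(describe_body(""))
```

Saving the raw response body and running it through a check like this would show whether the services endpoint answered with an error page rather than JSON.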
This looks like a bug in the cf_exporter's Security Groups Collector (https://github.com/cloudfoundry-community/cf_exporter/blob/master/collectors/security_groups_collector.go).
@frodenas have you seen something like this before?
@benjaminguttmann-avtq What bosh release version are you using? cf_exporter had a bug when parsing security groups, but that was fixed in v17.6.0.
@frodenas According to the Pivotal Release Notes I am currently using the Bosh Director Version 260.6 and Prometheus Version v18.0.0.
@benjaminguttmann-avtq thanks, there should be a security group with some special configuration that the go-cfclient library is unable to parse. Can you please send me the rules of your security groups so I can reproduce it in my environment? If you don't want to paste them here, you can send them via email to my gmail address at frodenas. Thanks.
This is the output of cf curl /v2/security_groups?inline-relations-depth=1
{ "total_results": 1, "total_pages": 1, "prev_url": null, "next_url": null, "resources": [ { "metadata": { "guid": "c7e2d7f1-234b-44bb-9e83-fd2372ea0379", "url": "/v2/security_groups/c7e2d7f1-234b-44bb-9e83-fd2372ea0379", "created_at": "2017-05-18T17:01:54Z", "updated_at": "2017-05-18T17:01:54Z" }, "entity": { "name": "default_security_group", "rules": [ { "protocol": "all", "destination": "0.0.0.0-169.253.255.255" }, { "protocol": "all", "destination": "169.255.0.0-255.255.255.255" } ], "running_default": true, "staging_default": true, "spaces_url": "/v2/security_groups/c7e2d7f1-234b-44bb-9e83-fd2372ea0379/spaces", "spaces": [] } } ] }
this is the security group
[
{
"destination": "0.0.0.0-169.253.255.255",
"protocol": "all"
},
{
"destination": "169.255.0.0-255.255.255.255",
"protocol": "all"
}
]
It is just the default security group. When I use the go-cfclient or the cf_exporter alone from my local machine, there aren't any errors. Or rather, I can't see any for the cf_exporter: even with log_level debug it only shows the
INFO[0000] Starting cf_exporter (version=, branch=, revision=) source="cf_exporter.go:189"
INFO[0000] Build context (go=go1.8.3, user=, date=) source="cf_exporter.go:190"
INFO[0000] Listening on :9193 source="cf_exporter.go:278"
message. Not even sure if it is doing anything :-/
The cf_exporter only queries the cc when you ask for metrics, so in another terminal window you should run curl localhost:9193/metrics. Please run that command and see if there's any error message.
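For anyone who prefers a script to curl, here is a minimal sketch of the same check: fetch the exporter's /metrics page and list the metric names it returns. The URL comes from the message above. The helper names `fetch_metrics` and `metric_names` and the sample text are hypothetical, and the parsing is deliberately simplified (it only takes the name before `{` or the first space on each sample line).

```python
from urllib.request import urlopen


def fetch_metrics(url: str = "http://localhost:9193/metrics", timeout: float = 10.0) -> str:
    """Fetch the text exposition page; this request is what triggers a scrape."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def metric_names(exposition: str) -> set:
    """Return the distinct metric names found in a text exposition page."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like 'name{labels} value' or 'name value'.
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


# Made-up sample output, only to show the parsing; real names may differ.
SAMPLE = """# HELP example_application_info Hypothetical metric.
# TYPE example_application_info gauge
example_application_info{application_name="demo app"} 1
example_last_scrape_error 0
"""

if __name__ == "__main__":
    print(sorted(metric_names(SAMPLE)))
    # Against a running exporter: print(sorted(metric_names(fetch_metrics())))
```

If the returned set is empty or contains only scrape-status metrics, the exporter log written during that request is the place to look for the error.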
@frodenas : Thanks for that explanation, I was not aware of this. I ran the command but there was no error message :-/ Any other suggestions for what to check?
@benjaminguttmann-avtq I'm a bit confused now, so let's start over. If you go to the prometheus query console and query for the cf_exporter_application_info metric, does it show you anything?
@frodenas : Okay. I am not quite sure why we got this error in the cf_exporter, but it seems to work more or less. I can see the information in the cf: summary dashboard, for example.
But when I check the Apps: summary dashboard I can see that the following query is used in grafana:
avg(firehose_container_metric_cpu_percentage{environment=~"$environment",bosh_deployment=~"$bosh_deployment",application_id=~"$cf_application_id"})
I entered the query
avg(firehose_container_metric_cpu_percentage{environment="PCF19",bosh_deployment="cf",application_id="72397fa7-abeb-41df-83a6-2be7a495108b"})
into the prometheus query console and got the following result:
{} | 0.41178078827797354
but the value is not shown in the Grafana Dashboard. There it says: no data points
Okay, we figured it out. It seems there is a problem when the deployment name of the cf_exporter is not equal to 'cf'. For example, the dashboard 'Apps: System' uses bosh_deployment as a variable. In our case we renamed the cf_exporter deployment_name to "pcf", which breaks this functionality because it always expects "cf". After hard-coding bosh_deployment in the query to cf it worked, but the dashboard dropdown stopped working as soon as we changed the Grafana templating variable 'bosh_deployment' to use the job instead of the deployment. So we assume that we either need both of these variables here or have to fix deployment_name to always be "cf".
I think there are two possible solutions: (1) fix the deployment name in the cf_exporter to 'cf', or (2) add a new variable to the templating section of the cf dashboards and edit the firehose queries of the dashboards. What do you think is the best way to go here? Maybe it is also possible to make the deployment of the firehose editable. If it is okay with you, I will provide a PR for the preferred solution.
ok, glad to hear you discovered the issue.
The deployment name should NOT be hardcoded to cf, because it's already configurable at the metron agent. In fact, there are users who tune this name in their deployments, and if you look at cf-deployment you'll see that it uses the system domain. PCF, OTOH, hardcodes this to cf.
The cf_exporter.cf.deployment_name property at the prometheus deployment MUST match the metron_agent.deployment property of your CF deployment. I'll update the monitor-cf.yml operator file to use the system_domain var instead of hardcoding it to cf.
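To illustrate why a name mismatch ends in "no data points", here is a minimal sketch of how a regex label matcher such as bosh_deployment=~"$bosh_deployment" selects series. Prometheus anchors regex matchers, so the value must match in full. The `selects` helper and the series labels are hypothetical, modeled on the query quoted earlier in this thread, and Python's `re` module stands in for Prometheus's RE2 syntax.

```python
import re


def selects(series_labels: dict, matchers: dict) -> bool:
    """Return True if every regex matcher fully matches the series label value.

    A label missing from the series is treated as the empty string.
    """
    return all(
        re.fullmatch(pattern, series_labels.get(label, "")) is not None
        for label, pattern in matchers.items()
    )


# Hypothetical firehose series: the metron agent reports deployment "cf".
series = {"environment": "PCF19", "bosh_deployment": "cf"}

if __name__ == "__main__":
    # Dashboard variable resolved to "cf": the series is selected.
    print(selects(series, {"environment": "PCF19", "bosh_deployment": "cf"}))
    # Dashboard variable resolved to "pcf": nothing matches, so no data.
    print(selects(series, {"environment": "PCF19", "bosh_deployment": "pcf"}))
```

Under these assumptions, a dashboard variable populated with "pcf" can never select firehose series labeled "cf", which is consistent with the behaviour described above.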
Fixed this at https://github.com/cloudfoundry-community/prometheus-boshrelease/commit/0bd45120b02d46f02e2fe93825423eb23ca5d3ef.
Please reopen if you still have any questions.
Hi there,
I think this is more of a general question. I am currently facing the issue that the app dashboards are not working for me. I am testing Prometheus at the moment with a PCF 1.9 installation. Maybe a certain version of the CF release is needed to gather the information for the dashboards?
When I search Prometheus for the metrics used by the app dashboards, it either says 'no data' or I cannot find the metric at all.
Thanks for clarification.