Closed cfmagnum closed 6 years ago
@cfmagnum At the prometheus dashboard, is there any cf_* metric? If there aren't metrics, check the cf_exporter logs. I see in the attached file that it was unable to contact the cf api, and that you have restarted it. Are there any other error messages after the restart?
Hi @frodenas, thank you for the prompt response. To answer your question: yes, we do have cf_exporter * metrics on the Prometheus dashboard, and the exporter is generating logs as expected. There is also currently an api fluctuation issue, which started after we upgraded cf from v269 to v271. We investigated and found that when the cf_exporter process is stopped, the api is stable and running.
Could the issue be related to the cf api or to something else? The dashboard is still not showing any logs since we restarted the process.
Attaching the latest logs that were generated: new.txt
*There are no other specific logs in cf_exporter apart from the attached one.
w/r/t the "api fluctuation", I recommend upgrading the cf_exporter version (via a newer prometheus bosh release). The version that you are using puts a lot of stress on the cf api, as it tries to get all app events. The latest version doesn't query the app events anymore, so it should be safer to use.
We will try upgrading to the latest version, as we are running an older version of cf_exporter. Thank you once again, @frodenas, for your input.
@cfmagnum I forgot to mention that before upgrading the cf_exporter you can try to disable the ApplicationEvents collector by setting the cf_exporter.filter.collectors property to something like: "Applications,Organizations,SecurityGroups,Services,Spaces". Let's see if the cf api behaves correctly with this collector disabled.
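As a rough illustration of the suggestion above, the filter value can be derived by listing the collectors and leaving out the one to disable. This is only a sketch: the collector names are the ones mentioned in this thread, the full set supported by a given cf_exporter version may differ, and the helper name collectors_filter is hypothetical.

```python
# Collector names taken from this thread. The complete list for a given
# cf_exporter version may differ (assumption).
KNOWN_COLLECTORS = [
    "ApplicationEvents",
    "Applications",
    "Organizations",
    "SecurityGroups",
    "Services",
    "Spaces",
]


def collectors_filter(disabled):
    """Return a comma-separated filter value that omits the disabled collectors."""
    disabled = set(disabled)
    return ",".join(name for name in KNOWN_COLLECTORS if name not in disabled)


# Value to place in the cf_exporter.filter.collectors property.
print(collectors_filter(["ApplicationEvents"]))
```

The printed string is what would go into the cf_exporter.filter.collectors property of the deployment manifest.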
Hi @frodenas, after disabling the collector there is no impact on the api fluctuation; it behaves the same, with lots of load. We finally decided to upgrade prometheus to version 18.0.x. Since we are currently on 13.0.0 in our environment, we are moving forward step by step. On reaching version 16.0.0 we are facing an authentication issue. We made all the required changes as suggested in the release. The deployment of version 16.0.0 was successful, but when we try to access the grafana dashboard it asks for authentication. After entering the correct user ID and password, it keeps asking for them again. PFA below: It also says that the site is not public. Can you please let us know what we are missing? Is there a certs step that we missed? If yes, do we have to change that in CF or in the current deployment of Prometheus?
Thanks, Subhash
Can you please paste your sanitized deployment manifest?
Hi @frodenas, PFA Prometheus v16.0.txt
SSH into your grafana vm and locate the /var/vcap/jobs/grafana/bin/prometheus-datasource file. Inside, you'll find that it's setting the PROMETHEUS_URL env var. It should point to your prometheus.web.external_url property value. Check the port: in your deployment manifest I see 9093, and it should be 9090. Check the IP: it should point to your nginx vm ip address.
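The two checks described above (port and IP) could be scripted roughly as follows. This is a sketch under assumptions: the helper name check_prometheus_url is hypothetical, and 10.0.0.10 is a placeholder for the nginx vm ip address, not a value from the attached manifest.

```python
from urllib.parse import urlsplit


def check_prometheus_url(url, expected_host, expected_port=9090):
    """Return a list of problems found in a PROMETHEUS_URL value."""
    parts = urlsplit(url)
    problems = []
    if parts.port != expected_port:
        problems.append(f"port is {parts.port}, expected {expected_port}")
    if parts.hostname != expected_host:
        problems.append(f"host is {parts.hostname}, expected {expected_host}")
    return problems


# 10.0.0.10 is a placeholder for the nginx vm ip address (assumption).
print(check_prometheus_url("http://10.0.0.10:9093", "10.0.0.10"))
```

An empty list means the value copied from the prometheus-datasource file has the expected host and port.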
Hi @frodenas, the auth issue got resolved, but we are unable to see any data on the Grafana dashboard. We have now upgraded Prometheus from v16.0.0 to v17.0.0, and the points from your previous threads, i.e.
The grafana.prometheus.dashboard_files property should point to /var/vcap/jobs/... not /var/vcap/packages
The grafana.server.root_url should point to the nginx vip and port 3000
The prometheus.server.root_url should point to the nginx vip and port 9090
have been taken care of. Now we have the same issue that we had in version 13.0.0: we are unable to see data on the grafana dashboard. We checked the prometheus dashboard and picked one metric, "firehose_container_metric_cpu_percentage", and we can view it there, but when I go to the Apps request dashboard no data is visible. PFA screenshots of both cases.
We also checked the prometheus targets, which show as below: *The second IP for CF is not in use. Please let me know what I am missing here. Anything major?
Thanks Subhash
At the prometheus dashboard, do you see any cf_application_info metric? If yes, can you please paste a screenshot?
Also check that your cf_exporter.metrics.environment property matches firehose_exporter.metrics.environment, and that cf_exporter.cf.deployment_name matches your metron_agent.deployment (from your cf deployment; in the screenshot it is aws-clients-acf-devtest-cf).
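The property comparison above can be expressed as a small check. This is a sketch only: it assumes the properties have already been collected into a flat dict keyed by their dotted names, the helper name mismatched_properties is hypothetical, and the environment value "devtest" is made up for illustration.

```python
# Pairs of properties that should carry the same value, as listed above.
MATCHING_PAIRS = [
    ("cf_exporter.metrics.environment", "firehose_exporter.metrics.environment"),
    ("cf_exporter.cf.deployment_name", "metron_agent.deployment"),
]


def mismatched_properties(props):
    """Return the property pairs whose values differ (or are missing on one side)."""
    return [(a, b) for a, b in MATCHING_PAIRS if props.get(a) != props.get(b)]


# Example values; "devtest" is a hypothetical environment name.
props = {
    "cf_exporter.metrics.environment": "devtest",
    "firehose_exporter.metrics.environment": "devtest",
    "cf_exporter.cf.deployment_name": "cf",
    "metron_agent.deployment": "aws-clients-acf-devtest-cf",
}
print(mismatched_properties(props))
```

With the example values, only the deployment name pair is reported, because "cf" does not match the metron_agent.deployment value.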
Prometheus dashboard cf_application_info snapshot:
Your firehose & cf metrics screenshots don't show the environment label, which means you're using an old prometheus release version. Have you upgraded the cf_exporter & firehose_exporter to use the same release as the prometheus deployment? Otherwise, it is not going to work.
So do you mean that I need to upgrade Prometheus, which I am already doing (to 17.x.x for now)? Other than this, do cf_exporter and firehose_exporter need to be upgraded by upgrading cf?
The deployment manifest that you attached does NOT deploy the cf_exporter and firehose_exporter. So I guess you are deploying them on another deployment (maybe cf). Release versions must match on both deployments.
Hi Ferdy, I work with cfmagnum. Yes, cf_exporter and firehose_exporter sit within the cf deployment. Both the CF and Prometheus deployments use the same version of the prometheus release (17.0.0).
After matching versions in both deployments, the App: System dashboard is working very well, but the Latency and Request dashboards are still producing the same error, No data points.
Just out of curiosity, I went through the panel JSON for the Latency and Request dashboards and found that the metrics used in that JSON are not available on Prometheus. Could this be the root cause of the error?
Metrics in the grafana JSON:
firehose_http_start_stop_client_request_duration_seconds_sum
firehose_http_start_stop_client_request_duration_seconds
On prometheus, only firehose_http_start_stop_cached is present.
Those metrics are only generated by the firehose when there's traffic. Can you please generate some traffic on an application and then check whether that metric appears at prometheus?
Also, check that the firehose_exporter.filter.events property in your prometheus deployment manifest is NOT set or, if it is set, that it includes the HttpStartStop collector.
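The rule for firehose_exporter.filter.events described above can be written as a small predicate. This is a sketch that assumes the property is a comma-separated string (like the cf_exporter.filter.collectors value earlier in this thread). The helper name http_start_stop_enabled is hypothetical, and the event names other than HttpStartStop are illustrative.

```python
def http_start_stop_enabled(filter_events):
    """Return True if HttpStartStop events pass the filter.

    An unset or empty filter is treated as "no filtering", so the
    HttpStartStop metrics are expected to be exported in that case.
    """
    if not filter_events:
        return True
    events = [name.strip() for name in filter_events.split(",")]
    return "HttpStartStop" in events


print(http_start_stop_enabled(None))
print(http_start_stop_enabled("ContainerMetric,ValueMetric"))
print(http_start_stop_enabled("ContainerMetric, HttpStartStop"))
```

If the predicate returns False for the value in the manifest, the Latency and Request dashboards have no HTTP metrics to draw from, regardless of traffic.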
Quick update: all app dashboards are working. We just had to curl the app that was giving the no data point error ^_^ . Data for newly deployed apps populated correctly; I think a push was needed for the existing ones.
Bosh dashboards are not working, but it seems to be a version mismatch issue; will update soon.
@frodenas: can you tell me how those dashboards work when the above-mentioned metrics are not there? Is there any doc/guide for dashboard config?
Thanks, Mandar K.
There isn't any specific doc/guide, but check the Operations files section at the README. Each op file deploys an exporter and the associated dashboards, so you can figure out based on that info which dashboards will work depending on your exporters.
Hi, now I am able to see the App: System, Latency and Request dashboards very well.
But I am not able to see Bosh:Jobs and Bosh:Overview. Do they require any changes in the bosh manifest as well? I can see some parameters on the dashboard such as Environment, Director, Deployment, Job etc. If yes, where can I find them?
It depends on the exporter being used. If you're using the bosh_exporter, the supported properties are here: https://github.com/cloudfoundry-community/prometheus-boshrelease/blob/master/jobs/bosh_exporter/spec. If you're using the graphite_exporter, then you need to adapt the graphite_exporter.graphite.mapping_config properties to match the labels returned by the bosh exporter.
Hi @frodenas, we are still stuck on Bosh:Jobs and Bosh:Overview with Prometheus version 17.0.0. We are using the graphite_exporter, and the properties mentioned can be seen in the attachment: bosh.txt. My question is whether the metrics namespace used in https://github.com/cloudfoundry-community/bosh_exporter needs to be included in our manifest as well. Are we missing some part of it? What else needs to be added so that we can have working Bosh:Jobs and Bosh:Overview dashboards?
Thanks, Subhash
Yes, you're missing some labels. Just take an example:
*.*.*.*.system_cpu_sys
name="bosh_job_cpu_sys"
bosh_deployment="$1"
bosh_job_name="$2"
bosh_job_index="$3"
bosh_job_id="$3"
agent_id="$4"
The above mapping config creates a bosh_job_cpu_sys metric with labels bosh_deployment, bosh_job_name, bosh_job_index, bosh_job_id, agent_id.
Now, if you check the labels exposed by the bosh_exporter, you will see it is adding the labels environment, bosh_name, bosh_uuid, bosh_deployment, bosh_job_name, bosh_job_id, bosh_job_index, bosh_job_az, bosh_job_ip.
So you are missing environment, bosh_name, bosh_uuid, bosh_job_az and bosh_job_ip. Some of those labels can be hardcoded with arbitrary values in the mapping config, specifically environment, bosh_name and bosh_uuid. bosh_job_az is not used, so forget about it; the only missing label will then be bosh_job_ip. Unfortunately, that label cannot be captured from the bosh hm (or the graphite_exporter), so there will be a bunch of dashboards that will NOT work.
That said, my recommendation is to switch to the bosh_exporter as soon as possible.
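The label comparison above amounts to a set difference, which can be sketched as follows. The label names are copied from this comment; the variable names are only illustrative.

```python
# Labels produced by the example graphite_exporter mapping config.
mapping_labels = {
    "bosh_deployment", "bosh_job_name", "bosh_job_index",
    "bosh_job_id", "agent_id",
}

# Labels exposed by the bosh_exporter, which the dashboards expect.
exporter_labels = {
    "environment", "bosh_name", "bosh_uuid", "bosh_deployment",
    "bosh_job_name", "bosh_job_id", "bosh_job_index",
    "bosh_job_az", "bosh_job_ip",
}

# Labels the dashboards expect but the mapping config does not provide.
missing = exporter_labels - mapping_labels

# Labels that can be hardcoded in the mapping config, and labels not used.
hardcodable = {"environment", "bosh_name", "bosh_uuid"}
unused = {"bosh_job_az"}

# What remains cannot be supplied through the graphite_exporter.
blocking = missing - hardcodable - unused

print(sorted(missing))
print(sorted(blocking))
```

The second line of output contains only bosh_job_ip, which is the label that keeps some dashboards from working with the graphite_exporter.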
@cfmagnum can we close this issue or do you still have problems with the upgrade?
Hello @frodenas, unfortunately we have some more issues with the upgrade. I'll write to you shortly with all the consolidated repos, which will give a clear view so that you can guide us on the correct path.
@frodenas we are unable to push/implement the bosh exporter as an application. Is there any way to include the bosh exporter in Bosh/Prometheus?
While pushing the bosh_exporter app we are facing a certs error. Also, are there any binaries available to include the bosh exporter in cf, like the graphite_exporter?
Thanks, Subhash
Well, you already have a bosh_exporter job as part of this release, and that's the recommended way to proceed.
You can just switch your graphite_exporter job and use the bosh_exporter one. The monitor-bosh.yml & enable-bosh-uaa.yml op files provide an example of which properties will be required.
@frodenas we are facing some more issues related to v18.0 on our production environment, although it was working earlier on the dev environment. Let's close this discussion for now. I'll open a new one and share the logs there.
Thank you once again, @frodenas
Hi, for the last week we have been facing an issue with the Grafana dashboard on the production environment. When looking for data on the Apps dashboards in grafana we get "No datapoint"; PFA below... Also, no orgs and spaces, which hold the apps information, are generated. For now we think the issue could be with cf_exporter, but we are not sure, since there are not sufficient logs to tell us the exact issue. Please find below the .txt file that holds the error logs: new 6.txt
Please help us with this issue, since app-related information is not available at all, while the rest of the dashboards are working as expected.
Looking forward to your reply.
Thanks Subhash