Unable to fetch app logs on Grafana dashboard (Apps: Dashboard)

cfmagnum commented 7 years ago

Hi, for the last week we have been facing an issue with the Grafana dashboards in our production environment. When we look for data on the Apps: Dashboard in Grafana, we get "No data points" (screenshot attached: grafana2). Also, no orgs and spaces are listed, which is where the app information lives. We suspect the issue may be with cf_exporter, but we are not sure, since there are not enough logs to tell us the exact cause. Please find attached a .txt file with the error logs: new 6.txt

Please help us with this issue: app-related information is not available at all, while the rest of the dashboards are working as expected.

Looking forward to your reply.

Thanks, Subhash

frodenas commented 7 years ago

@cfmagnum On the Prometheus dashboard, is there any cf_* metric? If there aren't any metrics, check the cf_exporter logs. I see in the attached file that it was unable to contact the CF API and that you restarted it; are there any other error messages after the restart?
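The same check can be scripted against the Prometheus HTTP API, which lists every known metric name at `/api/v1/label/__name__/values`. A minimal Python sketch, assuming Prometheus listens on port 9090 at a hypothetical address that you would replace with your own:

```python
import json
import urllib.request

# Hypothetical address: replace with your Prometheus (or nginx) IP and port.
PROMETHEUS_URL = "http://10.0.0.10:9090"


def fetch_metric_names(base_url):
    """Return every metric name Prometheus knows about.

    Calls the /api/v1/label/__name__/values endpoint of the Prometheus HTTP API.
    """
    url = base_url.rstrip("/") + "/api/v1/label/__name__/values"
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus returned an error: {payload}")
    return payload["data"]


def cf_metric_names(metric_names):
    """Keep only the names exported by cf_exporter (cf_ prefix), sorted."""
    return sorted(name for name in metric_names if name.startswith("cf_"))
```

If `cf_metric_names(fetch_metric_names(PROMETHEUS_URL))` returns an empty list, the problem is upstream of Grafana, and the cf_exporter logs are the next place to look.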

cfmagnum commented 7 years ago

Hi @frodenas, thank you for the prompt response. To answer your question: yes, we do have cf_* metrics on the Prometheus dashboard, and cf_exporter is generating logs as expected. Also, there is currently an API fluctuation issue since we upgraded CF from v269 to v271. We investigated the cause and found that when the cf_exporter process is stopped, the API is stable and running.

Could the issue be related to the CF API or something else? The dashboard still shows no data even though we restarted the process.

Attaching the latest generated logs: new.txt

*There are no other relevant logs in cf_exporter apart from the attached one.

frodenas commented 7 years ago

W/r/t the "API fluctuation", I recommend upgrading cf_exporter (via a newer prometheus BOSH release). The version you are using puts a lot of stress on the CF API, as it tries to fetch all app events. The latest version no longer queries app events, so it should be safer to use.

cfmagnum commented 7 years ago

We will try upgrading to the latest version, as we are on an older version of cf_exporter. Thank you once again @frodenas for your input.

frodenas commented 7 years ago

@cfmagnum I forgot to mention that, before upgrading cf_exporter, you can try disabling the ApplicationEvents collector by setting the cf_exporter.filter.collectors property to something like "Applications,Organizations,SecurityGroups,Services,Spaces". Let's see whether the CF API behaves correctly with this collector disabled.

cfmagnum commented 7 years ago

Hi @frodenas, disabling the collector had no impact on the API fluctuation; it behaves the same, under heavy load. We finally decided to upgrade prometheus to 18.0.x. Since we are on 13.0.0, we are upgrading step by step. On reaching 16.0.0 we ran into an authentication issue. We made all the required changes suggested in the release notes, and the 16.0.0 deployment succeeded, but when we try to access the Grafana dashboard it asks for authentication, and after entering the correct user ID and password it keeps asking again (screenshot attached). It also says the site is not public. Can you please let us know what we are missing? Is there a certificates step we missed? If so, do we have to change it in CF or in the current Prometheus deployment?

Thanks, Subhash

frodenas commented 7 years ago

Can you please paste your sanitized deployment manifest?

cfmagnum commented 7 years ago

Hi @frodenas, PFA Prometheus v16.0.txt

frodenas commented 7 years ago

SSH into your grafana VM and locate the /var/vcap/jobs/grafana/bin/prometheus-datasource file. Inside, you'll find that it sets the PROMETHEUS_URL env var. It should point to your prometheus.web.external_url property value. Check the port: in your deployment manifest I see 9093, and it should be 9090. Check the IP: it should point to your nginx VM IP address.
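A minimal Python sketch of those checks, assuming the datasource script contains a shell-style `PROMETHEUS_URL=...` assignment (the exact file contents are not shown in this thread, so the regular expression is an assumption). The nginx IP and example addresses are placeholders:

```python
import re
from urllib.parse import urlsplit

EXPECTED_PORT = 9090  # Prometheus port, per the reply above

# Assumed form: PROMETHEUS_URL=http://... or export PROMETHEUS_URL="http://..."
_ASSIGNMENT = re.compile(r"""PROMETHEUS_URL=["']?([^"'\s]+)""")


def extract_prometheus_url(script_text):
    """Return the first PROMETHEUS_URL value found in the script, or None."""
    match = _ASSIGNMENT.search(script_text)
    return match.group(1) if match else None


def check_prometheus_url(prometheus_url, external_url, nginx_ip):
    """List problems with PROMETHEUS_URL; an empty list means it looks right."""
    problems = []
    if prometheus_url.rstrip("/") != external_url.rstrip("/"):
        problems.append(
            f"{prometheus_url} does not match prometheus.web.external_url {external_url}"
        )
    parts = urlsplit(prometheus_url)
    if parts.port != EXPECTED_PORT:
        problems.append(f"port is {parts.port}, expected {EXPECTED_PORT}")
    if parts.hostname != nginx_ip:
        problems.append(f"host is {parts.hostname}, expected nginx VM IP {nginx_ip}")
    return problems
```

Feed `extract_prometheus_url` the contents of the prometheus-datasource file, then pass the result to `check_prometheus_url` along with your manifest's external_url and the nginx VM IP.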

cfmagnum commented 7 years ago

Hi @frodenas, the auth issue is resolved, but we are unable to see any data on the Grafana dashboard (screenshot attached). We have now upgraded Prometheus from 16.0.0 to 17.0.0, and the points from your previous threads, i.e.:

- The grafana.prometheus.dashboard_files property should point to /var/vcap/jobs/... not /var/vcap/packages
- The grafana.server.root_url should point to the nginx VIP and port 3000
- The prometheus.server.root_url should point to the nginx VIP and port 9090

have all been taken care of. Now we have the same issue we had in version 13.0.0: no data on the Grafana dashboard. We checked the Prometheus dashboard and picked one metric, "firehose_container_metric_cpu_percentage", and we can see its data there, but when I try to view the Apps request data, nothing is visible. Screenshots of both cases are attached.

We also checked the Prometheus targets, which show the following (screenshot attached: prometheus 1). *The second IP for CF is not in use. Please let me know what I am missing here. Is it anything major?

Thanks, Subhash

frodenas commented 7 years ago

On the Prometheus dashboard, do you see a cf_application_info metric? If yes, can you please paste a screenshot?

frodenas commented 7 years ago

Also check that your cf_exporter.metrics.environment property matches firehose_exporter.metrics.environment, and that cf_exporter.cf.deployment_name matches your metron_agent.deployment (from your CF deployment; in the screenshot it is aws-clients-acf-devtest-cf).
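A minimal Python sketch of that consistency check, assuming the relevant job properties from both deployments have already been loaded (e.g. with a YAML parser) and merged into one nested dict; the helper names are hypothetical:

```python
# Pairs of dotted BOSH property paths that must hold the same value.
MUST_MATCH = [
    ("cf_exporter.metrics.environment", "firehose_exporter.metrics.environment"),
    ("cf_exporter.cf.deployment_name", "metron_agent.deployment"),
]


def get_property(properties, dotted_path, default=None):
    """Look up a dotted property path such as "cf_exporter.metrics.environment"."""
    node = properties
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def check_exporter_labels(properties):
    """Return a description of every pair that differs or is unset."""
    mismatches = []
    for left, right in MUST_MATCH:
        left_value = get_property(properties, left)
        right_value = get_property(properties, right)
        # An unset left value is reported too: the labels would be empty.
        if left_value is None or left_value != right_value:
            mismatches.append(f"{left}={left_value!r} vs {right}={right_value!r}")
    return mismatches
```

An empty list from `check_exporter_labels` means both pairs line up.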

cfmagnum commented 7 years ago

Prometheus dashboard cf_application_info screenshot attached (prometheus 2).

frodenas commented 7 years ago

Your firehose & cf metrics screenshots don't show the environment label, which means you're using an old prometheus release version. Have you upgraded cf_exporter and firehose_exporter to the same release as the prometheus deployment? Otherwise, it is not going to work.
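To check for the label without relying on screenshots, the Prometheus HTTP API (`/api/v1/query`) returns each series' full label set. A minimal Python sketch; the base URL you pass in is whatever address your Prometheus listens on:

```python
import json
import urllib.parse
import urllib.request


def instant_query(base_url, promql):
    """Run an instant query via /api/v1/query and return the list of series."""
    query = urllib.parse.urlencode({"query": promql})
    url = base_url.rstrip("/") + "/api/v1/query?" + query
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]


def series_without_label(result, label="environment"):
    """Return the label sets of the series that do not carry `label`."""
    return [sample["metric"] for sample in result if label not in sample["metric"]]
```

Running `series_without_label(instant_query(url, "cf_application_info"))`, and the same for a firehose metric, should return an empty list once both exporters come from the same release as the prometheus deployment.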

cfmagnum commented 7 years ago

So do you mean that, besides upgrading Prometheus (which I am already doing, to 17.x.x for now), cf_exporter and firehose_exporter need to be upgraded by upgrading CF?

frodenas commented 7 years ago

The deployment manifest that you attached does NOT deploy cf_exporter and firehose_exporter, so I guess you are deploying them in another deployment (maybe cf). Release versions must match in both deployments.

mandar14 commented 7 years ago

Hi Ferdy, I work with cfmagnum. Yes, cf_exporter and firehose_exporter sit within the CF deployment. Both the CF and Prometheus deployments use the same prometheus release version (17.0.0).

After matching the versions in both deployments, the App: System dashboard is working very well, but the Latency and Request dashboards still show the same error: "No data points".

Out of curiosity, I went through the panel JSON for the Latency and Request dashboards and found that the metrics used in that JSON are not available in Prometheus. Could this be the root cause of the error?

Metrics in the Grafana JSON: firehose_http_start_stop_client_request_duration_seconds_sum, firehose_http_start_stop_client_request_duration_seconds

In Prometheus, there is only firehose_http_start_stop_cached.
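That comparison can be automated. A minimal Python sketch, assuming the dashboard JSON has been loaded with `json.load` (dashboards exported through the Grafana API may be wrapped in a top-level "dashboard" key, which you would unwrap first) and that `available` is the list of metric names from Prometheus. Pulling names out of each target's `expr` with a regular expression is a heuristic, not a PromQL parser, so only tokens with the given prefix are kept:

```python
import re

# A PromQL metric name: letters, digits, underscores and colons, not starting with a digit.
METRIC_TOKEN = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def iter_panels(dashboard):
    """Yield panels from the old (rows[].panels) and newer (panels[]) layouts."""
    for row in dashboard.get("rows", []):
        yield from row.get("panels", [])
    for panel in dashboard.get("panels", []):
        yield panel
        yield from panel.get("panels", [])  # collapsed rows nest their panels


def dashboard_metrics(dashboard, prefix):
    """Collect tokens starting with `prefix` from every panel target's expr."""
    names = set()
    for panel in iter_panels(dashboard):
        for target in panel.get("targets", []):
            tokens = METRIC_TOKEN.findall(target.get("expr", ""))
            names.update(t for t in tokens if t.startswith(prefix))
    return names


def missing_metrics(dashboard, available, prefix="firehose_"):
    """Metric names the dashboard queries that Prometheus does not have, sorted."""
    return sorted(dashboard_metrics(dashboard, prefix) - set(available))
```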

frodenas commented 7 years ago

Those metrics are only generated by the firehose when there's traffic. Can you please generate some traffic on an application and then check whether that metric appears in Prometheus?

Also, check that the firehose_exporter.filter.events property in your prometheus deployment manifest is NOT set, or, if it is set, that it includes the HttpStartStop collector.
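Both checks in a minimal Python sketch, assuming firehose_exporter.filter.events is a comma-separated string like the cf_exporter.filter.collectors value earlier in this thread; `app_url` would be one of your app routes:

```python
import urllib.error
import urllib.request


def http_start_stop_enabled(filter_events):
    """True if filter.events is unset/empty or includes HttpStartStop."""
    if not filter_events:
        return True  # no filter configured
    return "HttpStartStop" in {event.strip() for event in filter_events.split(",")}


def generate_traffic(app_url, count=20):
    """Send `count` GET requests to the app and return the HTTP status codes.

    Error responses are recorded instead of raised so one failing
    request does not stop the loop.
    """
    statuses = []
    for _ in range(count):
        try:
            with urllib.request.urlopen(app_url, timeout=10) as resp:
                statuses.append(resp.status)
        except urllib.error.HTTPError as err:
            statuses.append(err.code)
    return statuses
```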

mandar14 commented 7 years ago

Quick update: all app dashboards are working. I just curled the app that was giving the "No data points" error ^_^. Data for newly deployed apps populated correctly; I think the existing ones needed a push.

The BOSH dashboards are not working, but it seems to be a version mismatch issue; I will update soon.

@frodenas: can you tell me how those dashboards work when the metrics mentioned above are not present? Is there any doc/guide for the dashboard configuration?

Thanks, Mandar K.

frodenas commented 7 years ago

There isn't any specific doc/guide, but check the Operations files section in the README. Each op file deploys an exporter and the associated dashboards, so based on that you can figure out which dashboards will work with your exporters.

ghost commented 7 years ago

Hi, I am now able to see the App: System, Latency and Request dashboards.

But I am not able to see Bosh: Jobs and Bosh: Overview. Does this require any changes in the BOSH manifest as well? I ask because I can see some parameters on the dashboard such as Environment, Director, Deployment, Job, etc. If yes, where can I find them?

frodenas commented 7 years ago

It depends on the exporter being used. If you're using the bosh_exporter, the supported properties are here: https://github.com/cloudfoundry-community/prometheus-boshrelease/blob/master/jobs/bosh_exporter/spec. If you're using the graphite_exporter, then you need to adapt the graphite_exporter.graphite.mapping_config properties to match the labels returned by the bosh_exporter.

cfmagnum commented 7 years ago

Hi @frodenas, we are still stuck on the bosh:jobs and bosh:overview dashboards with Prometheus version 17.0.0. We are using the graphite_exporter, and the properties we use can be seen in the attachment: bosh.txt. My question is whether the metrics namespace used in https://github.com/cloudfoundry-community/bosh_exporter needs to be included in our manifest as well. Are we missing some part of it? What else needs to be added so that we have working bosh:jobs and bosh:overview dashboards?

Thanks, Subhash

frodenas commented 7 years ago

Yes, you're missing some labels. Take this example:

*.*.*.*.system_cpu_sys
name="bosh_job_cpu_sys"
bosh_deployment="$1"
bosh_job_name="$2"
bosh_job_index="$3"
bosh_job_id="$3"
agent_id="$4"

The above mapping config creates a bosh_job_cpu_sys metric with labels bosh_deployment, bosh_job_name, bosh_job_index, bosh_job_id, agent_id.

Now, if you check the labels exposed by the bosh_exporter, you will see it adds the labels environment, bosh_name, bosh_uuid, bosh_deployment, bosh_job_name, bosh_job_id, bosh_job_index, bosh_job_az and bosh_job_ip.

So you are missing environment, bosh_name, bosh_uuid, bosh_job_az and bosh_job_ip. Some of those labels (specifically environment, bosh_name and bosh_uuid) can be hardcoded with arbitrary values in the mapping config. bosh_job_az is not used, so forget about it; that leaves bosh_job_ip as the only missing label. Unfortunately, that label cannot be captured from the BOSH HM (or the graphite_exporter), so a number of dashboards will NOT work.
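The label comparison above amounts to a set difference. A minimal Python sketch that parses a legacy mapping block like the one shown above, assuming one `label="value"` assignment per line after the glob line, and appends hardcoded labels; the constant values you would choose (e.g. your environment name) are placeholders:

```python
# Labels the bosh_exporter adds, as listed in the reply above.
BOSH_EXPORTER_LABELS = {
    "environment", "bosh_name", "bosh_uuid", "bosh_deployment", "bosh_job_name",
    "bosh_job_id", "bosh_job_index", "bosh_job_az", "bosh_job_ip",
}


def mapping_labels(mapping_text):
    """Label names assigned in one mapping block (the metric `name` is excluded)."""
    labels = set()
    for line in mapping_text.strip().splitlines()[1:]:  # first line is the glob
        key, sep, _ = line.partition("=")
        if sep and key.strip() != "name":
            labels.add(key.strip())
    return labels


def with_constant_labels(mapping_text, constants):
    """Append hardcoded label assignments, e.g. {"environment": "prod"}."""
    extra = [f'{key}="{value}"' for key, value in sorted(constants.items())]
    return "\n".join(mapping_text.strip().splitlines() + extra)
```

After hardcoding environment, bosh_name and bosh_uuid, `BOSH_EXPORTER_LABELS - mapping_labels(...)` leaves only bosh_job_az and bosh_job_ip, matching the reasoning above.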

That said, my recommendation is to switch to the bosh_exporter as soon as possible.

frodenas commented 7 years ago

@cfmagnum can we close this issue or do you still have problems with the upgrade?

cfmagnum commented 6 years ago

Hello @frodenas, unfortunately we have run into some more issues while upgrading. I'll write to you shortly with all the consolidated repos, which will give a clear view, so you can guide us on the correct path.

cfmagnum commented 6 years ago

@frodenas we are unable to push/deploy the bosh_exporter as an application. Is there any way to include the bosh_exporter in BOSH/Prometheus?

While pushing the bosh_exporter app we get a certificate error. Also, are there any binaries available to include the bosh_exporter in CF, like the graphite_exporter?

Thanks, Subhash

frodenas commented 6 years ago

Well, you already have a bosh_exporter job as part of this release, and that's the recommended way to proceed.

You can just replace your graphite_exporter job with the bosh_exporter one. The monitor-bosh.yml & enable-bosh-uaa.yml op files provide an example of the properties that will be required.

cfmagnum commented 6 years ago

@frodenas we are facing some more issues with v18.0 in our production environment, although it was working earlier in the dev environment. Let's close this discussion for now. I'll open a new one and share the logs there.

Thank you once again, @frodenas

cloudfoundry / prometheus-boshrelease, issue #132