cloudfoundry / prometheus-boshrelease

bosh release for prometheus ecosystem
Apache License 2.0

CF exporter not working anymore #493

Open mchabane opened 4 months ago

mchabane commented 4 months ago

Hello, for the past few days our cf_exporter has stopped working, with this error:

time="2024-07-01T13:34:50Z" level=error msg="[ 7] users error: The UAA service is currently unavailable"
time="2024-07-01T13:34:50Z" level=debug msg="[ 7] users (done, 186 sec)"

We checked our UAA and did not find anything.

benjaminguttmann-avtq commented 4 months ago

Hi @mchabane, which version of prometheus-boshrelease are you using?

mchabane commented 4 months ago

Hello, we were on 30.1.1 and upgraded to 30.3.0, but that did not fix the issue.

benjaminguttmann-avtq commented 4 months ago

On the VM that is running the cf_exporter, can you try to reach your configured API with something like nc -vz <api> 443?
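For hosts where nc is not installed, the same reachability test can be sketched in Python. This is only a minimal illustration: the function name can_connect and the 5-second timeout are assumptions, not part of cf_exporter or the BOSH release. Like nc -vz, it only shows that a TCP connection can be opened. It does not show that the API or UAA answers requests correctly.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Equivalent in spirit to `nc -vz <host> <port>`: it checks the TCP
    handshake only, not TLS or any HTTP response.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

Calling can_connect("<api>", 443) from the exporter VM would then stand in for the nc command, with <api> replaced by the configured API hostname.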

mchabane commented 4 months ago

nc command result: 443 port [tcp/https] succeeded!

We enabled debug logs, and everything seems to work except "users". Here are some log extracts:

time="2024-07-01T12:11:43Z" level=info msg="collecting objects from cloud foundry API"
time="2024-07-01T12:11:43Z" level=debug msg="waiting for work groups to complete"
time="2024-07-01T12:11:43Z" level=debug msg="[ 1] info"
time="2024-07-01T12:11:43Z" level=debug msg="[ 4] routes"
time="2024-07-01T12:11:43Z" level=debug msg="[ 6] process"
time="2024-07-01T12:11:43Z" level=debug msg="[ 2] spaces"
time="2024-07-01T12:11:43Z" level=debug msg="[ 7] space_quotas"
time="2024-07-01T12:11:43Z" level=debug msg="[ 3] org_quotas"
time="2024-07-01T12:11:43Z" level=debug msg="[ 9] security_groups"
time="2024-07-01T12:11:43Z" level=debug msg="[ 0] organizations"
time="2024-07-01T12:11:43Z" level=debug msg="[ 5] applications"
time="2024-07-01T12:11:43Z" level=debug msg="[ 8] route_services"
time="2024-07-01T12:11:43Z" level=debug msg="[ 1] info (done, 0 sec)"
time="2024-07-01T12:11:43Z" level=debug msg="[ 1] stacks"
time="2024-07-01T12:11:43Z" level=debug msg="[ 9] security_groups (done, 0 sec)"
time="2024-07-01T12:11:43Z" level=debug msg="[ 9] buildpacks"
time="2024-07-01T12:11:43Z" level=debug msg="[ 3] org_quotas (done, 0 sec)"
time="2024-07-01T12:11:43Z" level=debug msg="[ 3] service_brokers"
time="2024-07-01T12:11:43Z" level=debug msg="[ 7] space_quotas (done, 0 sec)"
time="2024-07-01T12:11:43Z" level=debug msg="[ 7] service_offerings"
time="2024-07-01T12:11:43Z" level=debug msg="[ 8] route_services (done, 0 sec)"
time="2024-07-01T12:11:43Z" level=debug msg="[ 8] service_instances"
time="2024-07-01T12:11:43Z" level=debug msg="[ 1] stacks (done, 0 sec)"
time="2024-07-01T12:11:43Z" level=debug msg="[ 1] service_plans"
time="2024-07-01T12:11:43Z" level=debug msg="[ 9] buildpacks (done, 0 sec)"
time="2024-07-01T12:11:43Z" level=debug msg="[ 9] segments"
time="2024-07-01T12:11:43Z" level=debug msg="[ 3] service_brokers (done, 0 sec)"
time="2024-07-01T12:11:43Z" level=debug msg="[ 3] service_bindings"
time="2024-07-01T12:11:43Z" level=debug msg="[ 0] organizations (done, 0 sec)"
time="2024-07-01T12:11:43Z" level=debug msg="[ 0] users"
time="2024-07-01T12:11:43Z" level=debug msg="[ 9] segments (done, 0 sec)"
time="2024-07-01T12:11:43Z" level=debug msg="[ 9] events"
time="2024-07-01T12:11:43Z" level=debug msg="[ 7] service_offerings (done, 0 sec)"
time="2024-07-01T12:11:43Z" level=debug msg="[ 1] service_plans (done, 0 sec)"
time="2024-07-01T12:11:43Z" level=debug msg="[ 9] events (done, 0 sec)"
time="2024-07-01T12:11:43Z" level=debug msg="[ 7] space_summaries 0000/0790 (4fa80dcc-be21-4258-be7a-d5e2e6a31b0b)"
time="2024-07-01T12:11:43Z" level=debug msg="[ 1] space_summaries 0001/0790 (baf1d8d0-e849-442c-9bf3-fb03548b5136)"

time="2024-07-01T14:12:36Z" level=debug msg="[ 9] space_summaries 0784/0791 (a741cc5f-2d37-4303-8b50-dd48244d3426) (done, 1 sec)"
time="2024-07-01T14:12:37Z" level=debug msg="[ 6] space_summaries 0786/0791 (5683e8e8-efae-44df-b44e-59f818560b05) (done, 1 sec)"
time="2024-07-01T14:12:37Z" level=debug msg="[ 7] space_summaries 0788/0791 (65bc824f-0d89-4128-9770-aa1baa3d59f5) (done, 1 sec)"
time="2024-07-01T14:12:37Z" level=debug msg="[ 5] space_summaries 0787/0791 (58bbf327-7d7a-4305-b30d-24f576231dfd) (done, 1 sec)"
time="2024-07-01T14:12:37Z" level=debug msg="[ 2] space_summaries 0783/0791 (c3dc3c7e-6427-4768-a629-347aa08ae271) (done, 2 sec)"
time="2024-07-01T14:12:37Z" level=debug msg="[ 3] space_summaries 0781/0791 (9c61be80-3fb4-47e8-b235-3551bda20f66) (done, 2 sec)"
time="2024-07-01T14:12:37Z" level=debug msg="[ 8] space_summaries 0776/0791 (a9f05970-440e-48a2-96f3-2f7659cccfe3) (done, 3 sec)"
time="2024-07-01T14:12:37Z" level=debug msg="[ 1] space_summaries 0779/0791 (e175c937-166d-4be4-be49-b27b3b4e1b48) (done, 2 sec)"
time="2024-07-01T14:12:59Z" level=error msg="[ 4] users error: The UAA service is currently unavailable"
time="2024-07-01T14:12:59Z" level=debug msg="[ 4] users (done, 196 sec)"
time="2024-07-01T14:12:59Z" level=info msg="collecting objects from cloud foundry API (done, 196 sec)"
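With this many debug entries, the failing or slow collector is easy to miss. The sketch below shows one way to pull out the error messages and the steps that took longest. It assumes the log format seen in the extracts above (level=..., then msg="[ N] name (done, X sec)"). The function name summarize and the 10-second threshold are hypothetical and not part of cf_exporter.

```python
import re

# Matches worker entries such as: level=debug msg="[ 4] users (done, 196 sec)"
ENTRY = re.compile(r'level=(\w+) msg="\[\s*(\d+)\] ([^"]*)"')
# Matches the completion suffix inside a message: "<step> (done, <n> sec)"
DONE = re.compile(r'^(.*) \(done, (\d+) sec\)$')


def summarize(log_text, min_seconds=10):
    """Return (errors, slow_steps) found in cf_exporter debug output.

    errors is a list of error messages; slow_steps is a list of
    (step_name, seconds) for steps that took at least min_seconds.
    Works whether entries are one per line or pasted on a single line.
    """
    errors, slow_steps = [], []
    for level, _worker, message in ENTRY.findall(log_text):
        if level == "error":
            errors.append(message)
            continue
        done = DONE.match(message)
        if done and int(done.group(2)) >= min_seconds:
            slow_steps.append((done.group(1), int(done.group(2))))
    return errors, slow_steps
```

Run against the second extract above, this should single out the users step, which is the only one that reports an error and takes far longer than the others.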

benjaminguttmann-avtq commented 4 months ago

What happens if you run the following?

curl http://localhost:9193/metrics

Does that print any error messages?

mchabane commented 4 months ago

No errors, but it takes a very long time to respond, and a lot of metrics are missing (cf_application, cf_info ...). We mostly get metrics about scraping.
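To confirm which metric families are missing without reading the whole response, the output of curl http://localhost:9193/metrics can be checked with a small script. This is a sketch only: it assumes the standard Prometheus text exposition format, the helper names are hypothetical, and the prefixes to look for are the ones mentioned in this thread.

```python
import re


def metric_names(exposition_text):
    """Collect the metric names present in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or whitespace.
        names.add(re.split(r"[{\s]", line, maxsplit=1)[0])
    return names


def missing_prefixes(exposition_text, expected_prefixes):
    """Return the expected prefixes that no exposed metric name starts with."""
    names = metric_names(exposition_text)
    return [
        prefix
        for prefix in expected_prefixes
        if not any(name.startswith(prefix) for name in names)
    ]
```

For example, saving the curl output to a file and calling missing_prefixes(text, ["cf_application", "cf_info"]) would list whichever of those families are absent.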

mchabane commented 4 months ago

Hello, I removed the "Events" filter and it works again. But I need this filter. What could I do on my Cloud Foundry foundation to simulate the events collector and see what is not working? Thank you

benjaminguttmann-avtq commented 4 months ago

I would expect you need to run cf curl /v3/events. @gmllt, maybe you can answer that question better?

mchabane commented 4 months ago

Hello, cf curl on /v3/audit_events/, /v3/app_usage_events and /v3/service_usage_events works well with my admin user account.

mchabane commented 4 months ago

It also works well when calling the API with curl using the cf_exporter client token.

gmllt commented 4 months ago

Hello @mchabane, to simulate the call made when retrieving events, you can run the following query: /v3/audit_events?per_page=5000&order_by=-created_at&created_ats=<time> where

Just in case, are you using pxc-release and if so, are you using MySQL version 8.0? We have seen an impact on cloudfoundry API response times in our environments with this version of MySQL.
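As a rough illustration of the query above, the path can be built programmatically so that the timestamp is encoded correctly. The per_page, order_by and created_ats parameters come from the query quoted in this thread. The function name audit_events_path and the UTC timestamp format (YYYY-MM-DDTHH:MM:SSZ) are assumptions, and the exact value cf_exporter sends for <time> is not stated here.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def audit_events_path(since: datetime, per_page: int = 5000) -> str:
    """Build the audit events query path for a timezone-aware datetime.

    The timestamp is rendered in UTC; its format is an assumption.
    """
    timestamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    query = urlencode(
        {
            "per_page": per_page,
            "order_by": "-created_at",
            "created_ats": timestamp,
        }
    )
    return "/v3/audit_events?" + query
```

The resulting path can then be passed to cf curl, quoted so that the shell does not interpret the & characters.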

mchabane commented 4 months ago

Hello @gmllt, we are not using MySQL 8.0.

The pagination looks fine: we get the same kind of response as on another foundation where we do not have the issue.

mchabane commented 3 months ago

Hello @gmllt, we still have the issue. For the moment we have disabled events, but we need them. Here is the pagination result:

{
  "pagination": {
    "total_results": 4,
    "total_pages": 1,
    "first": {
      "href": "https://api.XXXXXv3/audit_events?created_ats=2024-08-14T08%3A37%3A00Z&order_by=-created_at&page=1&per_page=5000"
    },
    "last": {
      "href": "https://api.XXXXX/v3/audit_events?created_ats=2024-08-14T08%3A37%3A00Z&order_by=-created_at&page=1&per_page=5000"
    },
    "next": null,
    "previous": null
  },
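When comparing foundations, it can help to reduce each response to the few pagination fields that matter. The sketch below assumes the full JSON body has been saved (the extract above is truncated). The function name pagination_summary is hypothetical.

```python
import json


def pagination_summary(body: str) -> dict:
    """Summarize the "pagination" object of a v3 list response."""
    pagination = json.loads(body).get("pagination", {})
    return {
        "total_results": pagination.get("total_results"),
        "total_pages": pagination.get("total_pages"),
        # "next" is null on the last page, so this is False when all
        # results fit on a single page.
        "has_next": pagination.get("next") is not None,
    }
```

For the result shown above, the summary would report 4 results on a single page with no next page, which suggests the events query itself returns little data.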