cloudfoundry / log-cache

Archived: Now bundled in https://github.com/cloudfoundry/log-cache-release
Apache License 2.0

Metrics with missing "origin" tag #107

Open jochenehret opened 5 years ago

jochenehret commented 5 years ago

We are using CF v11.0.0 and have noticed that some log-cache-related metrics don't have an "origin" tag set. This tag should usually contain the BOSH job name. You can find these metrics with the Loggregator Firehose Plugin:

$ cf nozzle -f CounterEvent | grep origin:\"\"
origin:"" eventType:CounterEvent timestamp:1566482705087818493 deployment:"cf" job:"doppler" index:"b7db7ee0-2d05-4992-a146-76a299274e09" ip:"10.2.1.71" tags:<key:"nodeIndex" value:"0" > counterEvent:<name:"ingress_dropped" delta:0 total:0 >
origin:"" eventType:CounterEvent timestamp:1566482705087826305 deployment:"cf" job:"doppler" index:"b7db7ee0-2d05-4992-a146-76a299274e09" ip:"10.2.1.71" counterEvent:<name:"log_cache_ingress" delta:0 total:59586988 >
origin:"" eventType:CounterEvent timestamp:1566482705087836404 deployment:"cf" job:"doppler" index:"b7db7ee0-2d05-4992-a146-76a299274e09" ip:"10.2.1.71" counterEvent:<name:"log_cache_egress" delta:0 total:22255 >
origin:"" eventType:CounterEvent timestamp:1566482705087837399 deployment:"cf" job:"doppler" index:"b7db7ee0-2d05-4992-a146-76a299274e09" ip:"10.2.1.71" counterEvent:<name:"log_cache_expired" delta:0 total:57971690 >
origin:"" eventType:CounterEvent timestamp:1566482705087840570 deployment:"cf" job:"doppler" index:"b7db7ee0-2d05-4992-a146-76a299274e09" ip:"10.2.1.71" counterEvent:<name:"log_cache_promql_timeout" delta:0 total:0 >
origin:"" eventType:CounterEvent timestamp:1566482705089619633 deployment:"cf" job:"doppler" index:"b7db7ee0-2d05-4992-a146-76a299274e09" ip:"10.2.1.71" counterEvent:<name:"nozzle_ingress" delta:0 total:40629203 >
origin:"" eventType:CounterEvent timestamp:1566482705089621180 deployment:"cf" job:"doppler" index:"b7db7ee0-2d05-4992-a146-76a299274e09" ip:"10.2.1.71" counterEvent:<name:"nozzle_dropped" delta:0 total:0 >
origin:"" eventType:CounterEvent timestamp:1566482705089622362 deployment:"cf" job:"doppler" index:"b7db7ee0-2d05-4992-a146-76a299274e09" ip:"10.2.1.71" counterEvent:<name:"nozzle_egress" delta:0 total:40629185 >
origin:"" eventType:CounterEvent timestamp:1566482705089623495 deployment:"cf" job:"doppler" index:"b7db7ee0-2d05-4992-a146-76a299274e09" ip:"10.2.1.71" counterEvent:<name:"nozzle_err" delta:0 total:0 >
$ cf nozzle -f ValueMetric | grep origin:\"\"
origin:"" eventType:ValueMetric timestamp:1566482782658218072 deployment:"cf" job:"doppler" index:"d114c6f0-0058-46d4-b2ed-1bbc51c305c0" ip:"10.1.1.71" tags:<key:"unit" value:"bytes" > valueMetric:<name:"log_cache_heap_in_use_memory" value:2.29629952e+09 unit:"" >
origin:"" eventType:ValueMetric timestamp:1566482782658223118 deployment:"cf" job:"doppler" index:"d114c6f0-0058-46d4-b2ed-1bbc51c305c0" ip:"10.1.1.71" tags:<key:"unit" value:"bytes" > valueMetric:<name:"log_cache_available_system_memory" value:5.85416704e+09 unit:"" >
origin:"" eventType:ValueMetric timestamp:1566482782658225384 deployment:"cf" job:"doppler" index:"d114c6f0-0058-46d4-b2ed-1bbc51c305c0" ip:"10.1.1.71" tags:<key:"unit" value:"milliseconds" > valueMetric:<name:"log_cache_promql_range_query_time" value:0 unit:"" >
origin:"" eventType:ValueMetric timestamp:1566482782658226948 deployment:"cf" job:"doppler" index:"d114c6f0-0058-46d4-b2ed-1bbc51c305c0" ip:"10.1.1.71" tags:<key:"unit" value:"seconds" > valueMetric:<name:"log_cache_uptime" value:269527 unit:"" >
origin:"" eventType:ValueMetric timestamp:1566482782658228877 deployment:"cf" job:"doppler" index:"d114c6f0-0058-46d4-b2ed-1bbc51c305c0" ip:"10.1.1.71" tags:<key:"unit" value:"milliseconds" > valueMetric:<name:"log_cache_cache_period" value:2.6950197e+08 unit:"" >
origin:"" eventType:ValueMetric timestamp:1566482782658230715 deployment:"cf" job:"doppler" index:"d114c6f0-0058-46d4-b2ed-1bbc51c305c0" ip:"10.1.1.71" tags:<key:"unit" value:"percentage" > valueMetric:<name:"log_cache_memory_utilization" value:27.45498077583264 unit:"" >
origin:"" eventType:ValueMetric timestamp:1566482782658234006 deployment:"cf" job:"doppler" index:"d114c6f0-0058-46d4-b2ed-1bbc51c305c0" ip:"10.1.1.71" tags:<key:"unit" value:"bytes" > valueMetric:<name:"log_cache_total_system_memory" value:8.363872256e+09 unit:"" >
origin:"" eventType:ValueMetric timestamp:1566482782658238223 deployment:"cf" job:"doppler" index:"d114c6f0-0058-46d4-b2ed-1bbc51c305c0" ip:"10.1.1.71" tags:<key:"unit" value:"milliseconds" > valueMetric:<name:"log_cache_promql_instant_query_time" value:0 unit:"" >
origin:"" eventType:ValueMetric timestamp:1566482782658239532 deployment:"cf" job:"doppler" index:"d114c6f0-0058-46d4-b2ed-1bbc51c305c0" ip:"10.1.1.71" tags:<key:"unit" value:"entries" > valueMetric:<name:"log_cache_store_size" value:2.062325e+06 unit:"" >
origin:"" eventType:ValueMetric timestamp:1566482782658241782 deployment:"cf" job:"doppler" index:"d114c6f0-0058-46d4-b2ed-1bbc51c305c0" ip:"10.1.1.71" tags:<key:"unit" value:"milliseconds" > valueMetric:<name:"log_cache_truncation_duration" value:0 unit:"" >
origin:"" eventType:ValueMetric timestamp:1566482782658693404 deployment:"cf" job:"doppler" index:"d114c6f0-0058-46d4-b2ed-1bbc51c305c0" ip:"10.1.1.71" tags:<key:"unit" value:"nanoseconds" > valueMetric:<name:"cf_auth_proxy_last_capiv3_apps_latency" value:0 unit:"" >
origin:"" eventType:ValueMetric timestamp:1566482782658696651 deployment:"cf" job:"doppler" index:"d114c6f0-0058-46d4-b2ed-1bbc51c305c0" ip:"10.1.1.71" tags:<key:"unit" value:"nanoseconds" > valueMetric:<name:"cf_auth_proxy_last_capiv3_list_service_instances_latency" value:0 unit:"" >
origin:"" eventType:ValueMetric timestamp:1566482782658698969 deployment:"cf" job:"doppler" index:"d114c6f0-0058-46d4-b2ed-1bbc51c305c0" ip:"10.1.1.71" tags:<key:"unit" value:"nanoseconds" > valueMetric:<name:"cf_auth_proxy_last_capiv3_apps_by_name_latency" value:0 unit:"" >
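A minimal Python sketch of the same check, assuming each envelope is printed on one line in exactly the text shape shown above. It pulls out the top-level origin, deployment and job fields, the tags, and the metric name, then lists metrics that carry neither an origin nor a `source_id` tag. The regular expressions and the function names `parse_nozzle_line` and `unlabeled_metrics` are illustrative, not part of the Firehose plugin.

```python
import re

# Patterns for the one-line envelope format printed by `cf nozzle` above.
# Assumption: every line follows exactly this shape.
_TOP_FIELD = r'(?:^|\s){}:"([^"]*)"'
_TAG = re.compile(r'tags:<key:"([^"]*)" value:"([^"]*)" >')
_NAME = re.compile(r'(?:counterEvent|valueMetric):<name:"([^"]*)"')


def parse_nozzle_line(line):
    """Extract origin, deployment, job, tags and metric name from one line."""
    fields = {}
    for key in ("origin", "deployment", "job"):
        match = re.search(_TOP_FIELD.format(key), line)
        # None means the field was absent; "" means it was present but empty.
        fields[key] = match.group(1) if match else None
    fields["tags"] = dict(_TAG.findall(line))
    name = _NAME.search(line)
    fields["name"] = name.group(1) if name else None
    return fields


def unlabeled_metrics(lines):
    """Return names of metrics that have neither an origin nor a source_id tag."""
    result = []
    for line in lines:
        fields = parse_nozzle_line(line)
        if not fields["origin"] and "source_id" not in fields["tags"]:
            result.append(fields["name"])
    return result
```

Lines read from `cf nozzle -f ValueMetric` can be passed to `unlabeled_metrics` as they are; trailing newlines do not affect the patterns.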

We use "origin", "job" and "deployment" to build distinct metric names in our Firehose nozzle. Can you please check why "origin" is missing?
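The nozzle code itself isn't shown here. As a hypothetical illustration of why an empty origin matters, this Python sketch joins the fields into a dotted metric name; the field order and the `.` separator are assumptions. It rejects empty fields rather than emitting a name like `cf.doppler..log_cache_ingress`, which would be shared by every component on the same job that also leaves origin empty.

```python
def metric_path(deployment, job, origin, name):
    """Join identifying fields into a dotted metric name (hypothetical scheme).

    Raises ValueError instead of producing a name with an empty segment,
    because such names cannot be told apart across components on one job.
    """
    fields = {"deployment": deployment, "job": job, "origin": origin, "name": name}
    empty = [key for key, value in fields.items() if not value]
    if empty:
        raise ValueError("empty identifying field(s): " + ", ".join(empty))
    return ".".join([deployment, job, origin, name])
```

Failing loudly is one design choice; a nozzle could instead substitute a placeholder segment or drop the metric.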

cf-gitbot commented 5 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/168055642

The labels on this github issue will be updated when the story is started.

kitemongerer commented 5 years ago

@jochenehret This is a known issue caused by switching to prom scraper for metric egress. It should be fixed in the upcoming 2.5.0 release of https://github.com/cloudfoundry/log-cache-release

kitemongerer commented 5 years ago

Actually, sorry, I mixed up source_id and origin. Log Cache metrics have historically not set an origin, since origin is the deprecated way of identifying components. I would recommend using the source_id tag rather than origin.
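Following that recommendation, a nozzle could key metrics on the `source_id` tag and fall back to the deprecated origin only when the tag is absent. A minimal Python sketch, assuming envelopes have already been parsed into dicts of the form `{"origin": ..., "tags": {...}}` (an assumed shape, not a library type); `component_id` is a hypothetical name:

```python
def component_id(envelope):
    """Return the identifier to key metrics on, preferring the source_id tag.

    Falls back to the deprecated origin field and returns None if neither is set.
    """
    source_id = (envelope.get("tags") or {}).get("source_id")
    if source_id:
        return source_id
    origin = envelope.get("origin")
    if origin:
        return origin
    return None
```

Returning `None` when both are missing leaves it to the caller to drop or flag such metrics, which is the situation for the log-cache metrics in this issue until `source_id` is added.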

jochenehret commented 5 years ago

Ok, but the metrics above don't have a "source_id" tag. Can that be fixed?

kitemongerer commented 5 years ago

Yes, that is what will be fixed in the upcoming 2.5.0 release of https://github.com/cloudfoundry/log-cache-release