QubitProducts / exporter_exporter

A reverse proxy designed for Prometheus exporters
Apache License 2.0

Duplicate help item with missing metric causes loss of all metrics #39

Closed: danpoltawski closed this issue 4 years ago

danpoltawski commented 4 years ago

I'm using the Ceph exporter and, having just started investigating RBD mirroring, I hit the following error:

An error has occurred while serving metrics:

text format parsing error in line 2614: second HELP line for metric name "ceph_rbd_mirror_replay"

Looking at the Ceph endpoint, it seems the problematic HELP line is for a metric that has no samples to go with it:

# HELP ceph_rocksdb_get_latency_count Get latency Count
# TYPE ceph_rocksdb_get_latency_count counter
ceph_rocksdb_get_latency_count{ceph_daemon="mon.link"} 4236013.0
ceph_rocksdb_get_latency_count{ceph_daemon="mon.yoshi"} 4188788.0
ceph_rocksdb_get_latency_count{ceph_daemon="mon.bowser"} 4158142.0
# HELP ceph_rbd_mirror_replay Replays
# TYPE ceph_rbd_mirror_replay counter
# HELP ceph_prioritycache:meta_pri0_bytes bytes allocated to pri0
# TYPE ceph_prioritycache:meta_pri0_bytes gauge
ceph_prioritycache:meta_pri0_bytes{ceph_daemon="osd.6"} 0.0
ceph_prioritycache:meta_pri0_bytes{ceph_daemon="osd.14"} 0.0

I understand this may be a bug in the Ceph exporter, but is there a way to stop it from causing the loss of all metrics in exporter_exporter?

danpoltawski commented 4 years ago

Ah, sorry, when grepping the output I see the problem:

[poltawski@link ~]$ grep ceph_rbd_mirror_replay out 
# HELP ceph_rbd_mirror_replay Replays
# TYPE ceph_rbd_mirror_replay counter
ceph_rbd_mirror_replay{ceph_daemon="rbd-mirror.12705583"} 0.0
ceph_rbd_mirror_replay{ceph_daemon="rbd-mirror.12685403"} 0.0
ceph_rbd_mirror_replay{ceph_daemon="rbd-mirror.12695553"} 0.0
# HELP ceph_rbd_mirror_replay_bytes Replayed data
# TYPE ceph_rbd_mirror_replay_bytes counter
ceph_rbd_mirror_replay_bytes{ceph_daemon="rbd-mirror.12705583"} 0.0
ceph_rbd_mirror_replay_bytes{ceph_daemon="rbd-mirror.12685403"} 0.0
ceph_rbd_mirror_replay_bytes{ceph_daemon="rbd-mirror.12695553"} 0.0
# HELP ceph_rbd_mirror_replay_latency_sum Replay latency Total
# TYPE ceph_rbd_mirror_replay_latency_sum counter
# HELP ceph_rbd_mirror_replay_latency_count Replay latency Count
# TYPE ceph_rbd_mirror_replay_latency_count counter
ceph_rbd_mirror_replay_latency_count{ceph_daemon="rbd-mirror.12705583"} 0.0
ceph_rbd_mirror_replay_latency_count{ceph_daemon="rbd-mirror.12685403"} 0.0
ceph_rbd_mirror_replay_latency_count{ceph_daemon="rbd-mirror.12695553"} 0.0
# HELP ceph_rbd_mirror_replay Replays
# TYPE ceph_rbd_mirror_replay counter
# HELP ceph_rbd_mirror_replay_bytes Replayed data
# TYPE ceph_rbd_mirror_replay_bytes counter
# HELP ceph_rbd_mirror_replay_latency_count Replay latency Count
# TYPE ceph_rbd_mirror_replay_latency_count counter
# HELP ceph_rbd_mirror_replay_latency_sum Replay latency Total
# TYPE ceph_rbd_mirror_replay_latency_sum counter
ceph_rbd_mirror_replay_latency_sum{ceph_daemon="rbd-mirror.12705583"} 0.0
ceph_rbd_mirror_replay_latency_sum{ceph_daemon="rbd-mirror.12685403"} 0.0
ceph_rbd_mirror_replay_latency_sum{ceph_daemon="rbd-mirror.12695553"} 0.0
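
The rule tripped here is that a metric name may carry only one HELP line in the text exposition format, even when the two HELP lines are far apart in the output. As a rough sketch (a simplified check, not the parser that Prometheus or exporter_exporter actually uses; the function name `duplicate_help_names` is hypothetical), a few lines of Python can list every name that would trigger that error:

```python
from collections import Counter

# Trimmed excerpt of the exporter output quoted in this issue.
SAMPLE = """\
# HELP ceph_rbd_mirror_replay Replays
# TYPE ceph_rbd_mirror_replay counter
ceph_rbd_mirror_replay{ceph_daemon="rbd-mirror.12705583"} 0.0
# HELP ceph_rbd_mirror_replay_bytes Replayed data
# TYPE ceph_rbd_mirror_replay_bytes counter
ceph_rbd_mirror_replay_bytes{ceph_daemon="rbd-mirror.12705583"} 0.0
# HELP ceph_rbd_mirror_replay Replays
# TYPE ceph_rbd_mirror_replay counter
"""


def duplicate_help_names(exposition: str) -> list[str]:
    """Return metric names that have more than one '# HELP' line.

    Simplified: it only counts HELP lines per name and does not model
    the rest of the text format (TYPE ordering, summaries, escaping).
    """
    counts = Counter()
    for line in exposition.splitlines():
        # '# HELP <name> <docstring>' splits into at most 4 fields
        parts = line.split(None, 3)
        if len(parts) >= 3 and parts[0] == "#" and parts[1] == "HELP":
            counts[parts[2]] += 1
    # sorted so the result is deterministic
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    print(duplicate_help_names(SAMPLE))  # ['ceph_rbd_mirror_replay']
```

Run against the full grep output above, this check would flag all four ceph_rbd_mirror_replay* names, since each has two HELP lines; the real parser stops at the first one it reaches.
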

danpoltawski commented 4 years ago

I suppose my thinking in creating this issue was: can anything be done to stop such problems from causing errors in exporter_exporter, and instead let Prometheus itself deal with them? I've no idea what Prometheus would make of the above output.

tcolgate commented 4 years ago

Unfortunately, this error is returned by one of the upstream Prometheus packages and isn't something I can fix in expexp. We use the same code that Prometheus uses, so it seems quite likely that a Prometheus server trying to scrape this would hit the same problem (if not, we may just be able to update the dependencies).

danpoltawski commented 4 years ago

Fair enough, it makes sense to close this issue then. Thanks for your response.

tcolgate commented 4 years ago

Actually, I'd completely forgotten that you can disable the parsing (and a recent PR from a user makes the behaviour here more consistent). For example:

modules:
  yourcephmodule:
     verify: false

danpoltawski commented 4 years ago

Oh, great!

danpoltawski commented 4 years ago

Out of interest, what's the benefit of parsing? (I kind of assumed it was already a 'dumb' reverse proxy.)

tcolgate commented 4 years ago

Mostly to avoid sending invalid data back to Prometheus, and to prevent accidental security whoopsies.
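
To make the trade-off concrete, here is a toy model in Python of a proxy with and without verification. It is a sketch under stated assumptions, not exporter_exporter's code: `validate` checks only a couple of text-format rules (duplicate HELP lines and the rough shape of sample lines), and `proxy_response` with its `verify` argument is hypothetical, loosely mirroring the `verify: false` module option above.

```python
import re

# Assumed, simplified sample-line shape: name, optional {labels}, value,
# optional timestamp. Ignores escaping inside label values and similar.
SAMPLE_RE = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*(\{[^}]*\})?\s+(\S+)(\s+\S+)?$")


def validate(body: str) -> None:
    """Raise ValueError on a few (not all) text-format problems."""
    seen_help = set()
    for lineno, line in enumerate(body.splitlines(), start=1):
        if not line.strip():
            continue
        if line.startswith("#"):
            parts = line.split(None, 3)
            if len(parts) >= 3 and parts[1] == "HELP":
                if parts[2] in seen_help:
                    raise ValueError(
                        f'line {lineno}: second HELP line for metric name "{parts[2]}"'
                    )
                seen_help.add(parts[2])
            continue  # TYPE lines and plain comments are accepted as-is
        m = SAMPLE_RE.match(line)
        if m is None:
            raise ValueError(f"line {lineno}: not a valid sample line")
        try:
            float(m.group(2))  # accepts NaN and +Inf as well
        except ValueError:
            raise ValueError(f"line {lineno}: invalid value {m.group(2)!r}") from None


def proxy_response(upstream_body: str, verify: bool) -> tuple[int, str]:
    """Hypothetical proxy step: forward as-is, or verify first."""
    if not verify:
        # 'dumb' reverse proxy: whatever the exporter said goes to Prometheus
        return 200, upstream_body
    try:
        validate(upstream_body)
    except ValueError as err:
        # one bad line fails the whole scrape, as seen in this issue
        return 500, f"An error has occurred while serving metrics:\n\n{err}\n"
    return 200, upstream_body
```

With `verify=False` the body passes through untouched and Prometheus decides what to do with it; with `verify=True` a single bad line turns the whole scrape into an error, which is the behaviour reported at the top of this issue.
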

danpoltawski commented 4 years ago

Found the Ceph bug for this, in case anyone else googles their way to this issue: https://tracker.ceph.com/issues/43004

danpoltawski commented 4 years ago

https://github.com/ceph/ceph/pull/32184