kingbyteking opened 5 years ago
Please attach your config file, I'll have a look.
Please find the config file attached. The reason I'm asking is what I observed: a Prometheus query returns no data for the expired metrics, but on port 9144 (/metrics) the metrics are all still there. I assume grok_exporter should not display expired metrics on port 9144?
Is there any way to get debug info? It seems some expired metrics are removed, but some are kept. I'm not sure whether it's an issue with my input file.
global:
  config_version: 2
  retention_check_interval: 53s
input:
  type: file
  path: /path/to/input
  readall: false # Read from the beginning of the file? False means we start at the end of the file and read only new lines.
grok:
  patterns_dir: /path/to/pattern
metrics:
  - type: gauge
    name: worker_task_duration
    help: task running duration of given task.
    match: mypattern
    value: '{{.run_time}}'
    cumulative: false
    labels:
      WorkingQueue: '{{.queue}}'
      Status: '{{.status}}'
      StartTime: '{{.start_time}}'
      TaskID: '{{.task_id}}'
    retention: 5m
  - type: gauge
    name: scramble_task_duration
    help: scrambling task running time.
    match: mypattern
    value: '{{.run_time}}'
    cumulative: false
    labels:
      Status: '{{.status}}'
      Product: '{{.product}}'
      StartTime: '{{.start_time}}'
      TaskID: '{{.task_id}}'
      AttachmentID: '{{.attachment_id}}'
      TsJobID: '{{.ts_id}}'
      ErrorMessage: '{{.error_msg}}'
    retention: 5m
  - type: gauge
    name: encryption_task_duration
    help: encryption task running time.
    match: mypattern
    value: '{{.run_time}}'
    cumulative: false
    labels:
      Status: '{{.status}}'
      StartTime: '{{.start_time}}'
      TaskID: '{{.task_id}}'
      AttachmentID: '{{.attachment_id}}'
    retention: 5m
server:
  port: 9144
Your config looks ok. The expected behavior is: If a metric is not updated for 5:53 minutes (retention time plus check interval), the metric should disappear from http://localhost:9144/metrics. Are you sure that no new log lines for these metrics are written?
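If you want to watch the expiry happen, a quick polling loop works. This is just a sketch, assuming the exporter runs on localhost:9144 and worker_task_duration is one of the affected metrics:

# Count matching series every 30 seconds; the count should drop to 0
# roughly 5:53 after the last update (retention 5m + retention_check_interval 53s).
watch -n 30 'curl -s http://localhost:9144/metrics | grep -c worker_task_duration'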
Quick experiment to verify that retention works in general: The grok_exporter distribution has an example in ./example/config.yml. The example metric is a counter named exim_rejected_rcpt_total. I copied and pasted the metric to create exim_rejected_rcpt_total2, which is exactly the same as exim_rejected_rcpt_total but with retention: 2m:
global:
config_version: 2
input:
type: file
path: ./example/exim-rejected-RCPT-examples.log
readall: true # Read from the beginning of the file? False means we start at the end of the file and read only new lines.
grok:
patterns_dir: ./logstash-patterns-core/patterns
additional_patterns:
- 'EXIM_MESSAGE [a-zA-Z ]*'
metrics:
- type: counter
name: exim_rejected_rcpt_total
help: Total number of rejected recipients, partitioned by error message.
match: '%{EXIM_DATE} %{EXIM_REMOTE_HOST} F=<%{EMAILADDRESS}> rejected RCPT <%{EMAILADDRESS}>: %{EXIM_MESSAGE:message}'
labels:
error_message: '{{.message}}'
- type: counter
name: exim_rejected_rcpt_total2
help: Total number of rejected recipients, partitioned by error message.
match: '%{EXIM_DATE} %{EXIM_REMOTE_HOST} F=<%{EMAILADDRESS}> rejected RCPT <%{EMAILADDRESS}>: %{EXIM_MESSAGE:message}'
labels:
error_message: '{{.message}}'
retention: 2m
server:
host: localhost
port: 9144
Now I run grok_exporter -config ./example/config.yml. Initially, I see the same matches for both metrics:
exim_rejected_rcpt_total{error_message="Sender verify failed"} 2000
exim_rejected_rcpt_total{error_message="Unrouteable address"} 32
exim_rejected_rcpt_total{error_message="relay not permitted"} 165
exim_rejected_rcpt_total2{error_message="Sender verify failed"} 2000
exim_rejected_rcpt_total2{error_message="Unrouteable address"} 32
exim_rejected_rcpt_total2{error_message="relay not permitted"} 165
Obviously no new log messages are written; the logfile is unchanged. After 3 minutes, the exim_rejected_rcpt_total2 metrics disappear and only the exim_rejected_rcpt_total metrics are left:
exim_rejected_rcpt_total{error_message="Sender verify failed"} 2000
exim_rejected_rcpt_total{error_message="Unrouteable address"} 32
exim_rejected_rcpt_total{error_message="relay not permitted"} 165
You should see similar behavior with your config.
An observation from the past few days: retention fails for only one of my metrics, which stays in /metrics all the time; the retention timer removes the other metrics as expected. The metric that is never deleted has some empty label values (""), and I'm not sure if that's the cause. I will verify this and report back.
I replaced those label values "" with "-", and now retention works normally. Will the team fix this issue?
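As a side note, instead of rewriting the values at the source, you could guard against empty captures in the label template itself. A sketch only: grok_exporter's label values are Go templates, so a standard conditional should work, assuming your version supports it (error_msg is the capture from the config above):

labels:
  ErrorMessage: '{{if .error_msg}}{{.error_msg}}{{else}}-{{end}}'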
So far, I'm working around the issue by using a Prometheus client library to push the metrics myself.
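For anyone taking the same route, here is a minimal sketch using the Python prometheus_client with a Pushgateway; the gateway address, job name, and label values are placeholders, not taken from this issue:

# Push a gauge to a Pushgateway instead of exposing it through grok_exporter.
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
g = Gauge('worker_task_duration', 'task running duration of given task',
          ['WorkingQueue', 'Status', 'TaskID'], registry=registry)
g.labels(WorkingQueue='default', Status='done', TaskID='42').set(12.3)

# push_to_gateway replaces all metrics for this job on the gateway.
push_to_gateway('localhost:9091', job='worker_tasks', registry=registry)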
Thanks for the analysis, I'll look into this.
In my grok_exporter configuration file, I enabled the retention setting for gauge metrics (retention: 5m). I use curl to fetch the metrics. Even 10 minutes after the metrics were generated, I can still see my self-defined metrics at http://localhost:9144/metrics.
Would someone help me out with how the retention setting works, and why the metrics still exist at http://localhost:9144/metrics? The command I use: curl http://localhost:9144/metrics | grep my_metrics | wc -l ... 546
Should I use some parameter setting to get the correct result? I expect expired metrics not to be present in /metrics.