fluent / fluent-plugin-prometheus

A fluent plugin that collects metrics and exposes for Prometheus.
Apache License 2.0
257 stars 79 forks source link

Exporter emits illegal values sometimes #122

Closed talonx closed 5 years ago

talonx commented 5 years ago

Prometheus fails to scrape the output from the exporter sometimes with this error

strconv.ParseFloat: parsing "-": invalid syntax

OS (where td-agent is installed)

NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

td-agent 1.7.0

Configuration

<source>
  @type tail
  tag nginxlog
  path /var/log/nginx/kb_proxy_access.log
  path_key tailed_path
  pos_file /tmp/fluent_nginx.pos
  <parse>
    @type grok
    grok_pattern %{IPORHOST:ip} - %{USER:user} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" (?<status>[0-9]{3}) (?<body_bytes_sent>[0-9]+) %{QS:referrer} %{QS:user_agent} %{QS:fwd_for} (?<request_length>[0-9]+) (?<bytes_sent>[0-9]+) %{NUMBER:request_time} (?<upstream_response_time>[0-9-]+)
  </parse>
</source>

<filter nginxlog>
  @type prometheus
  <metric>
    name nginx_request_count
    type counter
    desc Total number of requests received
    <labels>
      source ${tailed_path}
      method ${verb}
    </labels>
  </metric>
  <metric>
    name nginx_response_count
    type counter
    desc Total number of response codes and number of responses sent per response code
    <labels>
      source ${tailed_path}
      method ${verb}
      status ${status}
    </labels>
  </metric>
  <metric>
    name nginx_response_duration_seconds
    type summary
    desc Full request time, starting when NGINX reads the first byte from the client and ending when NGINX sends the last byte of the response body
    key request_time
    <labels>
      source ${tailed_path}
      method ${verb}
      status ${status}
    </labels>
  </metric>
  <metric>
    name nginx_response_upstream_duration_seconds
    type summary
    desc Time between establishing a connection to an upstream server and receiving the last byte of the response body
    key upstream_response_time
    <labels>
      source ${tailed_path}
      method ${verb}
      status ${status}
    </labels>
  </metric>
</filter>

<source>
  @type prometheus
</source>

<match *>
  @type copy
  <store>
    @type prometheus
  </store>
</match>

Plugins were installed using

/opt/td-agent/embedded/bin/fluent-gem install fluent-plugin-prometheus

and

/opt/td-agent/embedded/bin/fluent-gem install fluent-plugin-grok-parser

Sample log file (with revealing information removed)

x.x.x.x - - [10/Oct/2019:06:13:28 +0000] "POST /es/_msearch?rest_total_hits_as_int=true&ignore_throttled=true HTTP/1.1" 200 1365 "https://xxxxx..." "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/77.0.3865.90 Chrome/77.0.3865.90 Safari/537.36" "x.x.x.x" 1868 1641 0.225 0.200

The offending 3 output lines from the plugin seem to be these (via a curl localhost:24231/metrics). The value of the metric is a "-" instead of a number.

nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="499",quantile="0.5"} -
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="499",quantile="0.9"} -
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="499",quantile="0.99"} -
talonx commented 5 years ago

Do you need any more information to debug this?

ganmacs commented 5 years ago

could you sent us complete error stack trace(not only error line but also stack trace and other lines) and complete return value from prometheus plugin via a curl command?

talonx commented 5 years ago

Complete plugin output via curl

curl localhost:24231/metrics
# TYPE nginx_request_count counter
# HELP nginx_request_count Total number of requests received
nginx_request_count{source="/var/log/nginx/kb_proxy_access.log",method="POST"} 20729.0
nginx_request_count{source="/var/log/nginx/kb_proxy_access.log",method="GET"} 16202.0
nginx_request_count{source="/var/log/nginx/kb_proxy_access.log",method="PUT"} 214.0
nginx_request_count{source="/var/log/nginx/kb_proxy_access.log",method="HEAD"} 8.0
# TYPE nginx_response_count counter
# HELP nginx_response_count Total number of response codes and number of responses sent per response code
nginx_response_count{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="200"} 20086.0
nginx_response_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="304"} 11458.0
nginx_response_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="200"} 4569.0
nginx_response_count{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="200"} 135.0
nginx_response_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="404"} 11.0
nginx_response_count{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="499"} 68.0
nginx_response_count{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="403"} 154.0
nginx_response_count{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="403"} 1.0
nginx_response_count{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="400"} 41.0
nginx_response_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="400"} 4.0
nginx_response_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="499"} 10.0
nginx_response_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="302"} 18.0
nginx_response_count{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="409"} 75.0
nginx_response_count{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="500"} 379.0
nginx_response_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="403"} 105.0
nginx_response_count{source="/var/log/nginx/kb_proxy_access.log",method="HEAD",status="302"} 6.0
nginx_response_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="503"} 24.0
nginx_response_count{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="503"} 1.0
nginx_response_count{source="/var/log/nginx/kb_proxy_access.log",method="HEAD",status="403"} 2.0
nginx_response_count{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="500"} 3.0
nginx_response_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="500"} 3.0
# TYPE nginx_response_duration_seconds summary
# HELP nginx_response_duration_seconds Full request time, starting when NGINX reads the first byte from the client and ending when NGINX sends the last byte of the response body
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="200",quantile="0.5"} 0.199
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="200",quantile="0.9"} 1.134
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="200",quantile="0.99"} 2.598
nginx_response_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="200"} 0
nginx_response_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="200"} 20086
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="304",quantile="0.5"} 0.003
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="304",quantile="0.9"} 0.030
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="304",quantile="0.99"} 0.037
nginx_response_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="304"} 0
nginx_response_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="304"} 11458
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="200",quantile="0.5"} 0.027
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="200",quantile="0.9"} 0.086
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="200",quantile="0.99"} 0.177
nginx_response_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="200"} 0
nginx_response_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="200"} 4569
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="200",quantile="0.5"} 0.691
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="200",quantile="0.9"} 1.131
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="200",quantile="0.99"} 1.223
nginx_response_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="200"} 0
nginx_response_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="200"} 135
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="404",quantile="0.5"} 0.006
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="404",quantile="0.9"} 0.002
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="404",quantile="0.99"} 0.012
nginx_response_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="404"} 0
nginx_response_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="404"} 11
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="499",quantile="0.5"} 20.037
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="499",quantile="0.9"} 41.121
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="499",quantile="0.99"} 5.417
nginx_response_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="499"} 0
nginx_response_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="499"} 68
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="403",quantile="0.5"} 0.001
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="403",quantile="0.9"} 0.001
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="403",quantile="0.99"} 0.002
nginx_response_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="403"} 0
nginx_response_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="403"} 154
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="403",quantile="0.5"} 0.000
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="403",quantile="0.9"} 0.000
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="403",quantile="0.99"} 0.000
nginx_response_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="403"} 0
nginx_response_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="403"} 1
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="400",quantile="0.5"} 0.028
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="400",quantile="0.9"} 0.071
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="400",quantile="0.99"} 0.082
nginx_response_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="400"} 0
nginx_response_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="400"} 41
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="400",quantile="0.5"} 0.001
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="400",quantile="0.9"} 0.003
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="400",quantile="0.99"} 0.003
nginx_response_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="400"} 0
nginx_response_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="400"} 4
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="499",quantile="0.5"} 0.001
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="499",quantile="0.9"} 39.490
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="499",quantile="0.99"} 39.490
nginx_response_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="499"} 0
nginx_response_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="499"} 10
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="302",quantile="0.5"} 0.010
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="302",quantile="0.9"} 0.122
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="302",quantile="0.99"} 0.122
nginx_response_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="302"} 0
nginx_response_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="302"} 18
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="409",quantile="0.5"} 0.146
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="409",quantile="0.9"} 0.277
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="409",quantile="0.99"} 0.377
nginx_response_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="409"} 0
nginx_response_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="409"} 75
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="500",quantile="0.5"} 60.029
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="500",quantile="0.9"} 60.071
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="500",quantile="0.99"} 60.078
nginx_response_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="500"} 0
nginx_response_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="500"} 379
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="403",quantile="0.5"} 0.000
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="403",quantile="0.9"} 0.000
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="403",quantile="0.99"} 0.000
nginx_response_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="403"} 0
nginx_response_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="403"} 105
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="HEAD",status="302",quantile="0.5"} 0.002
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="HEAD",status="302",quantile="0.9"} 0.003
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="HEAD",status="302",quantile="0.99"} 0.003
nginx_response_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="HEAD",status="302"} 0
nginx_response_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="HEAD",status="302"} 6
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="503",quantile="0.5"} 0.014
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="503",quantile="0.9"} 0.020
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="503",quantile="0.99"} 0.032
nginx_response_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="503"} 0
nginx_response_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="503"} 24
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="503",quantile="0.5"} 0.029
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="503",quantile="0.9"} 0.029
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="503",quantile="0.99"} 0.029
nginx_response_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="503"} 0
nginx_response_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="503"} 1
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="HEAD",status="403",quantile="0.5"} 0.000
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="HEAD",status="403",quantile="0.9"} 0.000
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="HEAD",status="403",quantile="0.99"} 0.000
nginx_response_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="HEAD",status="403"} 0
nginx_response_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="HEAD",status="403"} 2
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="500",quantile="0.5"} 0.081
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="500",quantile="0.9"} 0.081
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="500",quantile="0.99"} 0.081
nginx_response_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="500"} 0
nginx_response_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="500"} 3
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="500",quantile="0.5"} 0.007
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="500",quantile="0.9"} 0.007
nginx_response_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="500",quantile="0.99"} 0.007
nginx_response_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="500"} 0
nginx_response_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="500"} 3
# TYPE nginx_response_upstream_duration_seconds summary
# HELP nginx_response_upstream_duration_seconds Time between establishing a connection to an upstream server and receiving the last byte of the response body
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="200",quantile="0.5"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="200",quantile="0.9"} 19
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="200",quantile="0.99"} 5
nginx_response_upstream_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="200"} 0
nginx_response_upstream_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="200"} 20086
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="304",quantile="0.5"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="304",quantile="0.9"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="304",quantile="0.99"} 0
nginx_response_upstream_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="304"} 0
nginx_response_upstream_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="304"} 11458
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="200",quantile="0.5"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="200",quantile="0.9"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="200",quantile="0.99"} 0
nginx_response_upstream_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="200"} 0
nginx_response_upstream_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="200"} 4569
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="200",quantile="0.5"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="200",quantile="0.9"} 1
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="200",quantile="0.99"} 6
nginx_response_upstream_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="200"} 0
nginx_response_upstream_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="200"} 135
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="404",quantile="0.5"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="404",quantile="0.9"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="404",quantile="0.99"} 0
nginx_response_upstream_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="404"} 0
nginx_response_upstream_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="404"} 11
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="499",quantile="0.5"} -
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="499",quantile="0.9"} -
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="499",quantile="0.99"} -
nginx_response_upstream_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="499"} 0
nginx_response_upstream_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="499"} 68
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="403",quantile="0.5"} -
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="403",quantile="0.9"} -
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="403",quantile="0.99"} -
nginx_response_upstream_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="403"} 0
nginx_response_upstream_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="403"} 154
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="403",quantile="0.5"} -
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="403",quantile="0.9"} -
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="403",quantile="0.99"} -
nginx_response_upstream_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="403"} 0
nginx_response_upstream_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="403"} 1
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="400",quantile="0.5"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="400",quantile="0.9"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="400",quantile="0.99"} 0
nginx_response_upstream_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="400"} 0
nginx_response_upstream_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="400"} 41
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="400",quantile="0.5"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="400",quantile="0.9"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="400",quantile="0.99"} 0
nginx_response_upstream_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="400"} 0
nginx_response_upstream_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="400"} 4
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="499",quantile="0.5"} -
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="499",quantile="0.9"} -
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="499",quantile="0.99"} -
nginx_response_upstream_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="499"} 0
nginx_response_upstream_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="499"} 10
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="302",quantile="0.5"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="302",quantile="0.9"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="302",quantile="0.99"} 0
nginx_response_upstream_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="302"} 0
nginx_response_upstream_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="302"} 18
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="409",quantile="0.5"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="409",quantile="0.9"} 2
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="409",quantile="0.99"} 7
nginx_response_upstream_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="409"} 0
nginx_response_upstream_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="409"} 75
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="500",quantile="0.5"} 60
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="500",quantile="0.9"} 60
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="500",quantile="0.99"} 60
nginx_response_upstream_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="500"} 0
nginx_response_upstream_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="500"} 379
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="403",quantile="0.5"} -
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="403",quantile="0.9"} -
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="403",quantile="0.99"} -
nginx_response_upstream_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="403"} 0
nginx_response_upstream_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="403"} 105
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="HEAD",status="302",quantile="0.5"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="HEAD",status="302",quantile="0.9"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="HEAD",status="302",quantile="0.99"} 0
nginx_response_upstream_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="HEAD",status="302"} 0
nginx_response_upstream_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="HEAD",status="302"} 6
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="503",quantile="0.5"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="503",quantile="0.9"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="503",quantile="0.99"} 0
nginx_response_upstream_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="503"} 0
nginx_response_upstream_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="503"} 24
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="503",quantile="0.5"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="503",quantile="0.9"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="503",quantile="0.99"} 0
nginx_response_upstream_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="503"} 0
nginx_response_upstream_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="POST",status="503"} 1
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="HEAD",status="403",quantile="0.5"} -
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="HEAD",status="403",quantile="0.9"} -
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="HEAD",status="403",quantile="0.99"} -
nginx_response_upstream_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="HEAD",status="403"} 0
nginx_response_upstream_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="HEAD",status="403"} 2
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="500",quantile="0.5"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="500",quantile="0.9"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="500",quantile="0.99"} 0
nginx_response_upstream_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="500"} 0
nginx_response_upstream_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="PUT",status="500"} 3
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="500",quantile="0.5"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="500",quantile="0.9"} 0
nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="500",quantile="0.99"} 0
nginx_response_upstream_duration_seconds_sum{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="500"} 0
nginx_response_upstream_duration_seconds_count{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="500"} 3

Prometheus does not produce any stack trace with this error. Here is what is there in the syslog (where Prometheus outputs its log)

Oct 17 08:26:34 ip-10-0-0-229 prometheus[25918]: level=warn ts=2019-10-17T08:26:34.294Z caller=scrape.go:930 component="scrape manager" scrape_pool=fluentd-prom target=http://10.0.0.150:24231/metrics msg="append failed" err="strconv.ParseFloat: parsing \"-\": invalid syntax"
ganmacs commented 5 years ago

I'm not sure but the cause might be that your log is not formatted with the grock_pattern you specified in the config file. Your log is not related to the part of printing - since it's POST with 200 (HTTP response code)log. The logs we need are HEAD with 403, GET with 503, GET with 499, POST with 499. Could you check such logs?

talonx commented 5 years ago

Here is a GET with 499 from the log file I pasted above

nginx_response_upstream_duration_seconds{source="/var/log/nginx/kb_proxy_access.log",method="GET",status="499",quantile="0.5"} -

I suspect that the root of the issue is that the "-" is being put in the end instead of a number (should be 0 in such cases?) and Prometheus is failing to parse it.

ganmacs commented 5 years ago

No. groks log. it's a fluentd-promeths output. I was talking about groks log. I need the log of HEAD with 403, GET with 503, GET with 499, POST with 499 like fllowing log(this is groks log, right?).

x.x.x.x - - [10/Oct/2019:06:13:28 +0000] "POST /es/_msearch?rest_total_hits_as_int=true&ignore_throttled=true HTTP/1.1" 200 1365 "https://xxxxx..." "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/77.0.3865.90 Chrome/77.0.3865.90 Safari/537.36" "x.x.x.x" 1868 1641 0.225 0.200

I suspect that the root of the issue is that the "-" is being put in the end instead of a number (should be 0 in such cases?) and Prometheus is failing to parse it.

Yes. I knew. My expectation is the reason why printing "-" is groks_parttern isn't matched with actual groks log. so I said like that.

The logs we need are HEAD with 403, GET with 503, GET with 499, POST with 499. Could you check such logs?

talonx commented 5 years ago

Sorry, I misunderstood the first time. Here is a sample log (it's the nginx log which is being processed by grok, I presume that's what you meant).

POST with 499

x.x.x.x - - [17/Oct/2019:12:36:44 +0000] "POST /elasticsearch/_msearch?rest_total_hits_as_int=true&ignore_throttled=true HTTP/1.1" 499 0 "ref" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36" "x.x.x.x" 1824 0 0.063 -

The last number (which is supposed to be upstream response time) is a "-". GET with 499

x.x.x.x - - [17/Oct/2019:09:35:10 +0000] "GET /bundles/ebdca7741674eca4e1fadeca157f3ae6.svg HTTP/1.1" 499 0 "ref" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0" "x.x.x.x" 515 0 0.026 -

This also has a - in the last line. Most lines have a valid number instead of the -, I guess that's causing the grok parser to fail.

Here is the nginx log format for reference

   log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                       '$status $body_bytes_sent "$http_referer" '
                       '"$http_user_agent" "$http_x_forwarded_for"'
                       ' $request_length $bytes_sent $request_time $upstream_response_time';

This might be an nginx specific issue rather than the plugin. I'll investigate this and get back.

talonx commented 5 years ago

I found that nginx does not log the upstream_response_time when the error code is 499 - https://forum.nginx.org/read.php?2,232482,232621#msg-232621

However, I have been seeing parse errors for non-499 codes also, when all the fields are present. E.g. from the td-agent.log

2019-10-18 05:39:47 +0000 [warn]: #0 prometheus: failed to instrument a metric. error_class=TypeError error=#<TypeError: String can't be coerced into Integer> tag="nginxlog" name="nginx_response_upstream_duration_seconds"
2019-10-18 05:39:47 +0000 [warn]: #0 dump an error event: error_class=TypeError error="String can't be coerced into Integer" location="/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/quantile-0.2.1/lib/quantile/estimator.rb:70:in `+'" tag="nginxlog" time=2019-10-18 05:39:47.608333407 +0000 record={"ip"=>"x.x.x.x", "user"=>"-", "timestamp"=>"18/Oct/2019:05:39:47 +0000", "verb"=>"POST", "request"=>"/elasticsearch/_msearch?rest_total_hits_as_int=true&ignore_throttled=true", "httpversion"=>"1.1", "status"=>"200", "body_bytes_sent"=>"38963", "referrer"=>"\"ref\"", "user_agent"=>"\"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.90 Safari/537.36\"", "fwd_for"=>"\"x.x.x.x\"", "request_length"=>"1833", "bytes_sent"=>"39239", "request_time"=>"0.291", "upstream_response_time"=>"0", "tailed_path"=>"/var/log/nginx/kb_proxy_access.log"}

The corresponding nginx log line with status code 200 and all log fields present (matched with the help of the timestamp) is

x.x.x.x - - [18/Oct/2019:05:39:47 +0000] "POST /elasticsearch/_msearch?rest_total_hits_as_int=true&ignore_throttled=true HTTP/1.1" 200 38963 "ref" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.90 Safari/537.36" "x.x.x.x" 1833 39239 0.291 0.224
ganmacs commented 5 years ago

Please take a look your error log before ask here... The value of summary should be integer by your error log(String can't be coerced into Integer). So convert the value into integer https://github.com/fluent/fluent-plugin-grok-parser/tree/2c9901c1a3dc473b28ee3c04cd4780f6b5aeede4#faqs