Closed talonx closed 5 years ago
Please write the configuration and sample log data to reproduce the problem.
Configuration (nginx logs)
<source>
@type tail
tag nginxlog
path /var/log/nginx/nginx_access.log
path_key tailed_path
pos_file /tmp/fluent_nginx.pos
<parse>
@type grok
grok_pattern %{IPORHOST:ip} - %{USER:user} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" (?<status>[0-9]{3}) (?<body_bytes_sent>[0-9]+) %{QS:referrer} %{QS:user_agent} %{QS:fwd_for} (?<request_length>[0-9]+) (?<bytes_sent>[0-9]+) %{NUMBER:request_time} (?<upstream_response_time>[0-9-]+)
</parse>
</source>
<filter nginxlog>
@type prometheus
<metric>
name nginx_request_count
type counter
desc Total number of requests received
<labels>
source ${tailed_path}
method ${verb}
</labels>
</metric>
<metric>
name nginx_response_count
type counter
desc Total number of response codes and number of responses sent per response code
<labels>
source ${tailed_path}
method ${verb}
status ${status}
</labels>
</metric>
<metric>
name nginx_response_duration_seconds
type summary
desc Full request time, starting when NGINX reads the first byte from the client and ending when NGINX sends the last byte of the response body
key request_time
<labels>
source ${tailed_path}
method ${verb}
status ${status}
</labels>
</metric>
<metric>
name nginx_response_upstream_duration_seconds
type summary
desc Time between establishing a connection to an upstream server and receiving the last byte of the response body
key upstream_response_time
<labels>
source ${tailed_path}
method ${verb}
status ${status}
</labels>
</metric>
</filter>
Note that I am using fluent-plugin-prometheus and fluent-plugin-grok-parser
Sample log data (IP etc information masked)
<ip> - <user> [03/Oct/2019:15:50:21 +0000] "POST /_bulk HTTP/1.1" 200 253031 "-" "Go-http-client/1.1" "<referrer>" 72687 253213 0.975 0.976
This works fine on Ubuntu 16.04.
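For anyone trying to reproduce this without fluentd, here is a rough Python sketch that checks a sample line against a simplified regex. It is not the grok pattern itself: the `ip`, `user`, `timestamp` and quoted-string groups are loose approximations of `%{IPORHOST}`, `%{USER}`, `%{HTTPDATE}` and `%{QS}`, and the IP address and `"-"` values in `SAMPLE` are placeholders standing in for the masked fields. The names `LINE_RE`, `SAMPLE` and `parse_line` are made up for this example.

```python
import re

# Simplified stand-in for the grok pattern. Only the literal groups
# (status, body_bytes_sent, request_length, bytes_sent,
# upstream_response_time) are copied from the configuration; the rest
# are approximations of the grok sub-patterns.
LINE_RE = re.compile(
    r'(?P<ip>\S+) - (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?:(?P<verb>\w+) (?P<request>\S+)(?: HTTP/(?P<httpversion>[0-9.]+))?'
    r'|(?P<rawrequest>.*?))" '
    r'(?P<status>[0-9]{3}) (?P<body_bytes_sent>[0-9]+) '
    r'(?P<referrer>"[^"]*") (?P<user_agent>"[^"]*") (?P<fwd_for>"[^"]*") '
    r'(?P<request_length>[0-9]+) (?P<bytes_sent>[0-9]+) '
    r'(?P<request_time>[0-9]+(?:\.[0-9]+)?) '
    r'(?P<upstream_response_time>[0-9-]+)'
)

# Placeholder values replace the masked <ip>, <user> and forwarded-for fields.
SAMPLE = (
    '192.0.2.10 - - [03/Oct/2019:15:50:21 +0000] "POST /_bulk HTTP/1.1" '
    '200 253031 "-" "Go-http-client/1.1" "-" 72687 253213 0.975 0.976'
)


def parse_line(line):
    """Return the named groups as a dict, or None if the line does not match."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None


fields = parse_line(SAMPLE)
print(fields["verb"], fields["status"], fields["request_time"],
      fields["upstream_response_time"])
```

In this sketch the last group captures only `0` from `0.976`, because `[0-9-]+` does not admit a decimal point and the regex is not anchored at the end of the line. Whether the grok parser behaves the same way is not confirmed here, but it may be worth checking the value that reaches the `nginx_response_upstream_duration_seconds` summary.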
Could you paste the complete configuration? Your configuration does not include the prometheus source section or the match section.
Sorry, here you go. The previous one was from an Ansible template, and it was partial. This is the complete one from the live server.
<source>
@type tail
tag nginxlog
path /var/log/nginx/es_proxy_access.log
path_key tailed_path
pos_file /tmp/fluent_nginx.pos
<parse>
@type grok
grok_pattern %{IPORHOST:ip} - %{USER:user} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" (?<status>[0-9]{3}) (?<body_bytes_sent>[0-9]+) %{QS:referrer} %{QS:user_agent} %{QS:fwd_for} (?<request_length>[0-9]+) (?<bytes_sent>[0-9]+) %{NUMBER:request_time} (?<upstream_response_time>[0-9-]+)
</parse>
</source>
<filter nginxlog>
@type prometheus
<metric>
name nginx_request_count
type counter
desc Total number of requests received
<labels>
source ${tailed_path}
method ${verb}
</labels>
</metric>
<metric>
name nginx_response_count
type counter
desc Total number of response codes and number of responses sent per response code
<labels>
source ${tailed_path}
method ${verb}
status ${status}
</labels>
</metric>
<metric>
name nginx_response_duration_seconds
type summary
desc Full request time, starting when NGINX reads the first byte from the client and ending when NGINX sends the last byte of the response body
key request_time
<labels>
source ${tailed_path}
method ${verb}
status ${status}
</labels>
</metric>
<metric>
name nginx_response_upstream_duration_seconds
type summary
desc Time between establishing a connection to an upstream server and receiving the last byte of the response body
key upstream_response_time
<labels>
source ${tailed_path}
method ${verb}
status ${status}
</labels>
</metric>
</filter>
<source>
@type prometheus
</source>
<match *>
@type copy
<store>
@type prometheus
</store>
</match>
It worked in my environment. The port is wrong: 24231 is the default port.
Was not the default 24231 earlier? I cannot see anything listening on 24231 here
My environment listens on 24231, as you can see below, so the cause may lie somewhere unrelated to fluentd.
root@e2b353d5951c:/# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.3 LTS"
root@e2b353d5951c:/# td-agent --version
td-agent 1.7.0
root@e2b353d5951c:/# td-agent
2019-10-04 05:12:14 +0000 [info]: parsing config file is succeeded path="/etc/td-agent/td-agent.conf"
2019-10-04 05:12:14 +0000 [info]: Expanded the pattern %{IPORHOST:ip} - %{USER:user} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" (?<status>[0-9]{3}) (?<body_bytes_sent>[0-9]+) %{QS:referrer} %{QS:user_agent} %{QS:fwd_for} (?<request_length>[0-9]+) (?<bytes_sent>[0-9]+) %{NUMBER:request_time} (?<upstream_response_time>[0-9-]+) into (?<ip>(?:(?:(?:(?:((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?)|(?:(?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9]))))|(?:\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)))) - (?<user>(?:[a-zA-Z0-9._-]+)) 
\[(?<timestamp>(?:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]))/(?:\b(?:[Jj]an(?:uary|uar)?|[Ff]eb(?:ruary|ruar)?|[Mm](?:a|ä)?r(?:ch|z)?|[Aa]pr(?:il)?|[Mm]a(?:y|i)?|[Jj]un(?:e|i)?|[Jj]ul(?:y)?|[Aa]ug(?:ust)?|[Ss]ep(?:tember)?|[Oo](?:c|k)?t(?:ober)?|[Nn]ov(?:ember)?|[Dd]e(?:c|z)(?:ember)?)\b)/(?:(?>\d\d){1,2}):(?:(?!<[0-9])(?:(?:2[0123]|[01]?[0-9])):(?:(?:[0-5][0-9]))(?::(?:(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)))(?![0-9])) (?:(?:[+-]?(?:[0-9]+))))\] "(?:(?<verb>\b\w+\b) (?<request>\S+)(?: HTTP/(?<httpversion>(?:(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))))))?|(?<rawrequest>.*?))" (?<status>[0-9]{3}) (?<body_bytes_sent>[0-9]+) (?<referrer>(?:(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``)))) (?<user_agent>(?:(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``)))) (?<fwd_for>(?:(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``)))) (?<request_length>[0-9]+) (?<bytes_sent>[0-9]+) (?<request_time>(?:(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))))) (?<upstream_response_time>[0-9-]+)
2019-10-04 05:12:14 +0000 [info]: using configuration file: <ROOT>
<source>
@type tail
tag "nginxlog"
path "/var/log/nginx/es_proxy_access.log"
path_key "tailed_path"
pos_file "/tmp/fluent_nginx.pos"
<parse>
@type "grok"
grok_pattern "%{IPORHOST:ip} - %{USER:user} \\[%{HTTPDATE:timestamp}\\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" (?<status>[0-9]{3}) (?<body_bytes_sent>[0-9]+) %{QS:referrer} %{QS:user_agent} %{QS:fwd_for} (?<request_length>[0-9]+) (?<bytes_sent>[0-9]+) %{NUMBER:request_time} (?<upstream_response_time>[0-9-]+)"
</parse>
</source>
<filter nginxlog>
@type prometheus
<metric>
name nginx_request_count
type counter
desc Total number of requests received
<labels>
source ${tailed_path}
method ${verb}
</labels>
</metric>
<metric>
name nginx_response_count
type counter
desc Total number of response codes and number of responses sent per response code
<labels>
source ${tailed_path}
method ${verb}
status ${status}
</labels>
</metric>
<metric>
name nginx_response_duration_seconds
type summary
desc Full request time, starting when NGINX reads the first byte from the client and ending when NGINX sends the last byte of the response body
key request_time
<labels>
source ${tailed_path}
method ${verb}
status ${status}
</labels>
</metric>
<metric>
name nginx_response_upstream_duration_seconds
type summary
desc Time between establishing a connection to an upstream server and receiving the last byte of the response body
key upstream_response_time
<labels>
source ${tailed_path}
method ${verb}
status ${status}
</labels>
</metric>
</filter>
<source>
@type prometheus
</source>
<match *>
@type copy
<store>
@type "prometheus"
</store>
</match>
</ROOT>
2019-10-04 05:12:14 +0000 [info]: starting fluentd-1.7.0 pid=68 ruby="2.4.6"
2019-10-04 05:12:14 +0000 [info]: spawn command to main: cmdline=["/opt/td-agent/embedded/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/sbin/td-agent", "--under-supervisor"]
2019-10-04 05:12:14 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '3.5.4'
2019-10-04 05:12:14 +0000 [info]: gem 'fluent-plugin-grok-parser' version '2.6.1'
2019-10-04 05:12:14 +0000 [info]: gem 'fluent-plugin-kafka' version '0.11.1'
2019-10-04 05:12:14 +0000 [info]: gem 'fluent-plugin-prometheus' version '1.6.0'
2019-10-04 05:12:14 +0000 [info]: gem 'fluent-plugin-prometheus' version '1.5.0'
2019-10-04 05:12:14 +0000 [info]: gem 'fluent-plugin-record-modifier' version '2.0.1'
2019-10-04 05:12:14 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.2.0'
2019-10-04 05:12:14 +0000 [info]: gem 'fluent-plugin-s3' version '1.1.11'
2019-10-04 05:12:14 +0000 [info]: gem 'fluent-plugin-td' version '1.0.0'
2019-10-04 05:12:14 +0000 [info]: gem 'fluent-plugin-td-monitoring' version '0.2.4'
2019-10-04 05:12:14 +0000 [info]: gem 'fluent-plugin-webhdfs' version '1.2.4'
2019-10-04 05:12:14 +0000 [info]: gem 'fluentd' version '1.7.0'
2019-10-04 05:12:14 +0000 [info]: adding filter pattern="nginxlog" type="prometheus"
2019-10-04 05:12:14 +0000 [info]: adding match pattern="*" type="copy"
2019-10-04 05:12:14 +0000 [info]: adding source type="tail"
2019-10-04 05:12:14 +0000 [info]: #0 Expanded the pattern %{IPORHOST:ip} - %{USER:user} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" (?<status>[0-9]{3}) (?<body_bytes_sent>[0-9]+) %{QS:referrer} %{QS:user_agent} %{QS:fwd_for} (?<request_length>[0-9]+) (?<bytes_sent>[0-9]+) %{NUMBER:request_time} (?<upstream_response_time>[0-9-]+) into (?<ip>(?:(?:(?:(?:((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?)|(?:(?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9]))))|(?:\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)))) - (?<user>(?:[a-zA-Z0-9._-]+)) 
\[(?<timestamp>(?:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]))/(?:\b(?:[Jj]an(?:uary|uar)?|[Ff]eb(?:ruary|ruar)?|[Mm](?:a|ä)?r(?:ch|z)?|[Aa]pr(?:il)?|[Mm]a(?:y|i)?|[Jj]un(?:e|i)?|[Jj]ul(?:y)?|[Aa]ug(?:ust)?|[Ss]ep(?:tember)?|[Oo](?:c|k)?t(?:ober)?|[Nn]ov(?:ember)?|[Dd]e(?:c|z)(?:ember)?)\b)/(?:(?>\d\d){1,2}):(?:(?!<[0-9])(?:(?:2[0123]|[01]?[0-9])):(?:(?:[0-5][0-9]))(?::(?:(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)))(?![0-9])) (?:(?:[+-]?(?:[0-9]+))))\] "(?:(?<verb>\b\w+\b) (?<request>\S+)(?: HTTP/(?<httpversion>(?:(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))))))?|(?<rawrequest>.*?))" (?<status>[0-9]{3}) (?<body_bytes_sent>[0-9]+) (?<referrer>(?:(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``)))) (?<user_agent>(?:(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``)))) (?<fwd_for>(?:(?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``)))) (?<request_length>[0-9]+) (?<bytes_sent>[0-9]+) (?<request_time>(?:(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))))) (?<upstream_response_time>[0-9-]+)
2019-10-04 05:12:14 +0000 [info]: adding source type="prometheus"
2019-10-04 05:12:14 +0000 [info]: #0 starting fluentd worker pid=72 ppid=68 worker=0
2019-10-04 05:12:14 +0000 [info]: #0 fluentd worker is now running worker=0
root@e2b353d5951c:/# curl --verbose --output - localhost:24231/metrics
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 24231 (#0)
> GET /metrics HTTP/1.1
> Host: localhost:24231
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/plain; version=0.0.4
< Server: WEBrick/1.3.1 (Ruby/2.4.6/2019-04-01)
< Date: Fri, 04 Oct 2019 05:12:22 GMT
< Content-Length: 677
< Connection: Keep-Alive
<
# TYPE nginx_request_count counter
# HELP nginx_request_count Total number of requests received
# TYPE nginx_response_count counter
# HELP nginx_response_count Total number of response codes and number of responses sent per response code
# TYPE nginx_response_duration_seconds summary
# HELP nginx_response_duration_seconds Full request time, starting when NGINX reads the first byte from the client and ending when NGINX sends the last byte of the response body
# TYPE nginx_response_upstream_duration_seconds summary
# HELP nginx_response_upstream_duration_seconds Time between establishing a connection to an upstream server and receiving the last byte of the response body
* Connection #0 to host localhost left intact
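The response above carries only `# TYPE` and `# HELP` lines and no sample lines, which would be consistent with no log lines having been processed yet. As a small illustration of reading such a response, the following Python sketch extracts the metric names and their declared types. The function name `metric_types` and the `BODY` constant are made up for this example, and `BODY` reproduces only part of the response.

```python
def metric_types(exposition_text):
    """Map metric name -> declared type, read from '# TYPE' comment lines."""
    types = {}
    for line in exposition_text.splitlines():
        parts = line.split()
        # A TYPE line has exactly four tokens: '#', 'TYPE', name, type.
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            types[parts[2]] = parts[3]
    return types


# Excerpt of the /metrics response shown in the transcript.
BODY = """\
# TYPE nginx_request_count counter
# HELP nginx_request_count Total number of requests received
# TYPE nginx_response_count counter
# TYPE nginx_response_duration_seconds summary
# TYPE nginx_response_upstream_duration_seconds summary
"""

print(metric_types(BODY))
```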
Thanks. Yes, but nothing was listening on 24231. However, it seems to be working now (listening on 24231) and I don't have an explanation of why it was not working earlier. I'll reopen this issue if it happens again.
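If this comes up again, one quick way to confirm which port actually accepts connections is a plain TCP connect test. The Python sketch below is only an illustration: `is_listening` is a made-up helper, and it reports whether something accepts connections on the port, not whether that something is the prometheus plugin.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # The two ports mentioned in this thread.
    for port in (24230, 24231):
        state = "listening" if is_listening("127.0.0.1", port) else "not listening"
        print(f"127.0.0.1:{port} {state}")
```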
OS details
td-agent details
The plugin was installed using
After td-agent starts, I can see that it's listening on 24230 on localhost (Was not the default 24231 earlier? I cannot see anything listening on 24231 here). Sending a curl request shows this as the output