elastic / kibana


Question: Logstash problem with grok filters parsing apache-access and nginx-access at the same time? #707

Closed · KenChimp closed 10 years ago

KenChimp commented 10 years ago

Relatively new to Logstash/Elasticsearch/Redis with the Kibana UI.

Using:
- Logstash 1.1.13
- Elasticsearch 0.90.3
- Redis 2.6.16
- Kibana 3, milestone 3

Shipping logs to the central logs host using rsyslog on CentOS 6.4. Configs (modified for privacy):

logstash shipper.conf:

```
input {
  tcp { port => 5140 type => syslog }
  udp { port => 5140 type => syslog }
}

filter {
  grok {
    type => "syslog"
    pattern => [ "<%{POSINT:syslog_pri}>%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" ]
    add_field => [ "received_at", "%{@timestamp}" ]
    add_field => [ "received_from", "%{@source_host}" ]
  }
  syslog_pri { type => "syslog" }
  date {
    type => "syslog"
    match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
  }
  mutate {
    type => "syslog"
    exclude_tags => "_grokparsefailure"
    replace => [ "@source_host", "%{syslog_hostname}" ]
    replace => [ "@message", "%{syslog_message}" ]
  }
  mutate {
    type => "syslog"
    remove => [ "syslog_hostname", "syslog_message", "syslog_timestamp" ]
  }
}

filter {
  grok {
    type => "syslog"
    match => [ "syslog_program", "apache-access" ]
    pattern => "%{COMBINEDAPACHELOG}"
  }
}

filter {
  grok {
    type => "syslog"
    match => [ "syslog_program", "nginx-access" ]
    # (several named captures near the end of this pattern lost their (?<name>...) field names when the issue was rendered)
    pattern => [ "%{IP:client_ip} \[%{HTTPDATE:time}\] %{HOST:domain} \"(?:%{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}|%{DATA:unparsedrq})\" %{NUMBER:response} %{NUMBER:bytes} (?:%{NUMBER:bytes}|-) \"%{QUOTEDSTRING:httpreferrer}\" \"%{QUOTEDSTRING:httpuseragent}\" \"(?([0-9,. ]+?)|-)\" (?([0-9,. ]+?)|-) (%{BASE16FLOAT:request_time}|-) \"(?([\w\W]+?)|-)\"" ]
    add_field => [ "nginx_response", "%{NUMBER:response}" ]
  }
}

output {
  redis {
    host => "10.6.1.76"
    data_type => "list"
    key => "logstash"
  }
}
```
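(For illustration, not from the original post: a line arriving from rsyslog on port 5140 looks roughly like the sample below; host, address, and request are invented. The first grok filter above splits it into syslog_pri, syslog_timestamp, syslog_hostname, syslog_program, and syslog_message, and syslog_program (here "apache-access") is what the later filters test to pick an access-log pattern.)

```
<190>Sep 10 14:07:01 web01 apache-access: 10.0.0.5 - - [10/Sep/2013:14:07:01 -0400] "GET /index.html HTTP/1.1" 200 2326 "-" "curl/7.19.7"
```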

logstash indexer.conf:

```
input {
  redis {
    host => "10.6.1.76"
    type => "redis-input"
    data_type => "list"
    key => "logstash"
    format => "json_event"
  }
}

output {
  elasticsearch {
    host => "10.6.1.76"
    cluster => "Monkey_elasticsearch"
  }
}
```
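(Aside, not from the original post: with format => "json_event" the redis input expects fully serialized logstash events rather than raw lines, which is what the shipper's redis output produces. In the logstash 1.1.x event schema, an entry on the redis list looks roughly like this sketch; fields abbreviated and values invented.)

```
{
  "@timestamp": "2013-09-10T18:07:01.000Z",
  "@source_host": "web01",
  "@type": "syslog",
  "@tags": [],
  "@fields": { "response": "200", "nginx_response": "200" },
  "@message": "10.0.0.5 - - [10/Sep/2013:14:07:01 -0400] \"GET / HTTP/1.1\" 200 2326"
}
```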

Using the Kibana 3 UI in apache2 on the central logs host, called with:

```
/usr/bin/java -jar logstash-1.1.13-flatjar.jar web --backend elasticsearch://10.6.1.76/Monkey_elasticsearch
```

Problem Description: I am able to receive apache access log data just fine if I change the nginx-access pattern match to %{COMBINEDAPACHELOG}, but then my nginx access logs are not filtered properly and I can't easily get response codes, etc., in Kibana for nginx access log data.

If I set up a pattern match for nginx access logs in addition to the apache access log pattern (exactly as in my shipper.conf above), I no longer see ANY apache access logs, and the nginx access logs are still not filtered in a way that lets me easily see/retrieve the fields, even when I configure fields explicitly (as I did with nginx_response in my shipper.conf above).

The problem is not syntactic, as I get no errors when starting the logstash processes with the configurations above.

Any idea what I'm doing wrong?
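(Aside, not from the original thread: one way to narrow this down is to tag each grok filter, since add_tag is only applied when the filter's pattern actually matches, and grok tags events with _grokparsefailure when it fails; both are searchable in Kibana. A minimal sketch in the same 1.1.x config style; the tag name is invented.)

```
filter {
  grok {
    type => "syslog"
    match => [ "syslog_program", "apache-access" ]
    pattern => "%{COMBINEDAPACHELOG}"
    add_tag => [ "matched_apache" ]  # only added when this grok succeeds
  }
}
```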

KenChimp commented 10 years ago

I have resolved this issue. The first step was to learn how to write new field entries for pattern matching in grok, then use the [Grok Debugger](http://grokdebug.herokuapp.com/) to correct any logic or syntax problems.
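(Aside, not part of the original reply: once a long pattern works in the Grok Debugger, it can also be kept in a custom patterns file and referenced by name through grok's patterns_dir option, which keeps configs like the one below readable. A sketch; the /etc/logstash/patterns path and the NGINXACCESS name are invented for this example.)

```
# /etc/logstash/patterns/nginx : custom pattern file, one "NAME pattern" per line
NGINXACCESS %{IP:client_ip} \[%{HTTPDATE:time}\] "%{HOST:domain}" %{GREEDYDATA:rest}
```

```
filter {
  grok {
    type => "syslog"
    patterns_dir => "/etc/logstash/patterns"
    match => [ "syslog_program", "nginx-access" ]
    pattern => "%{NGINXACCESS}"
  }
}
```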

I ended up with this for my logstash shipper.conf:

```
input {
  tcp { port => 5140 type => syslog }
  udp { port => 5140 type => syslog }
}

filter {
  grok {
    break_on_match => "false"
    type => "syslog"
    pattern => [ "<%{POSINT:syslog_pri}>%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" ]
    add_field => [ "received_at", "%{@timestamp}" ]
    add_field => [ "received_from", "%{@source_host}" ]
  }
  syslog_pri { type => "syslog" }
  date {
    type => "syslog"
    match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
  }
  mutate {
    type => "syslog"
    exclude_tags => "_grokparsefailure"
    replace => [ "@source_host", "%{syslog_hostname}" ]
    replace => [ "@message", "%{syslog_message}" ]
  }
  mutate {
    type => "syslog"
    remove => [ "syslog_hostname", "syslog_message", "syslog_timestamp" ]
  }
}

filter {
  grok {
    break_on_match => "false"
    type => "syslog"
    match => [ "syslog_program", "apache-access", "syslog_program", "apache-error" ]
    pattern => "%{COMBINEDAPACHELOG}"
  }
}

filter {
  grok {
    break_on_match => "false"
    type => "syslog"
    match => [ "syslog_program", "nginx-access", "syslog_program", "nginx-error" ]
    pattern => [ "%{IP:client_ip} \[%{HTTPDATE:time}\] \"%{HOST:domain}\" \"(?:%{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}|%{DATA:unparsedrq})\" %{NUMBER:response} (%{NUMBER:body_bytes}|-) (%{NUMBER:bytes}|-) %{DATA:unparsedrq}" ]
  }
}

output {
  redis {
    host => "10.6.1.76"
    data_type => "list"
    key => "logstash"
  }
}
```
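(Aside, not from the original reply, but likely why this works: grok's break_on_match defaults to true, so a grok filter finishes at the first successful match. Because match => [ "syslog_program", "nginx-access" ] is itself a field/pattern pair, the program-name check alone could satisfy the filter in the earlier config, and the access-log pattern was never tried against the message. Setting break_on_match => "false" makes grok attempt every pattern, as in this minimal sketch; the abbreviated pattern is a stand-in.)

```
filter {
  grok {
    # default is break_on_match => true: grok stops after the first pattern that matches
    break_on_match => "false"
    type => "syslog"
    match => [ "syslog_program", "nginx-access" ]    # pair 1: gate on the program name
    pattern => "%{IP:client_ip} %{GREEDYDATA:rest}"  # pair 2: still applied to the message
  }
}
```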

I also added a line to rsyslog.conf to load the omelasticsearch module, which I had somehow neglected to do before.
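(For reference, not from the original comment: with rsyslog's legacy configuration syntax, loading that module is a single directive. Whatever omelasticsearch actions follow depend on the rest of the rsyslog setup, which isn't shown in the thread.)

```
# rsyslog.conf: load the Elasticsearch output module
$ModLoad omelasticsearch
```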