Closed praseodym closed 6 years ago
+1 This would be great, need the raw value.
For what it's worth, you can already achieve this behaviour by manually editing the ingest pipeline that is created by Filebeat.
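A rough sketch of that manual edit, assuming the pipeline ends with a `remove` processor that drops the raw field after the `user_agent` processor has run. The field name `nginx.access.agent` and the processor layout below are assumptions for illustration, and `keep_raw_field` is a hypothetical helper, so check them against the pipeline you get back from `GET _ingest/pipeline/<pipeline-id>`:

```python
import copy


def keep_raw_field(pipeline, field):
    """Return a copy of `pipeline` whose `remove` processors no longer drop `field`.

    `pipeline` is the pipeline body as a dict (description + processors).
    """
    result = copy.deepcopy(pipeline)
    kept = []
    for processor in result.get("processors", []):
        remove = processor.get("remove")
        if remove is not None:
            targets = remove.get("field", [])
            if isinstance(targets, str):
                targets = [targets]
            if field in targets:
                remaining = [t for t in targets if t != field]
                if not remaining:
                    # This processor only removed the raw field: drop it entirely.
                    continue
                remove["field"] = remaining
        kept.append(processor)
    result["processors"] = kept
    return result


# Hypothetical, heavily shortened pipeline body (not the real Filebeat definition).
example = {
    "description": "illustrative nginx access pipeline",
    "processors": [
        {"user_agent": {"field": "nginx.access.agent"}},
        {"remove": {"field": "nginx.access.agent"}},
        {"remove": {"field": "message"}},
    ],
}

patched = keep_raw_field(example, "nginx.access.agent")
```

The patched body can then be written back with `PUT _ingest/pipeline/<pipeline-id>`. Note that this only changes the pipeline stored in Elasticsearch; it does not change what Filebeat ships.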
That's what I did and it works. But after an update to e.g. 6.x, the default pipeline (e.g. "filebeat-6.0-nginx-access-default") will be created and used. Therefore it would be great to keep the raw value by default, to stay compatible with upstream.
Could someone provide more details about the workaround? I tried to use custom regex rules (from https://www.elastic.co/guide/en/elasticsearch/plugins/6.3/using-ingest-user-agent.html#_using_a_custom_regex_file) but they seem to be ignored.
This also happens with the IIS module.
With Filebeat 6.0.0-alpha2 on Debian Stretch, the nginx module uses the Elasticsearch ingest-user-agent plugin to parse user agent strings and then removes the raw value. Unfortunately, the ingest-user-agent plugin cannot parse more exotic user agent strings, which causes information loss.
Because I think it'd be a pointless exercise to have ingest-user-agent parse every single user agent string in existence, I'd suggest keeping the raw user agent string around instead. I've found this information useful for identifying rogue scanners, e.g. we've had some cases of a foreign OpenVAS scanner hitting our server with thousands of requests in a short timespan, causing increased webserver load. Identifying requests from the scanner through access logs indexed by Filebeat was quite hard because of the loss of user agent string information.
For example, this nginx access log line:
gets indexed as follows, with all useful user agent information stripped away:
Edit: I think the same case can be made for the Apache2 module, but I have not tested it.