Graylog2 / graylog-plugin-integrations

A collection of open source Graylog integrations that will be released together.

Source field becomes incorrect when ingesting netflow data to graylog docker container #709

Open sjwk opened 3 years ago

sjwk commented 3 years ago

Expected Behavior

Source field should be correct and show the IP of the Netflow source

Current Behavior

TL;DR: In the Graylog container, something causes the Netflow input to stop storing the correct source field and instead store the internal gateway IP of the Docker network the container is in.

I'm running Graylog, Elasticsearch and MongoDB as containers from a docker-compose file that is more or less the same as the one given in the documentation.
I have a Netflow input set up, with override_source not set. Into that I am feeding Netflow v9 records from two different systems running different netflow probe software (a Linux box running netprobe and a BSD box running softflowd). Initially all is well and the data is correct. At some point, something happens, and the source field in the data changes from the IP of the netflow probe machine to the IP of the internal gateway of the Docker network. This happens to data from both netflow probe machines.

From a small amount of testing, I believe the 'something' that triggers it is a restart of the Docker container, whether caused by the host updating Docker or by stopping and starting the container. That certainly seemed to make it start using the Docker host's IP as the source rather than the actual IPs, but there may be other triggers. Once in that state, restarting the probe software on the remote boxes seems to be the only way to get it to log the correct source field again.

The issue does seem to be specific to the Netflow module; other inputs (Beats, syslog) are working fine. I can't currently test whether there's any way to trigger the same issue on a non-container installation.
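
One way to narrow this down might be to check which peer address a plain UDP socket inside the container sees, independently of Graylog. The following is only a minimal diagnostic sketch in Python, not part of Graylog or the integrations plugin. The function names are hypothetical, and port 2055 in the usage note is an assumption for illustration, so use a spare port that is published from the container as UDP.

```python
import socket
from collections import Counter


def count_senders(sock, max_packets, timeout=5.0):
    """Receive up to max_packets datagrams and count them by sender IP.

    Stops early if no datagram arrives within `timeout` seconds.
    """
    sock.settimeout(timeout)
    seen = Counter()
    for _ in range(max_packets):
        try:
            # For AF_INET sockets the peer address is an (ip, port) tuple.
            _data, (ip, _port) = sock.recvfrom(65535)
        except socket.timeout:
            break
        seen[ip] += 1
    return seen


def watch(port, max_packets=100):
    """Hypothetical helper: listen on a UDP port and print a sender summary."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as listener:
        listener.bind(("0.0.0.0", port))
        for ip, n in count_senders(listener, max_packets).items():
            print(f"{ip}: {n} datagrams")
```

Calling `watch(2055)` inside the container while a probe sends to that port would show whether the datagrams already arrive with the gateway IP as their peer address. If they do, the address is being rewritten before it reaches the Netflow input. If they arrive with the real probe IPs, the input itself looks more suspect.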

Context

This is a problem if I want to search based on which box sent the netflow data, or to apply pipeline rules that normalize the data coming from the two sources, because the source field is the obvious way to tell the sources apart.

Your Environment

mfz0r commented 3 years ago

Any update on this? I can confirm this exact issue is happening in my environment.

sjwk commented 3 years ago

Is there any further information I can provide to help come up with a fix?