Open djosip opened 1 year ago
This has been resolved in 4.3.9 ( #13812 )
I have checked the issue #13812. It doesn't seem that there have been changes related to this bug report.
I have upgraded to Graylog 4.3.9 and the issue still persists. Please reopen the issue.
My apologies! I got this conflated with another issue - I will re-open and we will take a look
Thank you.
@djosip So I've attempted to recreate this issue locally, and I do not get the `192_168_1_1` field name in my index. I don't get anything regarding that key-value pair when I send the `cs1Label` key in.
However, I do notice that in our code we are specifically skipping keys that end with the case-sensitive string `Label`. See this code. I am not super familiar with the CEF parser, so I don't know why we do that exactly, but I will ask the people who are tomorrow. But the fact that we are skipping parsing those fields altogether makes me wonder if there is something else manipulating the message, maybe a pipeline rule, that might be causing the issue.
@kingzacko1 I was able to recreate this by adding the `cs1` field to the original test message.
What appears to be happening in the code you specified is that we're taking the value of `cs1Label` and using it as the field name for the value of `cs1`, then dropping `cs1Label`. So if our original message has `cs1=some_value` and `cs1Label=some_label`, we end up processing the message as `some_label=some_value`.
As you stated, it is clearly intentional, but I'm not familiar enough with CEF either to determine whether it's the correct behavior. I just wanted to add the context I found before I saw you were working on this.
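The remapping described above can be sketched roughly like this (a hypothetical Python illustration of the observed behavior, not Graylog's actual Java parser code):

```python
# Sketch of the observed csN/csNLabel remapping: the value of csNLabel
# becomes the field name for the value of csN, and the Label key is dropped.
def remap_cef_labels(fields: dict) -> dict:
    result = {}
    for key, value in fields.items():
        if key.endswith("Label"):
            continue  # Label keys are dropped; they only name a partner field
        label_key = key + "Label"
        if label_key in fields:
            # e.g. cs1Label's value becomes the field name for cs1's value
            result[fields[label_key]] = value
        else:
            result[key] = value
    return result

# cs1=some_value plus cs1Label=some_label is processed as some_label=some_value
print(remap_cef_labels({"cs1": "some_value", "cs1Label": "some_label"}))
# With cs1Label=192.168.1.1, the IP address ends up as the field name
print(remap_cef_labels({"cs1": "x", "cs1Label": "192.168.1.1"}))
```

This also shows why the field only appears once `cs1` itself is present: without a partner `cs1` value, the `cs1Label` key is simply skipped.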
@ryan-carroll-graylog ah, yeah that was it. Adding `cs1=value` to the message results in the `192_168_1_1` field being indexed. Probably need to do some digging and see whether that is a CEF thing, or why our code handles it like that.
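As an aside on why the indexed field shows up with underscores rather than dots: field-name sanitization commonly replaces characters that are unsafe in index field names, such as `.`, with `_`. A minimal sketch of such a step (an assumption for illustration, not Graylog's exact code):

```python
import re

# Replace anything that is not a letter, digit, or underscore with '_',
# which turns an IP-address "field name" into an underscore-separated one.
def sanitize_field_name(name: str) -> str:
    return re.sub(r"[^A-Za-z0-9_]", "_", name)

print(sanitize_field_name("192.168.1.1"))  # 192_168_1_1
```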
I just went through the Micro Focus Security ArcSight Common Event Format document, and it looks like this isn't a CEF thing. CEF looks pretty simple in that regard.
Keys cannot contain certain characters, such as a space or `=`.
A value can contain multiple leading or trailing spaces, except the last value, which shouldn't contain trailing spaces.
A value can also contain newlines and the character `=`, which needs to be escaped (e.g. `\=`).
So I don't see why a key such as `cs1Label` would be treated any differently.
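The splitting rules above can be sketched as follows (an illustrative Python parser written against those rules, not Graylog's implementation):

```python
import re

# Split a CEF extension string into key-value pairs: keys contain no spaces
# or '=', values may contain spaces, and a literal '=' inside a value must
# be escaped as '\='. A new pair starts at a token with an unescaped '='.
def parse_cef_extension(ext: str) -> dict:
    pairs = re.findall(r"(\S+?)=((?:\\=|[^=])*?)(?=\s\S+?=|$)", ext)
    return {k: v.replace(r"\=", "=") for k, v in pairs}

print(parse_cef_extension(r"cs1Label=192.168.1.1 msg=a\=b dst=192.168.2.2"))
```

Under these rules `cs1Label` parses exactly like any other key, which is the point being made here.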
If anything, one might want to make sure that the `cs1Label` key contains a string limited to a length of 1023 (I assume bytes, not characters), but as far as Graylog is concerned, I don't believe it makes much sense to fully adhere to the CEF standard and implement the ArcSight Extension Dictionary.
I am not even sure that the linked document represents the CEF standard.
It seems to me that the declaration of custom strings (and other custom fields) is part of the CEF standard: https://docs.centrify.com/Content/IntegrationContent/SIEM/arcsight-cef/arcsight-cef-format.htm
A CEF-formatted message containing a field name such as `cs1Label` is not parsed correctly. If the field name is renamed to lowercase, e.g. `cs1label`, everything works as expected.
I have to deal with a commercial firewall which produces CEF messages containing key-value pairs such as `cs1Label=192.168.1.1`. In the Elasticsearch index mapping, I get a field named `_192_168_11` instead of `cs1Label`. Since the IP address in the log messages changes frequently, the number of fields in the Elasticsearch index mapping quickly reaches the default limit of 1000 fields per index. At that point, the Graylog server logs messages such as this:
```
message [ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]]]
```
Expected Behavior
The key-value pair with mixed-case letters in the field name, such as `cs1Label=192.168.1.1`, should produce the field name `cs1Label` instead of using the value as the field name.
Current Behavior
The key-value pair with mixed-case letters in the field name, such as `cs1Label=192.168.1.1`, produces the field name `_192_168_11`, which is incorrect and has the potential to flood the Elasticsearch index mapping with useless fields, preventing other fields from being automatically added to the index mapping.
Possible Solution
This might be a bug in the CEF input code or an issue related to Elasticsearch behavior. With all-lowercase field names, everything works as expected. If there is no other way to fix the issue, it might be acceptable to convert all field names to lowercase before processing them.
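The lowercasing idea could be limited to the field names only, so that values keep their case. A minimal sketch of such a pre-processing step, assuming keys are the non-space runs directly before an `=` (an illustration, not a proposed patch):

```python
import re

# Lowercase only the extension keys (the token before each '='),
# leaving the values untouched.
def lowercase_cef_keys(extension: str) -> str:
    return re.sub(r"(^|\s)(\S+?)=",
                  lambda m: m.group(1) + m.group(2).lower() + "=",
                  extension)

print(lowercase_cef_keys("cs1Label=SomeValue spt=51713"))  # cs1label=SomeValue spt=51713
```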
Steps to Reproduce (for bugs)
```
echo 'Nov 14 20:22:57 hostname CEF:0|DeviceVendor|DeviceProduct|1.0|713|TCP Connection Abort|3|cat=512 gcat=6 cs1Label=192.168.1.1 spt=51713 dst=192.168.2.2 dpt=443 msg="TCP connection abort received; TCP connection dropped" fw_action="drop"' | nc <graylog_ip> <graylog_port>
curl -s -XGET "http://localhost:9200/<test index name>/_mapping?pretty"
```
Context
I was trying to send CEF-formatted log messages from a firewall to a Graylog CEF TCP input. I solved the problem by putting rsyslog in front of Graylog: I wrote an rsyslog template which converts the whole message to lowercase. The template looks like this:
```
template(name="cef" type="string" string="%timereported% %HOSTNAME% %syslogtag%%msg:::lowercase,drop-last-lf%\n")
```
Your Environment