Graylog2 / graylog2-server

Free and open log management
https://www.graylog.org

Incorrectly parsed CEF formatted messages containing mixed case field names #13881

Open djosip opened 1 year ago

djosip commented 1 year ago

A CEF formatted message containing a mixed case field name such as cs1Label is not parsed correctly. If the field name is changed to lowercase, e.g. cs1label, everything works as expected.

I have to deal with a commercial firewall which produces CEF messages containing key-value pairs such as cs1Label=192.168.1.1. In the Elasticsearch index mapping, I get a field named _192_168_11 instead of cs1Label. Since the IP address in the log messages changes frequently, the number of fields in the Elasticsearch index mapping quickly reaches the default limit of 1000 fields per index. At that point, the Graylog server logs messages such as this: message [ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]]]

Expected Behavior

A key-value pair with mixed case letters in the field name, such as cs1Label=192.168.1.1, should produce the field name cs1Label instead of using the value as a field name.

Current Behavior

A key-value pair with mixed case letters in the field name, such as cs1Label=192.168.1.1, produces the field name _192_168_11, which is incorrect and can flood the Elasticsearch index mapping with useless fields, preventing other fields from being automatically added to the index mapping.

Possible Solution

This might be a bug in the CEF input code or an issue related to Elasticsearch behavior. With all lowercase field names everything works as expected. If there is no other way to fix the issue, it might be acceptable to convert all field names to lowercase before processing them.

Steps to Reproduce (for bugs)

  1. Create CEF TCP input for testing purposes
  2. Create index set for testing purposes
  3. Create a test stream and make sure it gets log messages from the test CEF TCP input and stores them to the test index set created earlier
  4. Send a test message to the test CEF TCP input. Message could look like this (the important part is the field named cs1Label): echo 'Nov 14 20:22:57 hostname CEF:0|DeviceVendor|DeviceProduct|1.0|713|TCP Connection Abort|3|cat=512 gcat=6 cs1Label=192.168.1.1 spt=51713 dst=192.168.2.2 dpt=443 msg="TCP connection abort received; TCP connection dropped" fw_action="drop"' | nc <graylog_ip> <graylog_port>
  5. Check the index mapping for existence of the field _192_168_11 using Elasticsearch API: curl -s -XGET "http://localhost:9200/<test index name>/_mapping?pretty"
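For anyone who prefers a script over echo and nc, steps 4 and 5 could be approximated roughly as follows. This is only a sketch: the function names build_test_message and send_message are made up for illustration, and the host and port are whatever your test CEF TCP input listens on. The optional cs1 pair is included because later comments in this thread report that the unexpected field only shows up when cs1 is present alongside cs1Label.

```python
import socket

# Syslog prefix and CEF header copied from the test message in step 4.
HEADER = (
    "Nov 14 20:22:57 hostname "
    "CEF:0|DeviceVendor|DeviceProduct|1.0|713|TCP Connection Abort|3|"
)


def build_test_message(label_value="192.168.1.1", custom_value=None):
    """Build a CEF test message (hypothetical helper, not part of Graylog)."""
    extension = ["cat=512", "gcat=6", f"cs1Label={label_value}"]
    if custom_value is not None:
        # Add the matching custom string field next to its label.
        extension.append(f"cs1={custom_value}")
    extension += ["spt=51713", "dst=192.168.2.2", "dpt=443"]
    return HEADER + " ".join(extension)


def send_message(host, port, message):
    """Send one newline-terminated message over TCP, like `echo ... | nc`."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(message.encode("utf-8") + b"\n")
```

Calling send_message("<graylog_ip>", <graylog_port>, build_test_message(custom_value="some_value")) with your own host and port should then be followed by the mapping check from step 5.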

Context

I was trying to send CEF formatted log messages from a firewall to a Graylog CEF TCP input. I worked around the problem by putting rsyslog in front of Graylog. I wrote an rsyslog template which converts the whole message to lowercase. The template looks like this: template(name="cef" type="string" string="%timereported% %HOSTNAME% %syslogtag%%msg:::lowercase,drop-last-lf%\n")

Your Environment

rich-graylog commented 1 year ago

This has been resolved in 4.3.9 ( #13812 )

djosip commented 1 year ago

I have checked the issue #13812. It doesn't seem that there have been changes related to this bug report.

I have upgraded to Graylog 4.3.9 and the issue still persists. Please reopen the issue.

rich-graylog commented 1 year ago

My apologies! I got this conflated with another issue - I will re-open and we will take a look

djosip commented 1 year ago

Thank you.

kingzacko1 commented 1 year ago

@djosip So I've attempted to recreate this issue locally, and I do not get the 192_168_1_1 field name in my index. I don't get anything regarding that key-pair when I send the cs1Label key in.

However, I do notice that in our code we are specifically skipping keys that end with the case-sensitive string Label. See this code. I am not super familiar with the CEF parser so I don't know exactly why we do that, but I will ask the people who are familiar with it tomorrow. The fact that we skip parsing those fields altogether makes me wonder if there is something else manipulating the message, maybe a pipeline rule, that might be causing the issue.

ryan-carroll-graylog commented 1 year ago

@kingzacko1 I was able to recreate this by adding the cs1 field to the original test message.

What appears to be happening in the code you specified is that we're taking the value of cs1Label and using it as the field name for the value of cs1 then dropping cs1Label. So if our original message has cs1=some_value and cs1Label=some_label, we end up processing the message as some_label=some_value.
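To make that concrete, here is a rough Python sketch of the behavior as described in this comment. It is not the actual Graylog code (which is Java); the function name apply_custom_labels and the exact rules are assumptions based only on what is described in this thread.

```python
def apply_custom_labels(fields):
    """Rename custom fields by their labels (illustrative sketch only).

    `fields` is a dict of already-parsed CEF extension key/value pairs.
    """
    result = {}
    for key, value in fields.items():
        # Case-sensitive suffix check: "cs1Label" matches, "cs1label" does not.
        if key.endswith("Label"):
            continue  # label keys are consumed, never emitted as fields
        label = fields.get(key + "Label")
        # When a label exists, its value becomes the field name.
        result[label if label is not None else key] = value
    return result


# cs1Label carries an IP address, so the IP ends up as the field name.
mixed = apply_custom_labels({"cs1Label": "192.168.1.1", "cs1": "some_value"})

# Lowercased keys do not match the "Label" suffix and pass through unchanged.
lower = apply_custom_labels({"cs1label": "192.168.1.1", "cs1": "some_value"})

# A label without its cs1 counterpart is simply dropped.
label_only = apply_custom_labels({"cs1Label": "192.168.1.1", "spt": "51713"})
```

Under these assumptions, mixed is {"192.168.1.1": "some_value"}, which would explain both the IP-derived field names and why lowercasing the whole message, as in the rsyslog workaround, avoids them. The label_only case matches the earlier observation that nothing is indexed when only cs1Label is sent.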

As you stated it is clearly intentional, but I'm not familiar enough with CEF either to determine if it's the correct behavior. I just wanted to add the context I found before I saw you were working on this.

kingzacko1 commented 1 year ago

@ryan-carroll-graylog ah, yeah that was it. Adding the cs1=value in the message results in the 192_168_1_1 field being indexed. Probably need to do some digging and see if that is a CEF thing or why our code handles it like that.

djosip commented 1 year ago

I just went through the Micro Focus Security ArcSight Common Event Format document and it looks like this isn't a CEF thing. CEF looks pretty simple in that regard.

Keys cannot contain some characters, such as a space or =. A value can contain multiple leading or trailing spaces, except the last value, which shouldn't contain a trailing space. A value can also contain newlines and the character =, which needs to be escaped (e.g. \=).
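A minimal sketch of a parser that follows these rules might look like the code below. It is an illustration, not Graylog's parser: parse_extension and KEY_RE are hypothetical names, it assumes pairs are separated by a single space, and it handles only the \=, \\, \n and \r escapes.

```python
import re

# A key is a run of characters without whitespace, "=" or "\" that sits at
# the start of the text or right after whitespace, directly followed by "=".
# Excluding "\" means an escaped "\=" inside a value never starts a new key.
KEY_RE = re.compile(r"(?:^|(?<=\s))([^\s=\\]+)=")


def _unescape(value):
    """Turn \\n and \\r into line breaks and drop the backslash otherwise."""
    def repl(match):
        char = match.group(1)
        return {"n": "\n", "r": "\r"}.get(char, char)
    return re.sub(r"\\(.)", repl, value)


def parse_extension(text):
    """Split a CEF extension string into a dict of key/value pairs."""
    matches = list(KEY_RE.finditer(text))
    fields = {}
    for i, match in enumerate(matches):
        start = match.end()
        if i + 1 < len(matches):
            # Drop only the single separator before the next key so that
            # additional spaces belonging to the value are preserved.
            end = matches[i + 1].start() - 1
        else:
            end = len(text)
        fields[match.group(1)] = _unescape(text[start:end])
    return fields


fields = parse_extension(r"cs1Label=192.168.1.1 msg=a\=b c spt=51713")
# fields == {"cs1Label": "192.168.1.1", "msg": "a=b c", "spt": "51713"}
```

Note that nothing in this sketch treats cs1Label differently from any other key; it is stored under its own name like every other pair.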

So, I don't see why a key such as cs1Label would be treated any differently. If anything, one might want to make sure that the cs1Label key contains a string limited to a length of 1023 (I assume bytes, not characters), but as far as Graylog is concerned I believe it doesn't make much sense to fully adhere to the CEF standard and implement the ArcSight Extension Dictionary. I am not even sure that the linked document represents the CEF standard.

tristanlatr commented 1 year ago

It seems to me that the custom strings (and other custom fields) declaration is part of the CEF standard: https://docs.centrify.com/Content/IntegrationContent/SIEM/arcsight-cef/arcsight-cef-format.htm