Graylog2 / graylog2-server

Free and open log management
https://www.graylog.org

Processing Pipeline `parse_cef` / CEF Input - parse broken message #5781

Open jalogisch opened 5 years ago

jalogisch commented 5 years ago

Expected Behavior

The parse_cef function and the CEF input should work for all ingested messages and extract every key-value pair from the message.

Current Behavior

parse_cef (and the CEF input) treats every = as a key-value separator, even when the = is part of a key's value.
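To make the failure mode concrete, here is a minimal Python sketch of a splitter that starts a new pair at every `word=` token. It is not the Graylog implementation; the function name and the regular expression are assumptions chosen only to illustrate the behaviour described above, and the sample string is a shortened form of the failing message in this issue.

```python
import re

def naive_cef_extension(ext: str) -> dict:
    """Illustrative only: treat every `word=` as the start of a new pair."""
    pairs = {}
    # Assumed pattern: any run of word characters directly followed by '='.
    matches = list(re.finditer(r"(\w+)=", ext))
    for i, m in enumerate(matches):
        # A value runs until the next `word=` token or the end of the string.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(ext)
        pairs[m.group(1)] = ext[m.end():end].strip()
    return pairs

ext = "src=10.130.104.25 msg=GET /directorypro.cgi?want=showcat&show=../etc/passwd"
result = naive_cef_extension(ext)
for key, value in result.items():
    print(key, "->", value)
```

With this input the msg value is cut off right before `want=`, and `want` and `show` show up as separate keys, which matches the symptom reported here.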

Possible Solution

If you use the key_value function, it works flawlessly:

key_value(value: to_string(message), trim_value_chars: "\"", trim_key_chars:"\"", delimiters:" ", kv_delimiters:"=");

Maybe the above can be used for the key-value extraction of CEF messages.

rule "Isolate CEF"
when
    has_field("message")
then
    let result = regex("(CEF:.*)$", to_string($message.message));
    set_field("pure_cef", result["0"]);
end

rule "Parse CEF"
when
    has_field("pure_cef")
then
    set_fields(parse_cef(to_string($message.pure_cef), false));
end

rule "Cleanup CEF"
when
    has_field("pure_cef")
then
    set_field("message", $message.pure_cef);
    remove_field("pure_cef");
end

The following message is parsed cleanly by the default CEF parser, because the msg value contains no additional =:

CEF:0|Trend Micro Inc.|OSSEC HIDS|v2.9.0|5715|SSHD authentication success.|3|dvc=graylog01 cs1=(srvapp03) 10.130.100.83->/var/log/secure cs1Label=Location classification= syslog,sshd,authentication_success, src=10.130.102.29 shost=10.130.102.29 suser=bogner suser=bogner msg=Mar 15 08:42:07 srvApp03.graylog.lan sshd[40309]: Accepted password for bogner from 10.130.102.29 port 50348 ssh2

The following message is not parsed correctly by the default CEF parser, because the msg field contains = characters. The key_value configuration above does parse it cleanly.

CEF:0|Trend Micro Inc.|OSSEC HIDS|v2.9.0|31104|Common web attack.|6|dvc=graylog01 cs1=(srvws01) 10.130.102.28->/var/log/httpd/access_log cs1Label=Location classification= web,accesslog,attack, src=10.130.104.25 shost=10.130.104.25 msg=10.130.104.25 - - [15/Mar/2019:05:17:21 -0500] "GET /directorypro.cgi?want=showcat&show=../../../../../etc/passwd%00 HTTP/1.1" 302 102 "-" "-"

Context

Graylog parses CEF messages correctly, but the vendors do not format them correctly. So we could try to implement the fix mentioned above to be able to parse more messages.

Your Environment

jalogisch commented 5 years ago

The following pipeline rule works as a workaround:

rule "split and parse"
when
    has_field("pure_cef")
then
    let message=split("\\|", to_string($message.pure_cef));

    // split parts out of CEF
    set_field("device_vendor", message[1]);
    set_field("device_product", message[2]);
    set_field("device_version", message[3]);
    set_field("device_event_class_id", message[4]);
    set_field("name", message[5]);
    set_field("severity", message[6]);
    set_field("device_message", message[7]);

    //parse k-v message
    let kv = key_value(value: to_string(message[7]), trim_value_chars: "\"", trim_key_chars:"\"", delimiters:" ", kv_delimiters:"=");
    set_fields(kv);

    let gmsg = grok(pattern: "%{GREEDYDATA}msg=%{GREEDYDATA:message}", value: to_string(message[7]), only_named_captures: true);
    set_fields(gmsg);

    // cleanup
    remove_field("pure_cef");
    remove_field("device_message");
    remove_field("msg");
end
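
For readers who want to try the idea outside Graylog, the rule above can be approximated in Python. This is a rough sketch, not a port: the function name is hypothetical, it assumes msg is the last pair in the extension, it assumes the header contains no escaped pipes, and unlike the rule it only tokenizes the part of the extension before `msg=`. Values containing spaces are cut at the first space, because the pairs are split on single spaces.

```python
def parse_unescaped_cef(raw: str) -> dict:
    """Hypothetical helper: split the CEF header, then treat msg as the tail."""
    # CEF:Version|Vendor|Product|Version|Event Class ID|Name|Severity|Extension
    parts = raw.split("|", 7)  # at most 8 parts, so pipes in the extension survive
    fields = {
        "device_vendor": parts[1],
        "device_product": parts[2],
        "device_version": parts[3],
        "device_event_class_id": parts[4],
        "name": parts[5],
        "severity": parts[6],
    }
    # Everything after the first "msg=" is taken verbatim (assumption: msg is last).
    head, sep, msg = parts[7].partition("msg=")
    for token in head.split(" "):
        key, eq, value = token.partition("=")
        if eq and key:
            fields[key] = value.strip('"')
    if sep:
        fields["message"] = msg
    return fields

sample = (
    "CEF:0|Trend Micro Inc.|OSSEC HIDS|v2.9.0|31104|Common web attack.|6|"
    "dvc=graylog01 src=10.130.104.25 shost=10.130.104.25 "
    'msg=10.130.104.25 - - "GET /directorypro.cgi?want=showcat&show=../etc/passwd HTTP/1.1" 302'
)
parsed = parse_unescaped_cef(sample)
print(parsed["message"])
```

The sample is a shortened version of the failing message from this issue; the = characters inside the request line stay in the message field instead of producing extra keys.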
kroepke commented 5 years ago

Isn't this something that should be fixed in the CEF input? What does CEF say about escaping = characters?

kroepke commented 5 years ago

To answer my own question: according to https://community.microfocus.com/t5/ArcSight-Connectors/ArcSight-Common-Event-Format-CEF-Implementation-Standard/ta-p/1645557?attachment-id=68077, = characters in extension values need to be escaped.
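
As an illustration of what that escaping buys a parser, here is a small Python sketch that only accepts `\=` and `\\` inside values and therefore never splits on an escaped =. The function names and the regular expression are assumptions made for this example and are not taken from the Graylog code base; other escapes such as encoded newlines are left out.

```python
import re

# A value is a lazy run of escaped characters (backslash + any char) or
# characters that are neither '=' nor a backslash. It ends before the next
# "whitespace + key=" or at the end of the string.
_PAIR = re.compile(r"(\w+)=((?:\\.|[^=\\])*?)(?=\s+\w+=|\s*$)")

def unescape_cef_value(value: str) -> str:
    """Turn the escaped forms of '=' and backslash back into plain characters."""
    return re.sub(r"\\([=\\])", r"\1", value)

def parse_escaped_extension(ext: str) -> dict:
    """Parse an extension whose values escape '=' with a backslash."""
    return {m.group(1): unescape_cef_value(m.group(2)) for m in _PAIR.finditer(ext)}

ext = r"src=1.2.3.4 msg=GET /x?want\=showcat&show\=y"
fields = parse_escaped_extension(ext)
print(fields)
```

When the sender escapes correctly, the query string stays inside msg and only the real keys are returned.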

jalogisch commented 5 years ago

My rule fixes the output of Trend Micro devices, which do not follow the CEF rules. They are the only ones I have seen so far that just place a plain syslog message in the msg field without any kind of escaping. Also, msg is always the last key-value pair.

That is why the above works; other devices might not be that broken.

Maybe we can add some kind of option to the CEF Input that does the above?

root-locus commented 5 years ago

JFYI, we faced the same problem with the McAfee Network Security Monitoring solution. They provide a custom template for syslog messages, but there is no configuration for escaping.

For example, Palo Alto provides such a configuration with the options "Escaped Characters" and "Escape Character": https://docs.paloaltonetworks.com/content/dam/techdocs/en_US/pdf/cef/pan-os-80-cef-configuration-guide.pdf

We'll raise a support ticket with McAfee, but these options might also be a good idea for the CEF plugin to implement.
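
To sketch what such a pair of options could do on the sending side, here is a short Python example. The parameter names merely mirror the two option labels quoted above; they are hypothetical and do not correspond to any existing Graylog or PAN-OS API.

```python
def escape_value(value: str, escaped_chars: str = "\\=", escape_char: str = "\\") -> str:
    """Prefix every character listed in escaped_chars with escape_char."""
    return "".join(escape_char + ch if ch in escaped_chars else ch for ch in value)

escaped = escape_value("GET /x?want=showcat&show=y")
print(escaped)
```

With the defaults, both = and the backslash itself are escaped, so a receiver that follows the CEF rules can tell value content apart from the next key.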

grownuphacker commented 5 years ago

Just dropping in with a massive thanks for the time saved by the Trend Micro parse work. It took some massaging for the 2019 Apex XYZ products, but the principles remained. Thanks