Graylog2 / graylog2-server

Free and open log management
https://www.graylog.org

Simplify regex notation handling #6473

Status: Open. zoulja opened this issue 5 years ago.

zoulja commented 5 years ago

Currently Graylog Pipelines (and maybe some other parts) require complicated manual escaping in regexes. Even when my pattern works perfectly after 10 minutes of debugging with https://regex101.com/, I then have to spend another 20 minutes trying to guess what exactly Graylog doesn't like about it in the Pipeline configuration window, in a function like regex_replace.

Suggestion: make the regex engine frontend more user-friendly. Ideally it should accept already-verified patterns as is, without any manual escaping.
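Until patterns are accepted as is, the escaping can at least be done mechanically instead of by trial and error. A minimal Python sketch, assuming that pipeline rule string literals are double-quoted and use Java-style backslash escapes (so every regex backslash must be written `\\` and every double quote `\"`); `to_rule_literal` and `from_rule_literal` are hypothetical helper names, not part of Graylog:

```python
def to_rule_literal(pattern: str) -> str:
    """Wrap a raw regex/grok pattern in a double-quoted rule string literal.

    Assumption: the rule parser uses Java-style escapes. Backslashes are
    doubled first and double quotes escaped second; the other order would
    double the backslash that was just added in front of each quote.
    """
    escaped = pattern.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def from_rule_literal(literal: str) -> str:
    """Undo to_rule_literal: strip the quotes and drop one backslash per escape."""
    body = literal[1:-1]
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            # An escaped backslash or quote becomes the bare character.
            out.append(body[i + 1])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


raw = r'\[%{HTTPDATE:time_local}\] "%{WORD:verb}"'  # as validated on regex101.com
literal = to_rule_literal(raw)
print(literal)  # "\\[%{HTTPDATE:time_local}\\] \"%{WORD:verb}\""
assert from_rule_literal(literal) == raw  # the round trip restores the raw pattern
```

The round-trip check is there so the escaped literal can be trusted without pasting it into the editor first.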

CrackerJackMack commented 1 year ago

Bumping this. I also had to resort to some trickery in Graylog 5.1, after validating the pattern on regex101.com the same way.

rule "sanitize passwords"
when
    // contains(value, search): checks whether the message text contains "ass"
    contains(value: to_string($message.message), search: "ass", ignore_case: true)
then
    // Had to base64-encode it, as it was breaking parsing in some fashion
    // original regex:   [pP][Aa][Ss][Ss](?:[Ww][Oo][Rr][Dd])?[$=:\"]?[\":]?\s*[\"]?(\w+)[\"]?
    // encoded without a trailing newline, so it decodes to exactly the regex above
    let pattern_base64 = "W3BQXVtBYV1bU3NdW1NzXSg/OltXd11bT29dW1JyXVtEZF0pP1skPTpcIl0/W1wiOl0/XHMqW1wiXT8oXHcrKVtcIl0/";
    let pattern = base64_decode(pattern_base64);
    set_field("message",
        regex_replace(
            pattern: pattern,
            value: to_string($message.message),
            replacement: "[REDACTED]",
            replace_all: true)
    );
end

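One pitfall with this workaround: a value produced with `echo pattern | base64` carries a trailing newline (a final `Cg==` group decodes to `\n`), and a newline at the end of a regex then has to be matched as well. A minimal Python sketch for producing the encoded value without that newline and sanity-checking it; `encode_pattern` and `decode_pattern` are hypothetical helper names, and Python's `re` only approximates the Java regex engine that Graylog's regex functions are assumed to use (this particular pattern uses only constructs both engines treat the same way):

```python
import base64
import re

# The regex from the rule's comment, exactly as validated on regex101.com.
PATTERN = r'[pP][Aa][Ss][Ss](?:[Ww][Oo][Rr][Dd])?[$=:\"]?[\":]?\s*[\"]?(\w+)[\"]?'


def encode_pattern(pattern: str) -> str:
    """Base64-encode the pattern string itself, so no trailing newline is added."""
    return base64.b64encode(pattern.encode("utf-8")).decode("ascii")


def decode_pattern(encoded: str) -> str:
    """What base64_decode() in the rule is expected to hand to regex_replace."""
    return base64.b64decode(encoded).decode("utf-8")


encoded = encode_pattern(PATTERN)
print(encoded)
assert decode_pattern(encoded) == PATTERN  # exact round trip, no "\n" appended

sample = 'user=bob password="hunter2" ok'
print(re.sub(PATTERN, "[REDACTED]", sample))  # user=bob [REDACTED] ok

# With a stray trailing newline the pattern no longer matches single-line text.
assert re.search(PATTERN + "\n", sample) is None
```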
CrackerJackMack commented 11 months ago

It's more than just regex: it appears to happen with complex GROK patterns too, resulting in yet another base64 workaround. Because the UI fails to parse these patterns, it won't let me save a valid, functioning pattern (grok or regex).

rule "extract nginx ingress controller log"
when
    has_field("application_name") && to_string($message.application_name) == "controller.ingress-nginx"
then
    // https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/log-format/
    // https://github.com/ChrsMark/beats/blob/194bb7be9271814e51883c25453277fd72f6f767/filebeat/module/nginx/ingress_controller/ingest/pipeline.yml
    let pat = "JXtJUDpyZW1vdGVfYWRkfSAtICV7VVNFUjpyZW1vdGVfdXNlcn0gXFsle0hUVFBEQVRFOnRpbWVfbG9jYWx9XF0gIiV7SFRUUF9NRVRIT0Q6cmVxdWVzdF9tZXRob2R9ICV7VVJJUEFUSFBBUkFNOnJlcXVlc3RfcGF0aH0gJXtIVFRQX1ZFUlNJT046aHR0cF92ZXJzaW9ufSIgJXtJTlQ6c3RhdHVzfSAle0lOVDpib2R5X2J5dGVzX3NlbnR9ICIle0dSRUVEWURBVEE6aHR0cF9yZWZlcnJlcn0iICIle0dSRUVEWURBVEE6aHR0cF91c2VyX2FnZW50fSIgJXtJTlQ6cmVxdWVzdF9sZW5ndGh9ICV7REVDSU1BTDpyZXF1ZXN0X3RpbWV9IFxbJXtIT1NUTkFNRTpwcm94eV91cHN0cmVhbV9uYW1lfVxdIFxbJXtHUkVFRFlEQVRBOlVOV0FOVEVEfVxdICV7SE9TVFBPUlQ6dXBzdHJlYW1fYWRkcn0gJXtJTlQ6dXBzdHJlYW1fcmVzcG9uc2VfbGVuZ3RofSAle0RFQ0lNQUw6dXBzdHJlYW1fcmVzcG9uc2VfdGltZX0gJXtJTlQ6dXBzdHJlYW1fc3RhdHVzfSAle1dPUkQ6cmVxdWVzdF9pZH0=";
    let match_on = base64_decode(pat);
    let results = grok(match_on, to_string($message.message), true);
    set_fields(results);
end

Working GROK pattern:
%{IP:remote_add} - %{USER:remote_user} \[%{HTTPDATE:time_local}\] "%{HTTP_METHOD:request_method} %{URIPATHPARAM:request_path} %{HTTP_VERSION:http_version}" %{INT:status} %{INT:body_bytes_sent} "%{GREEDYDATA:http_referrer}" "%{GREEDYDATA:http_user_agent}" %{INT:request_length} %{DECIMAL:request_time} \[%{HOSTNAME:proxy_upstream_name}\] \[%{GREEDYDATA:UNWANTED}\] %{HOSTPORT:upstream_addr} %{INT:upstream_response_length} %{DECIMAL:upstream_response_time} %{INT:upstream_status} %{WORD:request_id}
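To see which fields `set_fields(results)` will write, the named captures can be listed straight from the pattern. A small Python sketch, assuming that `grok(..., true)` (only named captures) returns one key per `%{SYNTAX:name}` reference written in the pattern; named captures nested inside the referenced pattern definitions, if the pattern set has any, are not covered. `named_fields` is a hypothetical helper:

```python
import re

# The working GROK pattern from this comment, split over several raw strings.
NGINX_INGRESS_GROK = (
    r'%{IP:remote_add} - %{USER:remote_user} \[%{HTTPDATE:time_local}\] '
    r'"%{HTTP_METHOD:request_method} %{URIPATHPARAM:request_path} %{HTTP_VERSION:http_version}" '
    r'%{INT:status} %{INT:body_bytes_sent} "%{GREEDYDATA:http_referrer}" '
    r'"%{GREEDYDATA:http_user_agent}" %{INT:request_length} %{DECIMAL:request_time} '
    r'\[%{HOSTNAME:proxy_upstream_name}\] \[%{GREEDYDATA:UNWANTED}\] %{HOSTPORT:upstream_addr} '
    r'%{INT:upstream_response_length} %{DECIMAL:upstream_response_time} '
    r'%{INT:upstream_status} %{WORD:request_id}'
)

# Matches %{SYNTAX:name}; unnamed %{SYNTAX} references have no colon and are skipped.
GROK_REF = re.compile(r"%\{(\w+):(\w+)\}")


def named_fields(grok_pattern: str) -> list[str]:
    """Return the capture names in the order they appear in the pattern."""
    return [name for _syntax, name in GROK_REF.findall(grok_pattern)]


fields = named_fields(NGINX_INGRESS_GROK)
print(len(fields), fields)
```

For this pattern it lists 19 names, including `UNWANTED`, which would therefore also be set on the message.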
