elastic / integrations

Elastic Integrations
https://www.elastic.co/integrations

[Check Point] GROK Filters add tag_on_failure #5108

Closed jamiehynds closed 1 year ago

jamiehynds commented 1 year ago

The Check Point integration does not leverage tag_on_failure for its GROK filters. If the GROK filters set a tag on failure, that tag could be checked from a custom pipeline added to the integration configuration via the processor settings, and used as a handle to reprocess the source with custom GROK pattern(s).

Right now you need to modify or add to the existing GROK patterns in the pipeline asset of the integration. Those changes are overridden and lost when the integration is upgraded or reinstalled, whereas a custom pipeline is preserved.
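For reference, each integration data stream supports a `@custom` ingest pipeline that the managed pipeline invokes if it exists and that survives package upgrades. A minimal sketch, assuming the Check Point data stream is `logs-checkpoint.firewall` (the pipeline body here is purely illustrative):

```json
PUT _ingest/pipeline/logs-checkpoint.firewall@custom
{
  "description": "User-managed pipeline; not overwritten on integration upgrade/reinstall",
  "processors": [
    {
      "set": {
        "field": "labels.custom_pipeline",
        "value": "applied"
      }
    }
  ]
}
```

The gap described above is that, without a stable failure tag, this custom pipeline has no reliable way to tell whether the integration's GROK filters failed.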

There is an on_failure handler at the end of the pipeline, but it produces a long text message, so you would need to parse it or rely on logic like 'starts with xyz'.
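To illustrate, the only handle available today is the free-text failure message, so a custom pipeline condition ends up matching on message prefixes. A fragile, illustrative sketch (the fallback pattern is hypothetical):

```yaml
processors:
  - grok:
      # Fragile: keys off the wording of the failure message rather than a stable tag.
      if: >-
        ctx.error?.message != null &&
        ctx.error.message.startsWith('Provided Grok expressions do not match')
      field: event.original
      patterns:
        - '%{GREEDYDATA:event.custom_parsed}'
      ignore_failure: true
```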

Acceptance Criteria

Add tag_on_failure to the GROK filters in integration pipelines so it can be used in the logic of a custom pipeline, or add a way to retain additional GROK pattern modifications/additions to the integration's existing GROK filters without losing them on update/reinstall of the integration.

elasticmachine commented 1 year ago

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

efd6 commented 1 year ago

The ES grok processor does not have a tag_on_failure option; that exists for the Logstash grok filter. However, we can use the on_failure option to append either to tags or to error.message. Given that the failure is an error, the latter seems like the more appropriate approach.

Adding a failure error would have two uses: one is to provide a human-oriented message, while the second addresses the request above to efficiently enable programmatic error recovery. Unfortunately these two uses are at odds (the former would require the user to recover with approaches like the logic described above). An alternative is to cover both by appending both the human-directed message and a processor/failure-mode-specific tag that allows programmatic recovery. Another proposal that discusses this has been made here.

Taking the Check Point grok processor as an example, I propose something like the following (original here):

  - grok:
      field: event.original
      tag: "grok_syslog_line"
      patterns:
        - '%{SYSLOG5424PRI}%{NONNEGINT:syslog5424_ver} +(?:%{TIMESTAMP}|-)
          +(?:%{IPORHOST:syslog5424_host}|-) +(-|%{SYSLOG5424PRINTASCII:syslog5424_app})
          +(-|%{SYSLOG5424PRINTASCII:syslog5424_proc}) +(-|%{SYSLOG5424PRINTASCII:syslog5424_msgid})
          +\[%{GREEDYDATA:syslog5424_sd}\]'
      pattern_definitions:
        TIMESTAMP: "%{TIMESTAMP_ISO8601:syslog5424_ts}(?:-?%{ISO8601_TIMEZONE:_temp_.tz})?"
        TIMESTAMP_ISO8601: "%{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?"
      on_failure:
        - append:
            field: error.message
            value: "fail-{{{ _ingest.on_failure_processor_tag }}}"
        - fail:
            message: "Processor {{ _ingest.on_failure_processor_type }} with tag {{ _ingest.on_failure_processor_tag }} in pipeline {{ _ingest.on_failure_pipeline }} failed with message: {{ _ingest.on_failure_message }}"

and a change to the global on_failure processor to use append rather than set:

on_failure:
  - append:
      field: error.message
      value: "{{ _ingest.on_failure_message }}"

With this addition, on attempting to ingest an invalid input like

<134>1 this_is_not_a_valid_year-03-30T07:20:35Z gw-da58d3 CheckPoint 7776 - [action:"Accept"; flags:"444676"; ifdir:"outbound"; ifname:"eth0"; logid:"0"; loguid:"{0x5e819dc3,0x0,0x353707c7,0xee78a1dc}"; origin:"192.168.1.100"; originsicname:"cn=cp_mgmt,o=gw-da58d3..tmn8s8"; sequencenum:"1"; time:"1594646954"; version:"5"; __policy_id_tag:"product=VPN-1 & FireWall-1[db_tag={880771B0-FD92-2C4F-82FC-B96FC3DE5A07};mgmt=gw-da58d3;date=1585502566;policy_name=Standard\]"; dst:"192.168.1.153"; inzone:"Local"; layer_name:"Network"; layer_uuid:"63b7fe60-76d2-4287-bca5-21af87337b0a"; match_id:"1"; parent_rule:"0"; rule_action:"Accept"; rule_uid:"1fde807b-6300-4b1a-914f-f1c1f3e2e7d2"; outzone:"External"; product:"VPN-1 & FireWall-1"; proto:"17"; s_port:"43103"; service:"514"; service_id:"syslog"; src:"192.168.1.100"]

we get an event like

{
    "_conf": {
        "tz_offset": "+0500"
    },
    "ecs": {
        "version": "8.6.0"
    },
    "error": {
        "message": [
            "fail-grok_syslog_line",
            "Processor grok with tag grok_syslog_line in pipeline default-1675904098718607000 failed with message: Provided Grok expressions do not match field value: [\u003c134\u003e1 this_is_not_a_valid_year-03-30T07:20:35Z gw-da58d3 CheckPoint 7776 - [action:\\\\\\\"Accept\\\\\\\"; flags:\\\\\\\"444676\\\\\\\"; ifdir:\\\\\\\"outbound\\\\\\\"; ifname:\\\\\\\"eth0\\\\\\\"; logid:\\\\\\\"0\\\\\\\"; loguid:\\\\\\\"{0x5e819dc3,0x0,0x353707c7,0xee78a1dc}\\\\\\\"; origin:\\\\\\\"192.168.1.100\\\\\\\"; originsicname:\\\\\\\"cn=cp_mgmt,o=gw-da58d3..tmn8s8\\\\\\\"; sequencenum:\\\\\\\"1\\\\\\\"; time:\\\\\\\"1594646954\\\\\\\"; version:\\\\\\\"5\\\\\\\"; __policy_id_tag:\\\\\\\"product=VPN-1 \u0026 FireWall-1[db_tag={880771B0-FD92-2C4F-82FC-B96FC3DE5A07};mgmt=gw-da58d3;date=1585502566;policy_name=Standard\\\\\\\\]\\\\\\\"; dst:\\\\\\\"192.168.1.153\\\\\\\"; inzone:\\\\\\\"Local\\\\\\\"; layer_name:\\\\\\\"Network\\\\\\\"; layer_uuid:\\\\\\\"63b7fe60-76d2-4287-bca5-21af87337b0a\\\\\\\"; match_id:\\\\\\\"1\\\\\\\"; parent_rule:\\\\\\\"0\\\\\\\"; rule_action:\\\\\\\"Accept\\\\\\\"; rule_uid:\\\\\\\"1fde807b-6300-4b1a-914f-f1c1f3e2e7d2\\\\\\\"; outzone:\\\\\\\"External\\\\\\\"; product:\\\\\\\"VPN-1 \u0026 FireWall-1\\\\\\\"; proto:\\\\\\\"17\\\\\\\"; s_port:\\\\\\\"43103\\\\\\\"; service:\\\\\\\"514\\\\\\\"; service_id:\\\\\\\"syslog\\\\\\\"; src:\\\\\\\"192.168.1.100\\\\\\\"]]"
        ]
    },
    "event": {
        "original": "\u003c134\u003e1 this_is_not_a_valid_year-03-30T07:20:35Z gw-da58d3 CheckPoint 7776 - [action:\"Accept\"; flags:\"444676\"; ifdir:\"outbound\"; ifname:\"eth0\"; logid:\"0\"; loguid:\"{0x5e819dc3,0x0,0x353707c7,0xee78a1dc}\"; origin:\"192.168.1.100\"; originsicname:\"cn=cp_mgmt,o=gw-da58d3..tmn8s8\"; sequencenum:\"1\"; time:\"1594646954\"; version:\"5\"; __policy_id_tag:\"product=VPN-1 \u0026 FireWall-1[db_tag={880771B0-FD92-2C4F-82FC-B96FC3DE5A07};mgmt=gw-da58d3;date=1585502566;policy_name=Standard\\]\"; dst:\"192.168.1.153\"; inzone:\"Local\"; layer_name:\"Network\"; layer_uuid:\"63b7fe60-76d2-4287-bca5-21af87337b0a\"; match_id:\"1\"; parent_rule:\"0\"; rule_action:\"Accept\"; rule_uid:\"1fde807b-6300-4b1a-914f-f1c1f3e2e7d2\"; outzone:\"External\"; product:\"VPN-1 \u0026 FireWall-1\"; proto:\"17\"; s_port:\"43103\"; service:\"514\"; service_id:\"syslog\"; src:\"192.168.1.100\"]"
    }
}

This would allow the user to conditionally parse event.original with a condition like (ctx.error?.message instanceof List) && ctx.error.message.contains('fail-grok_syslog_line').
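That condition could be used in a reprocessing pipeline along these lines; a sketch, where the fallback pattern is purely illustrative:

```yaml
processors:
  - grok:
      # Only runs when the integration's syslog-line grok failed,
      # as signalled by the proposed tag appended to error.message.
      if: >-
        ctx.error?.message instanceof List &&
        ctx.error.message.contains('fail-grok_syslog_line')
      field: event.original
      patterns:
        - '%{SYSLOG5424PRI}%{GREEDYDATA:message}'
      ignore_failure: true
```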

The checkpoint package has only a single grok processor, but in more complex pipelines, processors would be differentiated by their tag.

Note that the proposed change performs a fail since there is no way to recover operation of the pipeline after the initial grok failure. There is also no way to make use of the subsequent processors, so the user's custom pipeline would have to perform all the operations necessary to construct an appropriately formatted document.

bczifra commented 1 year ago

I'd like to stress that I'd like to see my proposal, https://github.com/elastic/integrations/issues/5214, which was referenced in the comment above, implemented in all integration-generated processors, not just GROK. This is definitely a supportability issue.

efd6 commented 1 year ago

@bczifra I agree that the addition of error traceability is valuable (thanks for making the proposal), though I'm not convinced that a concrete "all" is necessarily the right answer. ~All failable processors seems like a more reasonable target; some processors are explicitly marked as non-failing or cannot fail and so addition to those would not be helpful. In any case, I'm using this issue as a model for testing out implementation of #5214.

bczifra commented 1 year ago

@efd6 agreed, I don't think it's even possible to add a tag to a processor that can't fail. I'm interpreting "can't fail" as "doesn't have an on_failure property". So indeed, all fail-able processors is correct.

efd6 commented 1 year ago

Yeah, there are two classes: those that can't fail because the processor has no documented failure mode, and those that are permitted to fail without halting the pipeline (i.e. those that have ignore_failure: true or equivalent).