fox-it / flow.record

Recordization library
GNU Affero General Public License v3.0
7 stars 9 forks source link

Add support for Splunk HTTP Event Collector #85

Closed MaxGroot closed 4 months ago

MaxGroot commented 11 months ago

This pull request adds support for the Splunk HTTP Event Collector (HEC) to the Splunk adapter. This is especially useful in scenarios where the already supported TCP Input is not available, for example in Splunk Cloud environments.

Sending data over HTTP Event Collector supports both sending the data in key=value format (like the TCP Data Input already uses) and a json format.

Something I'm still unsure of is whether to include the 'builtin' Record Fields such as _generated, _source, _classification and _version. Currently, the keyvalue format does not send these fields, but the json format does. This difference is caused by the usage of the JsonRecordPacker which includes these fields. I don't think it should be inconsistent between the two formats, but what is desired: including or omitting these builtin fields?

codecov[bot] commented 11 months ago

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (09ed812) 80.20% compared to head (a53bccc) 81.25%. Report is 1 commits behind head on main.

Files Patch % Lines
flow/record/adapter/splunk.py 92.68% 9 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #85 +/- ## ========================================== + Coverage 80.20% 81.25% +1.05% ========================================== Files 33 33 Lines 3132 3239 +107 ========================================== + Hits 2512 2632 +120 + Misses 620 607 -13 ``` | [Flag](https://app.codecov.io/gh/fox-it/flow.record/pull/85/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=fox-it) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/fox-it/flow.record/pull/85/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=fox-it) | `81.25% <92.68%> (+1.05%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=fox-it#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

MaxGroot commented 10 months ago

While working on the HTTP adapter it came to mind that there seems to be undesired behavior in the current Splunk adapter whenever a record has a field with the name type, as type is currently being used by the splunk adapter to denote the name property of the record's RecordDescriptor. For example,

In [1]: from flow.record import RecordDescriptor, RecordWriter
In [2]: descriptor = RecordDescriptor("my/record", [("string", "type")])
In [3]: record = descriptor(type="some_value")
In [4]: record
Out[4]: <my/record type='some_value'
In [6]: writer = RecordWriter("splunk://localhost:4444")
In [7]: writer.write(record)

This will currently yield the following input for Splunk:

netcat -lvp 4444
Listening on 0.0.0.0 4444
Connection received on localhost 1337
type="my/record" rdtag=None type="some_value"

As you can see, the type field is set twice, for two very different things. I think the Splunk adapter should to the same thing for type as it currently does for tag. The 'meta' field should be prefixed with rd, which would yield the following input in Splunk:

rdtype="my/record" rdtag=None type="some_value"

This however might break existing Splunk dashboards that build on top of disssect. In my experience records from dissect rarely have a 'type' field, which might explain why things haven't broken down in the past. Having said that, I would still consider the above (current) behavior to be a bug, and the introduction of the HTTP collector support of this PR requires me to either also implement the behavior in this PR, or fix it now.

Do you think the record descriptor name should be sent as rdtype rather than type?

And if so, should it be implemented in a separate PR first, or should I implement it within this PR?

Schamper commented 9 months ago

While working on the HTTP adapter it came to mind that there seems to be undesired behavior in the current Splunk adapter whenever a record has a field with the name type, as type is currently being used by the splunk adapter to denote the name property of the record's RecordDescriptor. For example,

In [1]: from flow.record import RecordDescriptor, RecordWriter
In [2]: descriptor = RecordDescriptor("my/record", [("string", "type")])
In [3]: record = descriptor(type="some_value")
In [4]: record
Out[4]: <my/record type='some_value'
In [6]: writer = RecordWriter("splunk://localhost:4444")
In [7]: writer.write(record)

This will currently yield the following input for Splunk:

netcat -lvp 4444
Listening on 0.0.0.0 4444
Connection received on localhost 1337
type="my/record" rdtag=None type="some_value"

As you can see, the type field is set twice, for two very different things. I think the Splunk adapter should to the same thing for type as it currently does for tag. The 'meta' field should be prefixed with rd, which would yield the following input in Splunk:

rdtype="my/record" rdtag=None type="some_value"

This however might break existing Splunk dashboards that build on top of disssect. In my experience records from dissect rarely have a 'type' field, which might explain why things haven't broken down in the past. Having said that, I would still consider the above (current) behavior to be a bug, and the introduction of the HTTP collector support of this PR requires me to either also implement the behavior in this PR, or fix it now.

Do you think the record descriptor name should be sent as rdtype rather than type?

And if so, should it be implemented in a separate PR first, or should I implement it within this PR?

Yes.

yunzheng commented 9 months ago

Something I'm still unsure of is whether to include the 'builtin' Record Fields such as _generated, _source, _classification and _version. Currently, the keyvalue format does not send these fields, but the json format does. This difference is caused by the usage of the JsonRecordPacker which includes these fields. I don't think it should be inconsistent between the two formats, but what is desired: including or omitting these builtin fields?

I would say _version is useless as the Splunk adapter has no reader support for deserialising it back using its original RecordDescriptor.

_generated could be useful to have imo.

_source can also be useful if people are using it. This is meant for extra metadata, for example you could put the disk image filename or path here which can be useful to determine where the record originated from.

_classification can also be useful if people are using it :) - eg: TLP:AMBER, TLP:CLEAR etc.

codecov-commenter commented 5 months ago

Codecov Report

Attention: Patch coverage is 94.57364% with 7 lines in your changes are missing coverage. Please review.

Project coverage is 81.64%. Comparing base (9eb1557) to head (ce00f30).

Files Patch % Lines
flow/record/adapter/splunk.py 94.57% 7 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #85 +/- ## ========================================== + Coverage 80.54% 81.64% +1.10% ========================================== Files 34 34 Lines 3197 3307 +110 ========================================== + Hits 2575 2700 +125 + Misses 622 607 -15 ``` | [Flag](https://app.codecov.io/gh/fox-it/flow.record/pull/85/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=fox-it) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/fox-it/flow.record/pull/85/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=fox-it) | `81.64% <94.57%> (+1.10%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=fox-it#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.