corelight / zeek-community-id

Zeek support for Community ID flow hashing.
https://github.com/corelight/community-id-spec

Feature Request: Add community_id to all network log types #3

Open dcode opened 6 years ago

dcode commented 6 years ago

If any log has 5-tuple information, it should contain the community_id field for correlation across data types. As it stands today, one lookup is needed to find the conn entry and another to find the related logs.

ckreibich commented 5 years ago

Sorry for the late response here @dcode — yes, that is true. This was actually our original idea, until we noticed that Zeek currently lacks a good way to generalize log modifications to a set of applicable logs. You'd currently need to redef each applicable log's Info record. That's certainly doable but not very elegant.
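For illustration, the per-log approach for a single stream might look roughly like the sketch below, which extends `HTTP::Info`. It assumes a `community_id_v1(cid: conn_id): string` function is available. That function is built into Zeek 6.0 and later, and on older versions it comes from the zeek-community-id plugin. It also assumes that base/protocols/http creates `c$http` in its own `http_request` handler at priority 5. Each other log (DNS, SSL, and so on) would need its own analogous `redef` and a handler chosen for that protocol, which is the inelegance described above.

```zeek
# Sketch only: add a community_id column to http.log.
# Assumes community_id_v1() exists (Zeek 6.0+ built-in, or the
# zeek-community-id plugin on older Zeek versions).
@load base/protocols/http

redef record HTTP::Info += {
	# &optional because the field is unset when the record is created.
	community_id: string &optional &log;
};

# The HTTP analysis script populates c$http in its own http_request
# handler at priority 5, so running at priority 4 sees the record.
event http_request(c: connection, method: string, original_URI: string,
                   unescaped_URI: string, version: string) &priority=4
	{
	if ( c?$http )
		c$http$community_id = community_id_v1(c$id);
	}
```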

Out of curiosity, in which log would you most like to have this?

sethhall commented 5 years ago

That's not entirely true. Zeek's logging framework does have a generalized mechanism. The current limitation is that when the mechanism is used to extend all logs, you can't inspect the content of the log in question in order to decide what you put in the extension field.

If we solved that problem, this would be easily doable.

ckreibich commented 5 years ago

What mechanism do you mean? In filters or the writer?

sethhall commented 5 years ago

There is a feature in the logging framework to create log extension fields that are applied globally across all logs. Common uses are adding the worker that wrote the log line and the timestamp when the log line was written. The unfortunate part, and what mostly limits its utility, is that you can't access the record being logged in the callback function through which this feature is implemented. If you could access the logged record in some way, you could inspect it to see whether it has an "id" field, check its type, and then do some extended informational logging (like writing the community_id).

Here is the function prototype for the log extension mechanism: https://github.com/bro/bro/blob/master/scripts/base/frameworks/logging/main.bro#L150

Here is a short example of using it globally:

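```zeek
# Fields appended to every log stream. With the "_" prefix set below,
# they appear as _path, _system_name, and _write_ts.
type LogExtension: record {
	path:        string &log;
	system_name: string &log;
	write_ts:    time   &log;
};

# Called for each log write. It only receives the log path, not the
# record being written, so it cannot look at fields such as "id".
function add_log_extension(path: string): LogExtension
	{
	return LogExtension($path        = path,
	                    $system_name = peer_description,
	                    $write_ts    = network_time());
	}

redef Log::default_ext_prefix = "_";
redef Log::default_ext_func = add_log_extension;
```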

Ideally, the Log::default_ext_func function would take a second argument, an anonymous record containing the log entry, and Zeek would give you the ability to inspect anonymous records.

dcode commented 5 years ago

@sethhall, I've used that log extension before and that's the first thing I thought of, but without the log record, it doesn't fit this use case.

ckreibich commented 5 years ago

Ah, right! I now remember reading over this and going "huh", but it was too early in my logging framework career. :smile: Thanks for the cluebatting! This seems to have a deficiency in that there's no particularly graceful way of handling the presence of multiple such global functions, but that will be easy to fix. It's a pity though that this moves pretty far from the type-oriented extension mechanism (via redef) that we have elsewhere ... ideally I'd want a conditional redef for adding to the Info record, depending on what else has been added to it. The fact that this is about logging-related records would then become secondary. We tried if ( record_fields(x) ) — it's nearly there but doesn't quite work right atm. Definitely fun stuff!

sethhall commented 5 years ago

@dcode Yeah, the original intent of the log extension mechanism was to figure out a way to get access to the log record but we couldn't do it at the time. We might be able to revisit that now with some new features that have been added to Zeek.

defensivedepth commented 4 years ago

Hello there!

I was wondering what the status of this issue is. Any progress?

ckreibich commented 4 years ago

I'd still like to have a way to do this in a controlled yet general way in the logging framework. But others have put in the elbow grease to do it manually for all of Zeek's logs; see here if you prefer that approach: https://github.com/DynamiteAI/publish-community_id

dougburks commented 3 months ago

@ckreibich Has anything changed since your last comment in 2020? Thanks!

ckreibich commented 3 months ago

I'm afraid not. There are two potential approaches, and we've not found the time for either:

Duly noted though that folks still want to see this capability. :+1:

mavam commented 3 months ago

@dougburks would piping the logs through an external tool that adds the extra column be an option?

dougburks commented 3 months ago

@mavam For logs that don't already have community_id, we can enrich them using Elastic's community_id processor. However, in some cases, that processor doesn't find all of the information it's looking for (like network transport) and so it doesn't calculate a community_id value. So we're going back to first principles and examining our entire pipeline to see if there are improvements we can make to Zeek or Elastic to improve our community_id coverage.

mavam commented 3 months ago

Okay, so the community_id processor from Elastic would do the trick if it could use the right protocol, like tcp, udp, icmp, etc.? It sounds like this information is only available in conn.log and that otherwise you'd have to guess it based on the log type (e.g., tcp for http.log, because other protocols don't make sense there). I'm not sure whether you can express this log-type dispatching in Elastic, though.

dougburks commented 3 months ago

@mavam Yes, that's correct. We've considered updating our Elastic ingest parsers to set protocol where necessary, but from an overall architecture perspective it feels like all of this really should happen at the Zeek level.

@ckreibich Given the two potential approaches you outlined above, would it be possible for us to sponsor your time to make this happen? If so, who would I talk to about that?

ckreibich commented 3 months ago

Doug, I don't think you can really sponsor my time for this, but you also don't have to: we're about to plan content for the 7.1 release, and knowing that this is so desirable for you certainly matters.

I'd also like to understand this a bit better. Is the main reason convenience (i.e., doing away with the need to pivot via conn.log), or is there more? Convenience is certainly valid, but I'm trying to understand whether there's a use case where the ID has to be in more logs.

dougburks commented 3 months ago

@ckreibich That's good news! Thanks for being willing to look into this!

Avoiding the pivot via conn.log is definitely a valid concern, although personally I'd classify that as more than just a convenience. As a threat hunter or incident responder, if I'm constantly having to do 2 pivots all day long then it's naturally slowing me down and limiting the number of bad guys I can catch.

Another point to consider is that we give our users the option to use either Zeek or Suricata for metadata. When using Suricata for metadata, all of the metadata logs automatically contain community_id (without pivoting to another log). For folks comparing Zeek and Suricata to determine which they want to use for metadata, it may be somewhat surprising that Zeek doesn't have any easy option for this today...especially considering who developed community_id. :smiley: Implementing this feature would help to level that playing field.

Thanks again for your consideration!

mmguero commented 3 months ago

That's the reason my users have requested this too: cutting out the extra pivot step, both between Zeek logs and between Zeek logs and other tools' logs. For example, pivoting today from Zeek's http.log to the corresponding Arkime session goes http.log -> conn.log -> Arkime session, when we could cut out the middle step. Not every user would want to add community_id to all the logs, but I certainly know some who would.

ckreibich commented 3 months ago

Thanks folks, got it!