Open dcode opened 6 years ago
Sorry for the late response here @dcode. Yes, that is true. This was actually our original idea, until we noticed that Zeek currently lacks a good way to generalize log modifications to a set of applicable logs. You'd currently need to redef each applicable log's Info record. That's certainly doable but not very elegant.
Out of curiosity, in which log would you most like to have this?
That's not entirely true. Zeek's logging framework does have a generalized mechanism. The current limitation is that when the mechanism is used to extend all logs, you can't inspect the content of the log in question in order to decide what you put in the extension field.
If we solved that problem, this would be easily doable.
What mechanism do you mean? In filters or the writer?
There is a feature in the logging framework to create log extension fields which are globally applied across all logs. Some common things people use it for are to add the worker that wrote the log line and the timestamp when the log line was written. The unfortunate part, and what mostly limits its utility, is that you can't access the record being logged in the callback function that this feature is implemented through. If you could access the logged record in some way, you could inspect it to see if it has an "id" field, check its type, and then do some extended informational logging (like writing the community_id).
Here is the function prototype for the log extension mechanism: https://github.com/bro/bro/blob/master/scripts/base/frameworks/logging/main.bro#L150
Here is a short example using it globally...
    type LogExtension: record {
        path: string &log;
        system_name: string &log;
        write_ts: time &log;
    };

    function add_log_extension(path: string): LogExtension
        {
        return LogExtension($path = path,
                            $system_name = peer_description,
                            $write_ts = network_time());
        }

    redef Log::default_ext_prefix = "_";
    redef Log::default_ext_func = add_log_extension;
Ideally, the Log::default_ext_func function would have a second argument that is an anonymous record and Zeek would give you the ability to inspect anonymous records.
@sethhall, I've used that log extension before and that's the first thing I thought of, but without the log record, it doesn't fit this use case.
Ah, right! I now remember reading over this and going "huh", but it was too early in my logging framework career. :smile: Thanks for the cluebatting!
This seems to have a deficiency in that there's no particularly graceful way of handling the presence of multiple such global functions, but that will be easy to fix.
It's a pity though that this moves pretty far from the type-oriented extension mechanism (via redef) that we have elsewhere ... ideally I'd want a conditional redef for adding to the Info record, depending on what else has been added to it. The fact that this is about logging-related records would then become secondary. We tried if ( record_fields(x) ); it's nearly there but doesn't quite work right atm.
Definitely fun stuff!
@dcode Yeah, the original intent of the log extension mechanism was to figure out a way to get access to the log record but we couldn't do it at the time. We might be able to revisit that now with some new features that have been added to Zeek.
Hello there!
Was wondering what the status of this issue is? Any progress?
I'd still like to have a way to do this in a controlled yet general way in the logging framework. But others have put in the elbow grease to do it manually for all of Zeek's logs, see here if you prefer that: https://github.com/DynamiteAI/publish-community_id
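For a single log, the manual route might look roughly like the sketch below. It is not taken from the linked repository. It assumes Zeek 6.0 or later, where community_id_v1() is available as a built-in function, and it uses http.log purely as an example; the field name community_id is a choice made here for illustration.

```zeek
# Sketch only: add a community_id column to http.log by hand.
# Assumes Zeek 6.0+, which provides community_id_v1() as a built-in function.
@load base/protocols/http

redef record HTTP::Info += {
	# Field name chosen for this example.
	community_id: string &optional &log;
};

# Policy hooks run before a record is written, so the field can be
# filled in here from the conn_id the record already carries.
hook HTTP::log_policy(rec: HTTP::Info, id: Log::ID, filter: Log::Filter)
	{
	rec$community_id = community_id_v1(rec$id);
	}
```

The same pattern would have to be repeated for every log whose Info record has an id field, which is exactly the repetition a general mechanism in the logging framework would avoid.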
@ckreibich Has anything changed since your last comment in 2020? Thanks!
I'm afraid no. There are two potential approaches, and we've not found the time for either:
- Make conn_id extensible so the ID could become part of it, and thus automatically appear in the logs
- Give the ext_func approach more context about the log it is extending, so it can tuck the ID on where needed.

Duly noted though that folks still want to see this capability. :+1:
@dougburks would piping the logs through an external tool that adds the extra column be an option?
@mavam For logs that don't already have community_id, we can enrich them using Elastic's community_id processor. However, in some cases, that processor doesn't find all of the information it's looking for (like network transport) and so it doesn't calculate a community_id value. So we're going back to first principles and examining our entire pipeline to see if there are improvements we can make to Zeek or Elastic to improve our community_id coverage.
Okay, so the community_id processor from Elastic would do the trick if it could use the right protocol, like tcp, udp, icmp, etc.? It sounds like this information is only available in conn.log and that otherwise you'd have to guess it based off the log type (e.g., tcp for http.log because others don't make sense). I'm not sure if you can express this log-type dispatching with Elastic though.
@mavam Yes, that's correct. We've considered updating our Elastic ingest parsers to set protocol where necessary, but from an overall architecture perspective it feels like all of this really should happen at the Zeek level.
@ckreibich Given the two potential approaches you outlined above, would it be possible for us to sponsor your time to make this happen? If so, who would I talk to about that?
Doug, I don't think you can really sponsor my time for this, but you also don't have to: we're about to plan content for the 7.1 release and knowing that this is so desirable for you certainly matters.
I'd also like to understand this a bit better. Is the main reason you'd like to have this convenience (i.e., doing away with the need to pivot via conn.log), or is there more? Convenience is certainly valid, but I'm trying to understand if there's a use case where the ID has to be in more logs.
@ckreibich That's good news! Thanks for being willing to look into this!
Avoiding the pivot via conn.log is definitely a valid concern, although personally I'd classify that as more than just a convenience. As a threat hunter or incident responder, if I'm constantly having to do 2 pivots all day long then it's naturally slowing me down and limiting the number of bad guys I can catch.
Another point to consider is that we give our users the option to use either Zeek or Suricata for metadata. When using Suricata for metadata, all of the metadata logs automatically contain community_id (without pivoting to another log). For folks comparing Zeek and Suricata to determine which they want to use for metadata, it may be somewhat surprising that Zeek doesn't have any easy option for this today...especially considering who developed community_id. :smiley: Implementing this feature would help to level that playing field.
Thanks again for your consideration!
That's the reason my users have requested this: cutting down that extra step in pivoting, both between Zeek logs and between Zeek logs and other tools' logs. For example, today to pivot from Zeek's http.log to the corresponding Arkime session, it's http.log -> conn.log -> Arkime session, when we could cut out that middle step. Not every user would want to add community_id to all the logs, but I certainly know some that would.
Thanks folks, got it!
If any log has 5-tuple information, it should contain the community_id field for correlation across data types. As it stands today, one lookup has to find the conn entry, and another lookup to find related logs.