Open philrz opened 3 years ago
@philrz I'm not quite clear on what this ticket calls for. Is this a mode for an analyzer process that would wait to start reading records until the process has successfully exited?
@mattnibs: Yes, that was the essence.
Reading the text again now, my filing it at the time was in some ways a reflection of Brimcap's newness and my not yet being completely familiar with its bells & whistles. Revisiting it now that it's been around longer and we've documented it more fully, I don't see it as urgent. Perhaps most importantly, the Custom Brimcap Configuration article discloses a couple of key points, including that the `globs` parameter can be used to isolate files that have been post-processed and hence avoid the ones that are unsafe to tail while the analyzer is still running.

As long as best practices are followed, it seems users could accomplish pretty much whatever they need without this option. Granted, if I use my imagination, I could see a future where it would still be handy. For instance, there are formats like Parquet that (as I understand it) can't be read until they're fully written. However, Brimcap doesn't have a way to directly import these formats right now (#80), so it's kind of moot.
If it's ok, I think I'll drop the MVP1 marker off this one but keep it open in the Deep Freeze so it's easy to find if a use case does surface again.
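To illustrate the `globs` point above, here is a minimal Python sketch of how a glob pattern could restrict tailing to post-processed files only. It is not Brimcap's implementation (Brimcap is written in Go, and its pattern-matching rules may differ from Python's `fnmatch`). The function name `select_logs`, the file list, and the pattern `deduped*.json` are assumptions made up for the example.

```python
from fnmatch import fnmatch


def select_logs(filenames, globs):
    """Return the filenames that match at least one glob pattern."""
    return [name for name in filenames if any(fnmatch(name, g) for g in globs)]


# Hypothetical contents of an analyzer's working directory.
files = ["eve.json", "deduped-eve.json", "suricata.log"]

# Only the post-processed file matches, so the raw eve.json would be ignored.
print(select_logs(files, ["deduped*.json"]))
```

With a pattern like this, the partially written raw log is never picked up, which is the effect described in the configuration article.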
Repro is with Brimcap commit `1fa5fc4` and https://archive.wrccdc.org/pcaps/2018/wrccdc.2018-03-23.010014000000000.pcap.gz (uncompressed) as my test data.

In my verification steps in #16 (comment), I first used this unsuccessful approach to try to work around https://redmine.openinfosecfoundation.org/issues/4106, mistakenly thinking that all I needed to do was leave behind only valid logs to be subject to Zed processing.
@mattnibs explained to me what went wrong here. The "ztail" functionality in Brimcap starts performing Zed processing on the logs generated by the analyzer processes even before those processes are finished, since this allows users to potentially perform early querying on partial output. Because of this, Brimcap ended up choking on the partially-built `eve.json` (which contains the duplicate field names) before my wrapper script had a chance to delete it.

This led me to learn about and start using the `globs` parameter in the Brimcap config YAML such that the ztail would only tail the `deduped-eve.json` file, so I was all set. However, having gone through the experience, I now recognize it would still be convenient to have a way to disable this ztail behavior entirely when processing an analyzer's generated logs, for two reasons I can think of:

- While `jq` lent itself to output you could still "tail", some kinds of post-processing may not (e.g., they might rely on making an entire pass through a generated log after the complete output is present).
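As a rough illustration of the behavior being requested (wait for the analyzer to exit successfully, and only then read its logs), here is a minimal Python sketch. It is not how Brimcap works today and not a proposed implementation. The helper `run_then_read`, the stand-in analyzer command, and the `deduped*.json` pattern are all hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_then_read(cmd, workdir, pattern):
    """Run `cmd` in `workdir`, wait for it to exit, then read matching logs."""
    # check=True raises CalledProcessError on a non-zero exit status, so
    # nothing is read unless the analyzer finished successfully.
    subprocess.run(cmd, cwd=workdir, check=True)
    return {p.name: p.read_text() for p in sorted(Path(workdir).glob(pattern))}


# Stand-in "analyzer" (hypothetical): a child process that writes one log line.
fake_analyzer = [
    sys.executable,
    "-c",
    "open('deduped-eve.json', 'w').write('{\"event_type\": \"alert\"}\\n')",
]

with tempfile.TemporaryDirectory() as workdir:
    logs = run_then_read(fake_analyzer, workdir, "deduped*.json")
    print(logs)
```

The trade-off is the one described above: reading only after exit gives up early querying on partial output, but it is safe for post-processing that needs a full pass over the completed log.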