brimdata / super

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License

zqd: Support archive space pcap ingest support #1132

Closed: mattnibs closed this issue 4 years ago.

alfred-landrum commented 4 years ago

We ran into an issue when discussing how to handle this in the "drag a pcap" import of Brim: the "long conn" problem of Zeek. That is: if you feed a pcap to the Zeek runner and periodically take a snapshot of the Zeek logs, a later snapshot will occasionally contain records with timestamps earlier than those in a previous snapshot, since the timestamp of a Zeek conn record is the connection start, but the record is emitted at connection end. (@mattnibs: please make sure I'm describing this accurately.)
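
To make the "long conn" problem concrete, here is a minimal Go sketch. The `connRecord` type and the `snapshot` and `maxTS` helpers are hypothetical illustrations, not zq or Zeek APIs, and the model is deliberately simplified: each conn record is assumed to be written exactly at its connection's end time, while its `ts` is the connection start.

```go
package main

import (
	"fmt"
	"sort"
)

// connRecord is a hypothetical, simplified stand-in for a Zeek conn log entry.
type connRecord struct {
	uid string
	ts  int // connection start (seconds); this is the record's timestamp
	end int // connection end (seconds); assumed to be when the record is written
}

// snapshot returns the records that would have been written by time t,
// in emission order (i.e. sorted by connection end time).
func snapshot(conns []connRecord, t int) []connRecord {
	var out []connRecord
	for _, c := range conns {
		if c.end <= t {
			out = append(out, c)
		}
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].end < out[j].end })
	return out
}

// maxTS returns the largest timestamp in recs, or -1 if recs is empty.
func maxTS(recs []connRecord) int {
	m := -1
	for _, r := range recs {
		if r.ts > m {
			m = r.ts
		}
	}
	return m
}

func main() {
	conns := []connRecord{
		{"C1", 0, 100}, // long-lived connection: starts first, ends last
		{"C2", 10, 12},
		{"C3", 20, 25},
	}
	first := snapshot(conns, 50)   // C2, C3
	second := snapshot(conns, 150) // C2, C3, C1
	newRecs := second[len(first):] // only C1 is new
	fmt.Println(maxTS(first), newRecs[0].ts)
}
```

Running it prints `20 0`: the record for the long-lived connection C1 appears only in the second snapshot, yet its timestamp (0) is earlier than the latest timestamp (20) already seen in the first snapshot.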

We want to feed snapshots of the transformed Zeek logs into the archive, so that a Brim user can examine and query the available data without waiting for the entire pcap to be read. We can do that by taking a snapshot of the logs, converting it to zng, and then importing only the delta of records relative to the last snapshot's zng file (so that we don't import duplicate records). The new records may have timestamps that overlap with already-imported data, which we don't yet handle. This led to the discussion and design in #1183.
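
The delta step, and the overlap it can produce, might look roughly like the Go sketch below. It works on in-memory records rather than zng files, and `record`, `delta`, and `overlaps` are hypothetical names. It assumes each snapshot is append-only, so the previous snapshot is a prefix of the current one; the sketch returns an error if that assumption is violated.

```go
package main

import "fmt"

// record is a hypothetical, simplified log record; only uid and ts matter here.
type record struct {
	uid string
	ts  int
}

// delta returns the records in curr that were not in prev, assuming prev is a
// prefix of curr (append-only logs). It errors if that assumption fails.
func delta(prev, curr []record) ([]record, error) {
	if len(curr) < len(prev) {
		return nil, fmt.Errorf("snapshot shrank: %d < %d records", len(curr), len(prev))
	}
	for i := range prev {
		if prev[i] != curr[i] {
			return nil, fmt.Errorf("record %d changed between snapshots", i)
		}
	}
	return curr[len(prev):], nil
}

// overlaps reports whether any record in d has a timestamp at or before
// importedMax, the latest timestamp already in the archive, i.e. whether
// the delta falls into a time range the archive already covers.
func overlaps(d []record, importedMax int) bool {
	for _, r := range d {
		if r.ts <= importedMax {
			return true
		}
	}
	return false
}

func main() {
	prev := []record{{"C2", 10}, {"C3", 20}}
	curr := []record{{"C2", 10}, {"C3", 20}, {"C1", 0}}
	d, err := delta(prev, curr)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(d), overlaps(d, 20))
}
```

This prints `1 true`: the delta holds a single new record, and it overlaps data already imported. A prefix comparison is cheap but depends on the append-only assumption; keying records by a unique field such as Zeek's `uid` would be a more robust alternative.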

philrz commented 4 years ago

Verified with Brim commit 480f4ca that uses zq tooling at commit 51a0a44.

As of the last GA combination, Brim tagged v0.18.0 with zq tagged v0.22.0, importing a pcap into an archive Space was not possible. With my Brim app running (and hence zqd also started):

$ zapi -version
Version: v0.22.0

$ zapi new -k archivestore mypcap
mypcap: space created

$ zapi -s mypcap pcappost hello.pcapng
posting...
status code 400: space does not support pcap import

Now, with Brim commit 480f4ca using zq tooling at commit 51a0a44:

$ zapi -version
Version: v0.22.0-53-g51a0a44

$ zapi new -k archivestore mypcap
mypcap: space created

$ zapi -s mypcap pcappost hello.pcapng
100.0% 12.40KB/12.40KB
/Users/phil/pcap/hello.pcapng: pcap posted

At this point, as shown in the attached video, I can "Reload" in my Brim app and see the archive Space I just created, and successfully extract a flow from the pcap.

Verify.zip

There's still an open issue, #1488, that's needed to get progress updates working in the app with archive Spaces. @alfred-landrum discusses a branch-based test in https://github.com/brimsec/zq/pull/1450#pullrequestreview-508419232 that validates that we're on the right track with the app, modulo the #1488 enhancement. I'll plan to verify the full app experience when that lands.

Thanks @mattnibs!