brimdata / super

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License

Improve sorting used during ingest #525

Closed philrz closed 4 years ago

philrz commented 4 years ago

Brim v0 ensures records are sorted by timestamp, starting from the TSV logs generated by Zeek and then invoking the sort processor with a limit:

sort -r -limit 10000000 ts

We should use a different technique that's efficient and doesn't impose a hardcoded cap on the number of records.

philrz commented 4 years ago

Verified in Brim commit e9b0840 talking to zqd commit 48ed30a.

It took a bit of waiting, but I was able to import the wrccdc "year1" data set: 12 GB of uncompressed Zeek logs, which became 8.8 GB of uncompressed all.zng. I was no longer blocked by the 10-million-record sort limit that used to be in place. Once I was presented with the splash of newest events, I ran a count() and confirmed it matched the count from running zq over the unsorted data set.


~/Downloads/sampledata/wrccdc-year1/zeek-logs$ zq -t "count()" *
#0:record[count:uint64]
0:[96217189;]

Thanks @nwt!