Closed philrz closed 4 years ago
I think the CSV writer should handle heterogenous records, though I assume that'd require writing to a temporary file to build the union of all the record types. That may be more than what the community user asked for, if their target was a sql db import.
@alfred-landrum, agreed. I will write up a new issue for this. There are a couple ways to do it.
We also need to change the search endpoint with csv output (as part of the larger refactoring there) so the front-end doesn't have to have its own csv writer and record unflattener in javascript.
Verified in zq
commit 4bce00d
.
Output as CSV currently works for any data that can be represented by a single "descriptor" (i.e. they all have the same record type). For instance, the contents of a single Zeek log:
$ zq -f csv stats.log.gz
_path,ts,peer,mem,pkts_proc,bytes_recv,pkts_dropped,pkts_link,pkt_lag,events_proc,events_queued,active_tcp_conns,active_udp_conns,active_icmp_conns,tcp_conns,udp_conns,icmp_conns,timers,active_timers,files,active_files,dns_requests,active_dns_requests,reassem_tcp_size,reassem_file_size,reassem_frag_size,reassem_unknown_size
stats,2018-03-24T17:15:20.600725Z,zeek,74,26,29375,-,-,-,404,11,1,0,0,1,0,0,36,32,0,0,0,0,1528,0,0,0
stats,2018-03-24T17:20:20.60102Z,zeek,281,6435390,3052165142,-,-,-,2575313,2575317,5818,187,1070,338620,5126,9490,1301330,33086,48528,27,61,0,77448,53168,0,0
stats,2018-03-24T17:25:20.601054Z,zeek,282,4556854,1920350471,-,-,-,1958516,1958507,4401,110,1010,211123,4600,2354,1006374,27351,41930,10,5,0,90752,0,1672,0
stats,2018-03-24T17:30:20.601101Z,zeek,281,8112284,3883978461,-,-,-,1586683,1586685,2795,137,648,207754,4666,31554,928834,19429,29547,26,3,0,126712,395584,1504,0
stats,2018-03-24T17:35:20.601137Z,zeek,282,5467567,3398705931,-,-,-,1535999,1535998,4239,146,305,193639,4731,2510,879701,25895,35230,88,6,0,455128,0,0,0
It's then trivial to put this output into my paste buffer and, say, enter it into a Google Sheet and select Data > Split Text To Columns.
If the data trying to be output requires multiple descriptors, the user will receive an error message and the output will stop when the first record of a different type is encountered. For example, if you try to output two Zeek logs at once via CSV:
$ zq -f csv stats.log.gz weird.log.gz
_path,ts,peer,mem,pkts_proc,bytes_recv,pkts_dropped,pkts_link,pkt_lag,events_proc,events_queued,active_tcp_conns,active_udp_conns,active_icmp_conns,tcp_conns,udp_conns,icmp_conns,timers,active_timers,files,active_files,dns_requests,active_dns_requests,reassem_tcp_size,reassem_file_size,reassem_frag_size,reassem_unknown_size
stats,2018-03-24T17:15:20.600725Z,zeek,74,26,29375,-,-,-,404,11,1,0,0,1,0,0,36,32,0,0,0,0,1528,0,0,0
csv output requires uniform records but different types encountered
However, we do have a plan to address this case as well, and that's tracked in #1271.
Finally, I noticed that csv
is not yet listed in the -f
output formats in the zq help
text, so #1300 tracks getting that added.
Thanks @mccanne!
A community user asked: