brimdata / super

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License

zar type and field indexes should use count reducer #916

Status: Closed (alfred-landrum closed 4 years ago)

alfred-landrum commented 4 years ago

The type and field indexes should be created with the equivalent of count() by key, rather than just by key as they are currently, so that each index entry records how many times its key occurs.
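To illustrate the difference between the two groupings, here is a minimal Python sketch. It is not zar or zq code: the function names `keys_only` and `count_by_key` and the sample URI values are hypothetical, and keys are sorted as plain strings, which may not match how the real index orders typed values such as IP addresses.

```python
from collections import Counter


def keys_only(values):
    """Roughly what 'by key' yields: each distinct key once, sorted."""
    return sorted(set(values))


def count_by_key(values):
    """Roughly what 'count() by key' yields: (key, count) pairs, sorted by key."""
    counts = Counter(values)
    return [(key, counts[key]) for key in sorted(counts)]


# Hypothetical field values, for illustration only.
uris = ["/", "/index.html", "/", "*", "/"]

print(keys_only(uris))     # distinct keys, no hit counts
print(count_by_key(uris))  # the same keys, each with its number of occurrences
```

With the second form, a consumer of the index can read the number of hits for a key directly from the index entry instead of rescanning the underlying data.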

henridf commented 4 years ago

The motivation for this change is to support a front-end feature that displays the number of hits per key alongside index search results.

philrz commented 4 years ago

Verified in zq commit 26d4b66.

Following the zar README, at the steps where I view the contents of the created "type" and "field" indexes, I can now see a count value for each key.

$ zq -t $ZAR_ROOT/20180324/1521912152.518493.zng.zar/zdx-type-ip.zng
#0:record[magic:string,version:string,child_field:string,keys:record[key:ip]]
0:[zdx;0.2;_btree_child;-;]
#1:record[key:ip,count:uint64]
1:[2.22.230.64;3;]
1:[5.9.78.71;3;]
1:[5.9.250.164;20;]
1:[5.199.135.170;3;]
1:[8.8.8.8;1;]
1:[8.43.85.67;4;]
1:[10.0.0.1;359;]
...

$ zq -t /Users/phil/logs/20180324/1521912152.518493.zng.zar/zdx-field-uri.zng
#0:record[magic:string,version:string,child_field:string,keys:record[key:bstring]]
0:[zdx;0.2;_btree_child;-;]
#1:record[key:bstring,count:uint64]
1:[%.;3;]
1:[*;2;]
1:[././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././../../../../../../../../;1;]
1:[/;800;]
1:[/+CSCOE+/logon.html;1;]
1:[/.%2e/.%2e/.%2e/.%2e/.%2e/.%2e/.%2e/etc/passwd;1;]
1:[/.%2e/.%2e/.%2e/.%2e/windows/win.ini;1;]
...

Thanks @henridf!