brimdata / super

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License

s3 support for zar find/index/ls/rm/rmdirs/stat #972

Closed by mattnibs 4 years ago

philrz commented 4 years ago

Verified in zq commit 59b4bcc. A few bugs turned up along the way and were opened as separate issues.

Here's the "happy path" through the zar README steps, run with storage in an S3 bucket.

$ export ZAR_ROOT="$(pwd)"

$ zq ~/work/zq-sample-data/zng/*.gz | zar import -data s3://zq-972 -s 25MB -

$ zar index :ip
s3://zq-972/20180324/1521912990.158766.zng: creating index s3://zq-972/20180324/1521912990.158766.zng.zar/zdx-type-ip
s3://zq-972/20180324/1521912792.328806.zng: creating index s3://zq-972/20180324/1521912792.328806.zng.zar/zdx-type-ip
s3://zq-972/20180324/1521912549.366398.zng: creating index s3://zq-972/20180324/1521912549.366398.zng.zar/zdx-type-ip
s3://zq-972/20180324/1521912335.72784.zng: creating index s3://zq-972/20180324/1521912335.72784.zng.zar/zdx-type-ip
s3://zq-972/20180324/1521912152.518493.zng: creating index s3://zq-972/20180324/1521912152.518493.zng.zar/zdx-type-ip
s3://zq-972/20180324/1521911975.777469.zng: creating index s3://zq-972/20180324/1521911975.777469.zng.zar/zdx-type-ip
s3://zq-972/20180324/1521911841.543641.zng: creating index s3://zq-972/20180324/1521911841.543641.zng.zar/zdx-type-ip

$ zar index uri
s3://zq-972/20180324/1521912990.158766.zng: creating index s3://zq-972/20180324/1521912990.158766.zng.zar/zdx-field-uri
s3://zq-972/20180324/1521912792.328806.zng: creating index s3://zq-972/20180324/1521912792.328806.zng.zar/zdx-field-uri
s3://zq-972/20180324/1521912549.366398.zng: creating index s3://zq-972/20180324/1521912549.366398.zng.zar/zdx-field-uri
s3://zq-972/20180324/1521912335.72784.zng: creating index s3://zq-972/20180324/1521912335.72784.zng.zar/zdx-field-uri
s3://zq-972/20180324/1521912152.518493.zng: creating index s3://zq-972/20180324/1521912152.518493.zng.zar/zdx-field-uri
s3://zq-972/20180324/1521911975.777469.zng: creating index s3://zq-972/20180324/1521911975.777469.zng.zar/zdx-field-uri
s3://zq-972/20180324/1521911841.543641.zng: creating index s3://zq-972/20180324/1521911841.543641.zng.zar/zdx-field-uri

$ zar index -q -o custom -k id.orig_h -z "count() by _path, id.orig_h | sort id.orig_h"
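The custom index rule above computes `count() by _path, id.orig_h` for each chunk and sorts the result on the key named with `-k` (`id.orig_h`). A minimal Python sketch of why the sort matters, assuming a keyed, sorted table is what lets a later `zar find -x custom` lookup locate all rows for one key by binary search. This illustrates a sorted-key lookup only, not zar's on-disk index format; `build_custom_index` and `lookup` are hypothetical names.

```python
from bisect import bisect_left, bisect_right
from collections import Counter

def build_custom_index(records):
    """count() by _path, id.orig_h, then sort on the key id.orig_h (hypothetical helper)."""
    counts = Counter((r["id.orig_h"], r["_path"]) for r in records)
    # Rows are (key, _path, count); tuple sorting orders by key first.
    # Keys compare as strings here; that is fine for equality lookups
    # as long as building and searching use the same ordering.
    rows = sorted((key, path, n) for (key, path), n in counts.items())
    keys = [row[0] for row in rows]  # parallel key column for binary search
    return keys, rows

def lookup(index, key):
    """Return every row whose key equals `key` via binary search (hypothetical helper)."""
    keys, rows = index
    lo, hi = bisect_left(keys, key), bisect_right(keys, key)
    return rows[lo:hi]

# Hypothetical input records, for illustration only.
records = [
    {"_path": "conn", "id.orig_h": "10.164.94.120"},
    {"_path": "conn", "id.orig_h": "10.164.94.120"},
    {"_path": "dns", "id.orig_h": "10.10.23.2"},
    {"_path": "http", "id.orig_h": "10.164.94.120"},
]
index = build_custom_index(records)
print(lookup(index, "10.164.94.120"))
# [('10.164.94.120', 'conn', 2), ('10.164.94.120', 'http', 1)]
```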

$ zar ls
s3://zq-972/20180324/1521912990.158766.zng.zar
s3://zq-972/20180324/1521912792.328806.zng.zar
s3://zq-972/20180324/1521912549.366398.zng.zar
s3://zq-972/20180324/1521912335.72784.zng.zar
s3://zq-972/20180324/1521912152.518493.zng.zar
s3://zq-972/20180324/1521911975.777469.zng.zar
s3://zq-972/20180324/1521911841.543641.zng.zar

$ aws s3 ls --recursive s3://zq-972
2020-07-21 08:19:59   23094074 20180324/1521911841.543641.zng
2020-07-21 08:22:05       7598 20180324/1521911841.543641.zng.zar/custom.zng
2020-07-21 08:21:30        160 20180324/1521911841.543641.zng.zar/zdx-field-uri.1.zng
2020-07-21 08:21:30     135070 20180324/1521911841.543641.zng.zar/zdx-field-uri.zng
2020-07-21 08:20:49      32417 20180324/1521911841.543641.zng.zar/zdx-type-ip.zng
2020-07-21 08:19:55   25476221 20180324/1521911975.777469.zng
2020-07-21 08:22:02       6211 20180324/1521911975.777469.zng.zar/custom.zng
2020-07-21 08:21:25        122 20180324/1521911975.777469.zng.zar/zdx-field-uri.1.zng
2020-07-21 08:21:25      70491 20180324/1521911975.777469.zng.zar/zdx-field-uri.zng
2020-07-21 08:20:46      10926 20180324/1521911975.777469.zng.zar/zdx-type-ip.zng
2020-07-21 08:19:47   25483926 20180324/1521912152.518493.zng
2020-07-21 08:21:58       6178 20180324/1521912152.518493.zng.zar/custom.zng
2020-07-21 08:21:18      56967 20180324/1521912152.518493.zng.zar/zdx-field-uri.zng
2020-07-21 08:20:42       6458 20180324/1521912152.518493.zng.zar/zdx-type-ip.zng
2020-07-21 08:19:40   25453122 20180324/1521912335.72784.zng
2020-07-21 08:21:54       6499 20180324/1521912335.72784.zng.zar/custom.zng
2020-07-21 08:21:13         80 20180324/1521912335.72784.zng.zar/zdx-field-uri.1.zng
2020-07-21 08:21:13     118533 20180324/1521912335.72784.zng.zar/zdx-field-uri.zng
2020-07-21 08:20:38      16843 20180324/1521912335.72784.zng.zar/zdx-type-ip.zng
2020-07-21 08:19:35   25499352 20180324/1521912549.366398.zng
2020-07-21 08:21:50      10319 20180324/1521912549.366398.zng.zar/custom.zng
2020-07-21 08:21:09      41855 20180324/1521912549.366398.zng.zar/zdx-field-uri.zng
2020-07-21 08:20:34         40 20180324/1521912549.366398.zng.zar/zdx-type-ip.1.zng
2020-07-21 08:20:34      79014 20180324/1521912549.366398.zng.zar/zdx-type-ip.zng
2020-07-21 08:19:26   25478311 20180324/1521912792.328806.zng
2020-07-21 08:21:45       5947 20180324/1521912792.328806.zng.zar/custom.zng
2020-07-21 08:21:06      52462 20180324/1521912792.328806.zng.zar/zdx-field-uri.zng
2020-07-21 08:20:30      10827 20180324/1521912792.328806.zng.zar/zdx-type-ip.zng
2020-07-21 08:19:19   25483642 20180324/1521912990.158766.zng
2020-07-21 08:21:42       4731 20180324/1521912990.158766.zng.zar/custom.zng
2020-07-21 08:21:02      29842 20180324/1521912990.158766.zng.zar/zdx-field-uri.zng
2020-07-21 08:20:26      22936 20180324/1521912990.158766.zng.zar/zdx-type-ip.zng
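The listing shows the layout zar produced on S3: each data chunk `<timestamp>.zng` has a sibling `<timestamp>.zng.zar/` prefix holding that chunk's index objects (`custom.zng`, `zdx-field-uri*.zng`, `zdx-type-ip*.zng`). A minimal Python sketch that groups `aws s3 ls --recursive` output by chunk, assuming the four-column DATE TIME SIZE KEY format shown above and keys without spaces; `group_by_chunk` is a hypothetical helper, not part of zar or the AWS CLI.

```python
from collections import defaultdict

def group_by_chunk(listing: str) -> dict:
    """Map each chunk key to its size and the sizes of its index objects."""
    chunks = defaultdict(lambda: {"size": None, "indexes": {}})
    for line in listing.strip().splitlines():
        # Each line is: DATE TIME SIZE KEY (assumes the key has no spaces).
        _date, _time, size, key = line.split(maxsplit=3)
        if ".zng.zar/" in key:
            # "<chunk>.zng.zar/<index file>" belongs to chunk "<chunk>.zng".
            chunk, index_file = key.split(".zar/", 1)
            chunks[chunk]["indexes"][index_file] = int(size)
        else:
            chunks[key]["size"] = int(size)
    return dict(chunks)

# Three lines copied from the listing above.
listing = """\
2020-07-21 08:19:59   23094074 20180324/1521911841.543641.zng
2020-07-21 08:22:05       7598 20180324/1521911841.543641.zng.zar/custom.zng
2020-07-21 08:20:49      32417 20180324/1521911841.543641.zng.zar/zdx-type-ip.zng
"""
for chunk, info in group_by_chunk(listing).items():
    print(chunk, info["size"], sorted(info["indexes"]))
# 20180324/1521911841.543641.zng 23094074 ['custom.zng', 'zdx-type-ip.zng']
```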

$ zar find :ip=10.10.23.2
s3://zq-972/20180324/1521911841.543641.zng

$ zar find uri=/file
s3://zq-972/20180324/1521912335.72784.zng
s3://zq-972/20180324/1521911841.543641.zng
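The two `zar find` calls use the two kinds of index built earlier: `:ip=10.10.23.2` consults the type index (`zdx-type-ip`), and `uri=/file` consults the field index (`zdx-field-uri`). Each reports the chunks whose index contains the value. A minimal Python sketch of those semantics, assuming a type index covers every field holding a value of that type while a field index covers only the named field. It rebuilds value sets in memory on each query, and every name and record in it is hypothetical.

```python
import ipaddress

def is_ip(value) -> bool:
    """True if `value` is a string that parses as an IPv4 or IPv6 address."""
    if not isinstance(value, str):
        return False
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True

def type_index(records, type_check):
    """Type index: values from every field whose value passes `type_check`."""
    return {v for r in records for v in r.values() if type_check(v)}

def field_index(records, field):
    """Field index: values of one named field only."""
    return {r[field] for r in records if field in r}

def find(chunks, build_index, value):
    """Names of chunks whose index contains `value`."""
    return [name for name, recs in chunks.items() if value in build_index(recs)]

# Hypothetical chunks and records, for illustration only.
chunks = {
    "a.zng": [{"id.orig_h": "10.10.23.2", "id.resp_h": "10.0.0.1", "uri": "/index.html"}],
    "b.zng": [{"id.orig_h": "10.0.0.9", "id.resp_h": "10.10.23.2", "uri": "/file"}],
}
print(find(chunks, lambda recs: type_index(recs, is_ip), "10.10.23.2"))  # ['a.zng', 'b.zng']
print(find(chunks, lambda recs: field_index(recs, "uri"), "/file"))      # ['b.zng']
```

The type index matches `10.10.23.2` in both chunks even though it sits in a different field in each, which is the point of indexing by type rather than by field name.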

$ zar find -z -x custom 10.164.94.120 | zq -f table "count=sum(count) by _path" -
_PATH       COUNT
ntlm        80
dpd         24
dns         8
conn        26726
rdp         4116
rfb         3
notice      35
ftp         93
ssl         9538
http        13485
smtp        1178
smb_mapping 65
dce_rpc     2
smb_files   1
weird       316
ssh         1
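The `custom` index holds one `count` per (`_path`, `id.orig_h`) pair per chunk, so the rows `zar find -z -x custom 10.164.94.120` returns can include the same `_path` once for each chunk where it occurs. That is why the pipeline re-aggregates with `sum(count) by _path` rather than `count()`, which would count index rows instead of events. A minimal Python sketch of that merge step; the input rows below are invented for illustration and are not taken from the output above, and `merge_partial_counts` is a hypothetical name.

```python
from collections import Counter

def merge_partial_counts(rows):
    """Sum per-chunk `count` values by `_path`, like `sum(count) by _path`."""
    totals = Counter()
    for row in rows:
        totals[row["_path"]] += row["count"]
    return totals

# Hypothetical index rows from two chunks for one id.orig_h;
# the numbers are made up for illustration.
rows = [
    {"_path": "conn", "id.orig_h": "10.164.94.120", "count": 10},  # chunk 1
    {"_path": "dns", "id.orig_h": "10.164.94.120", "count": 3},    # chunk 1
    {"_path": "conn", "id.orig_h": "10.164.94.120", "count": 5},   # chunk 2
]
print(dict(merge_partial_counts(rows)))  # {'conn': 15, 'dns': 3}
```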

$ zar rm custom.zng
s3://zq-972/20180324/1521912990.158766.zng.zar/custom.zng: removed
s3://zq-972/20180324/1521912792.328806.zng.zar/custom.zng: removed
s3://zq-972/20180324/1521912549.366398.zng.zar/custom.zng: removed
s3://zq-972/20180324/1521912335.72784.zng.zar/custom.zng: removed
s3://zq-972/20180324/1521912152.518493.zng.zar/custom.zng: removed
s3://zq-972/20180324/1521911975.777469.zng.zar/custom.zng: removed
s3://zq-972/20180324/1521911841.543641.zng.zar/custom.zng: removed

$ aws s3 ls --recursive s3://zq-972
2020-07-21 08:19:59   23094074 20180324/1521911841.543641.zng
2020-07-21 08:21:30        160 20180324/1521911841.543641.zng.zar/zdx-field-uri.1.zng
2020-07-21 08:21:30     135070 20180324/1521911841.543641.zng.zar/zdx-field-uri.zng
2020-07-21 08:20:49      32417 20180324/1521911841.543641.zng.zar/zdx-type-ip.zng
2020-07-21 08:19:55   25476221 20180324/1521911975.777469.zng
2020-07-21 08:21:25        122 20180324/1521911975.777469.zng.zar/zdx-field-uri.1.zng
2020-07-21 08:21:25      70491 20180324/1521911975.777469.zng.zar/zdx-field-uri.zng
2020-07-21 08:20:46      10926 20180324/1521911975.777469.zng.zar/zdx-type-ip.zng
2020-07-21 08:19:47   25483926 20180324/1521912152.518493.zng
2020-07-21 08:21:18      56967 20180324/1521912152.518493.zng.zar/zdx-field-uri.zng
2020-07-21 08:20:42       6458 20180324/1521912152.518493.zng.zar/zdx-type-ip.zng
2020-07-21 08:19:40   25453122 20180324/1521912335.72784.zng
2020-07-21 08:21:13         80 20180324/1521912335.72784.zng.zar/zdx-field-uri.1.zng
2020-07-21 08:21:13     118533 20180324/1521912335.72784.zng.zar/zdx-field-uri.zng
2020-07-21 08:20:38      16843 20180324/1521912335.72784.zng.zar/zdx-type-ip.zng
2020-07-21 08:19:35   25499352 20180324/1521912549.366398.zng
2020-07-21 08:21:09      41855 20180324/1521912549.366398.zng.zar/zdx-field-uri.zng
2020-07-21 08:20:34         40 20180324/1521912549.366398.zng.zar/zdx-type-ip.1.zng
2020-07-21 08:20:34      79014 20180324/1521912549.366398.zng.zar/zdx-type-ip.zng
2020-07-21 08:19:26   25478311 20180324/1521912792.328806.zng
2020-07-21 08:21:06      52462 20180324/1521912792.328806.zng.zar/zdx-field-uri.zng
2020-07-21 08:20:30      10827 20180324/1521912792.328806.zng.zar/zdx-type-ip.zng
2020-07-21 08:19:19   25483642 20180324/1521912990.158766.zng
2020-07-21 08:21:02      29842 20180324/1521912990.158766.zng.zar/zdx-field-uri.zng
2020-07-21 08:20:26      22936 20180324/1521912990.158766.zng.zar/zdx-type-ip.zng
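Comparing this listing with the one taken before `zar rm custom.zng` confirms that only the `custom.zng` object under each chunk's `.zar/` prefix was deleted; the data chunks and the `zdx-*` indexes are unchanged. A minimal Python sketch of that check as a set difference over object keys, assuming keys contain no spaces; `object_keys` is a hypothetical helper, and the excerpts cover one chunk from each listing.

```python
def object_keys(listing: str) -> set[str]:
    """Object keys from `aws s3 ls --recursive` output (last field of each line)."""
    return {line.split()[-1] for line in listing.strip().splitlines()}

# One chunk's lines, copied from the listings before and after `zar rm custom.zng`.
before = """\
2020-07-21 08:19:59   23094074 20180324/1521911841.543641.zng
2020-07-21 08:22:05       7598 20180324/1521911841.543641.zng.zar/custom.zng
2020-07-21 08:21:30        160 20180324/1521911841.543641.zng.zar/zdx-field-uri.1.zng
2020-07-21 08:21:30     135070 20180324/1521911841.543641.zng.zar/zdx-field-uri.zng
2020-07-21 08:20:49      32417 20180324/1521911841.543641.zng.zar/zdx-type-ip.zng
"""
after = """\
2020-07-21 08:19:59   23094074 20180324/1521911841.543641.zng
2020-07-21 08:21:30        160 20180324/1521911841.543641.zng.zar/zdx-field-uri.1.zng
2020-07-21 08:21:30     135070 20180324/1521911841.543641.zng.zar/zdx-field-uri.zng
2020-07-21 08:20:49      32417 20180324/1521911841.543641.zng.zar/zdx-type-ip.zng
"""
removed = object_keys(before) - object_keys(after)
added = object_keys(after) - object_keys(before)
print(sorted(removed))  # ['20180324/1521911841.543641.zng.zar/custom.zng']
print(sorted(added))    # []
```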

$ zar stat
TYPE  LOG_ID                         START             DURATION      SIZE
chunk 20180324/1521912990.158766.zng 1521912792.331503 197.827263001 25483642
chunk 20180324/1521912792.328806.zng 1521912549.366782 242.962024001 25478311
chunk 20180324/1521912549.366398.zng 1521912335.728195 213.638203001 25499352
chunk 20180324/1521912335.72784.zng  1521912152.519494 183.208346001 25453122
chunk 20180324/1521912152.518493.zng 1521911975.778000 176.740493001 25483926
chunk 20180324/1521911975.777469.zng 1521911841.543641 134.233828001 25476221
chunk 20180324/1521911841.543641.zng 1521911720.600725 120.942916001 23094074
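In every `zar stat` row, START plus DURATION lands exactly 1 ns past the timestamp in that chunk's LOG_ID (for example, 1521912792.331503 + 197.827263001 = 1521912990.158766001). That pattern suggests each chunk is named for its latest timestamp, with DURATION counted as an inclusive span. This is an inference from these seven rows, not documented zar behavior. A minimal Python sketch that checks it using exact decimal arithmetic; `name_timestamp` and `end_matches_name` are hypothetical helpers.

```python
from decimal import Decimal

# (LOG_ID, START, DURATION) copied from the `zar stat` output above.
STAT_ROWS = [
    ("20180324/1521912990.158766.zng", "1521912792.331503", "197.827263001"),
    ("20180324/1521912792.328806.zng", "1521912549.366782", "242.962024001"),
    ("20180324/1521912549.366398.zng", "1521912335.728195", "213.638203001"),
    ("20180324/1521912335.72784.zng", "1521912152.519494", "183.208346001"),
    ("20180324/1521912152.518493.zng", "1521911975.778000", "176.740493001"),
    ("20180324/1521911975.777469.zng", "1521911841.543641", "134.233828001"),
    ("20180324/1521911841.543641.zng", "1521911720.600725", "120.942916001"),
]

ONE_NS = Decimal("0.000000001")

def name_timestamp(log_id: str) -> Decimal:
    """'20180324/1521912990.158766.zng' -> Decimal('1521912990.158766')."""
    return Decimal(log_id.rsplit("/", 1)[1].removesuffix(".zng"))

def end_matches_name(log_id: str, start: str, duration: str) -> bool:
    # Decimal avoids float rounding at nanosecond resolution.
    # Assumption: DURATION is inclusive, so subtract 1 ns to get the last timestamp.
    end = Decimal(start) + Decimal(duration) - ONE_NS
    return end == name_timestamp(log_id)

for row in STAT_ROWS:
    print(row[0], end_matches_name(*row))  # True for all seven rows
```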

Issues opened along the way: