brimdata / super

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License

support s3 output locations in zq #771

Closed: alfred-landrum closed this issue 4 years ago

alfred-landrum commented 4 years ago

A user should be able to specify an S3 location as the output location for zq:

zq -o s3://bucket/file.zng "*" 
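
For illustration only: an s3:// location like the one above has to be split into a bucket and an object key before any upload can happen. The sketch below does that with Go's standard net/url package. The helper name parseS3URL is hypothetical and is not taken from zq's code.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseS3URL splits an "s3://bucket/key" location into its bucket and
// object key. Hypothetical helper for illustration; not zq's own code.
func parseS3URL(loc string) (bucket, key string, err error) {
	u, err := url.Parse(loc)
	if err != nil {
		return "", "", err
	}
	if u.Scheme != "s3" {
		return "", "", fmt.Errorf("not an s3 URL: %q", loc)
	}
	if u.Host == "" {
		return "", "", fmt.Errorf("missing bucket in %q", loc)
	}
	// url.Parse keeps the leading slash on the path; S3 keys omit it.
	return u.Host, strings.TrimPrefix(u.Path, "/"), nil
}

func main() {
	bucket, key, err := parseS3URL("s3://bucket/file.zng")
	if err != nil {
		panic(err)
	}
	fmt.Println(bucket, key) // bucket file.zng
}
```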

We should try to make this work for all of our file formats, and discuss whether any special handling is needed for formats that require seeking or other features that don't map easily onto the S3 API.

philrz commented 4 years ago

Verified in zq commit 952e0fd.

$ aws s3 ls s3://zq-771
[no output]

$ echo '{"foo": "bar"}' | zq -o s3://zq-771/foo.zng -

$ zq -t s3://zq-771/foo.zng
#0:record[foo:string]
0:[bar;]

While testing this and other S3 features, I've noted some hazards related to the AWS SDK for Go that we might want to help users avoid. I've opened https://github.com/brimsec/zq/issues/904 to track that as a separate topic.

Thanks @mattnibs!

philrz commented 4 years ago

With the merge of #898, I've also verified in zq commit 821ae98 that output to an S3 directory is working.

$ aws s3 ls s3://zq-771/foo/
[no output... directory doesn't exist yet]

$ zq -t -d s3://zq-771/foo 'uid=Ckwqsn2ZSiVGtyiFO5' *
$ aws s3 ls s3://zq-771/foo/
2020-06-16 15:08:25        541 conn.tzng
2020-06-16 15:08:25        747 notice.tzng
2020-06-16 15:08:25        727 ssl.tzng
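
The listing shows that -d writes one object per _path value (conn.tzng, notice.tzng, ssl.tzng) under the given prefix. The sketch below reproduces that naming by grouping records on _path. It is a hypothetical illustration of the observed behavior, not zq's implementation. The simplified record type and the splitByPath helper are both assumptions, and the sketch assumes every record carries a _path field. It joins names onto the key prefix ("foo") rather than the full s3:// URL, because path.Join would collapse the "//" after "s3:".

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// record is a simplified, hypothetical stand-in for a zng record:
// field name -> value.
type record map[string]string

// splitByPath groups records by their _path field and names each group
// "<prefix>/<_path>.tzng", matching the object names seen in the listing.
// Assumes every record has a _path field.
func splitByPath(prefix string, recs []record) map[string][]record {
	out := map[string][]record{}
	for _, r := range recs {
		name := path.Join(prefix, r["_path"]+".tzng")
		out[name] = append(out[name], r)
	}
	return out
}

func main() {
	recs := []record{
		{"_path": "conn", "uid": "Ckwqsn2ZSiVGtyiFO5"},
		{"_path": "ssl", "uid": "Ckwqsn2ZSiVGtyiFO5"},
		{"_path": "notice", "uid": "Ckwqsn2ZSiVGtyiFO5"},
	}
	groups := splitByPath("foo", recs)
	names := make([]string, 0, len(groups))
	for name := range groups {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, n := range names {
		fmt.Println(n, len(groups[n])) // e.g. "foo/conn.tzng 1"
	}
}
```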

$ zq -t 'count() by _path' s3://zq-771/foo/conn.tzng s3://zq-771/foo/notice.tzng s3://zq-771/foo/ssl.tzng
#0:record[_path:string,count:uint64]
0:[conn;1;]
0:[ssl;1;]
0:[notice;1;]