brimdata / super

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License
1.39k stars 64 forks source link

Reject S3 URLs as ZAR_ROOT #1030

Closed philrz closed 4 years ago

philrz commented 4 years ago

When verifying #881, I mistakenly thought I was supposed to set my $ZAR_ROOT to an actual S3 URL, but this is incorrect. I was supposed to still use a $ZAR_ROOT on the local filesystem and specify the S3 URL via -data.

When I did it the wrong way, it actually accepted this and populated the bucket, but put the metadata in a local path starting with a directory s3:. It basically still works when I run it on my Mac, but this is not how it's intended, and pathnames with colons in them are invalid on Windows, so this feels like a bug. Repro is with zq commit 59b4bcc.

$ export ZAR_ROOT="s3://zq-881"

$ ls -l
[no output... current local directory starts out empty]

$  aws s3 ls s3://zq-881
[no output... S3 bucket starts out empty]

$ zq ~/work/zq-sample-data/zng/*.gz | zar import -s 25MB -

$ tree -s
.
└── [         96]  s3:
    └── [         96]  zq-881
        └── [        995]  zar.json

2 directories, 1 file

$ aws s3 ls --recursive s3://zq-881
2020-07-20 19:42:52   23094074 20180324/1521911841.543641.zng
2020-07-20 19:42:45   25476221 20180324/1521911975.777469.zng
2020-07-20 19:42:40   25483926 20180324/1521912152.518493.zng
2020-07-20 19:42:35   25453122 20180324/1521912335.72784.zng
2020-07-20 19:42:30   25499352 20180324/1521912549.366398.zng
2020-07-20 19:42:26   25478311 20180324/1521912792.328806.zng
2020-07-20 19:42:17   25483642 20180324/1521912990.158766.zng

$ zar zq "count()" _ | zq -f text "sum(count)" -
1462078

Per a team discussion today, it sounds like the fix would be to reject the S3 URL in $ZAR_ROOT with a helpful error message.

philrz commented 4 years ago

In a discussion with @alfred-landrum and @mattnibs it was decided that we're going to add support for S3 URL in $ZAR_ROOT, so closing this one as "invalid".