Suppose I've got a directory on an S3 bucket with valid files to be read. If I make a typo when accessing them, I get a wordy, confusing error dump that doesn't read as the S3 equivalent of file-not-found:
```
$ zq -t 'count() by _path' 's3://zq-771/foo/does-not-exist'
s3://zq-771/foo/does-not-exist: format detection error
tzng: line 1: NoSuchKey: The specified key does not exist.
status code: 404, request id: F1C78441B5D20A27, host id: hLuZ/B+aHnaApXq4wLjpsVUQ/t49brWWjM4VhA8HNg7f4Cvpq99D9qwvDj660EMrE8xPQOW6BiA=
zeek: line 1: NoSuchKey: The specified key does not exist.
status code: 404, request id: F1C78441B5D20A27, host id: hLuZ/B+aHnaApXq4wLjpsVUQ/t49brWWjM4VhA8HNg7f4Cvpq99D9qwvDj660EMrE8xPQOW6BiA=
ndjson: NoSuchKey: The specified key does not exist.
status code: 404, request id: F1C78441B5D20A27, host id: hLuZ/B+aHnaApXq4wLjpsVUQ/t49brWWjM4VhA8HNg7f4Cvpq99D9qwvDj660EMrE8xPQOW6BiA=
zjson: line 1: NoSuchKey: The specified key does not exist.
status code: 404, request id: F1C78441B5D20A27, host id: hLuZ/B+aHnaApXq4wLjpsVUQ/t49brWWjM4VhA8HNg7f4Cvpq99D9qwvDj660EMrE8xPQOW6BiA=
zng: NoSuchKey: The specified key does not exist.
status code: 404, request id: F1C78441B5D20A27, host id: hLuZ/B+aHnaApXq4wLjpsVUQ/t49brWWjM4VhA8HNg7f4Cvpq99D9qwvDj660EMrE8xPQOW6BiA=
parquet: auto-detection not supported
```
Granted, a user intimately familiar with S3 might recognize "NoSuchKey: The specified key does not exist." as S3's way of saying this. But leading with "format detection error" and then repeating the same failure once for every format buries that message.
By comparison, the error for the equivalent case on the local filesystem is quite friendly:
```
$ zq -t 'count() by _path' does-not-exist.tzng
does-not-exist.tzng: stat does-not-exist.tzng: no such file or directory
```
Related: I actually stumbled onto this while checking whether wildcards would work. They don't, and they produce the same error:
```
$ zq -t 'count() by _path' 's3://zq-771/foo/*'
s3://zq-771/foo/*: format detection error
tzng: line 1: NoSuchKey: The specified key does not exist.
status code: 404, request id: F2D2D4CAE9A6F8DB, host id: luMQzixDYKKiFN7yA7RQ0q4Q9ur7GgsmQYj/LOzUY98ldD6YnrFk+crB+oBeMYyjFfxLpvfLB5U=
zeek: line 1: NoSuchKey: The specified key does not exist.
status code: 404, request id: F2D2D4CAE9A6F8DB, host id: luMQzixDYKKiFN7yA7RQ0q4Q9ur7GgsmQYj/LOzUY98ldD6YnrFk+crB+oBeMYyjFfxLpvfLB5U=
ndjson: NoSuchKey: The specified key does not exist.
status code: 404, request id: F2D2D4CAE9A6F8DB, host id: luMQzixDYKKiFN7yA7RQ0q4Q9ur7GgsmQYj/LOzUY98ldD6YnrFk+crB+oBeMYyjFfxLpvfLB5U=
zjson: line 1: NoSuchKey: The specified key does not exist.
status code: 404, request id: F2D2D4CAE9A6F8DB, host id: luMQzixDYKKiFN7yA7RQ0q4Q9ur7GgsmQYj/LOzUY98ldD6YnrFk+crB+oBeMYyjFfxLpvfLB5U=
zng: NoSuchKey: The specified key does not exist.
status code: 404, request id: F2D2D4CAE9A6F8DB, host id: luMQzixDYKKiFN7yA7RQ0q4Q9ur7GgsmQYj/LOzUY98ldD6YnrFk+crB+oBeMYyjFfxLpvfLB5U=
parquet: auto-detection not supported
```
Even once we address the main issue, I expect we'll still have a problem here. On closer inspection, this isn't "our bug" per se. S3's documentation points out that the asterisk is a valid character in object keys. That is apparently the justification for not supporting wildcards in the AWS CLI tools, as discussed in https://github.com/aws/aws-cli/issues/3784. I suspect users are more likely to try asterisks as wildcards than to create S3 objects with asterisks in their names.

So at minimum, we should probably catch uses of asterisks. If no exact match is found in the bucket, we should produce an error message that explains specifically that wildcards aren't supported. Alternatively, we could emulate wildcards: first list the objects in the bucket, then match the pattern client-side, then download the matching objects individually.

I'll gladly open a separate issue on the whole wildcard topic if folks would rather handle it separately. I just wanted to capture the info here, because wildcards are what I originally intended this issue to be about before I realized the error message was confusing in the general case.
Repro is with zq commit 821ae98.