jwodder opened 6 hours ago

`s3invsync` takes an `aws`-style `s3://{bucket}/{key}/` URL for the inventory base on the command line, and many of the error messages include S3 URLs as well. However, how should special characters like whitespace be handled in such URLs? Do they need to be percent-encoded?

@yarikoptic Please look into how the `aws` command handles S3 URLs with keys with special characters and report back.

For the record, I found the following sources on S3 URLs, but neither addresses this:
Do you mean how the `path` component (corresponding to `{key}/`) in an `s3://` URL should be treated when mapping to the `{key}` location on S3 and subsequently on the filesystem? My guess: since it is a URL, it should be `urllib.parse.unquote`-ed.
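(For illustration, under that interpretation the path component would be percent-decoded before use; a minimal sketch, with a made-up example key:)

```python
from urllib.parse import unquote

url_path = "some%20dir/file%2B1.txt"  # hypothetical path component of an s3:// URL
key = unquote(url_path)               # the "unquote it" interpretation
print(key)                            # -> "some dir/file+1.txt"
```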
BTW, since the tool is to be used in a cron job, would it make sense to allow for a config file, like you made for https://github.com/con/tinuous? Then the input could be just a separate `bucket_name` and an optional `bucket_prefix` (since there are no keys for folders on S3, really), and thus not mess with S3 URLs altogether?
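(A hypothetical sketch of what reading such a config could look like; the file name and field names here are invented for illustration, not an actual `s3invsync` interface:)

```python
import tomllib  # Python 3.11+

# Hypothetical s3invsync.toml contents:
#   bucket_name = "dandiarchive-embargo"
#   bucket_prefix = "inventory/"   # optional
with open("s3invsync.toml", "rb") as f:
    cfg = tomllib.load(f)

bucket = cfg["bucket_name"]
prefix = cfg.get("bucket_prefix", "")  # plain strings; no URL quoting involved
```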
BTW, here is the URL I found earlier today on legit key syntax (I was looking into whether a newline is allowed): https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html
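(That page says a key can be any sequence of UTF-8 characters up to 1024 bytes, so a newline appears to be accepted at the API level, even though AWS lists non-printable characters among those to avoid; a sketch with `boto3` and a made-up bucket name:)

```python
import boto3  # assumes configured AWS credentials

s3 = boto3.client("s3")
# A newline is legal UTF-8, so S3 should accept this key,
# though AWS recommends avoiding non-printable characters:
s3.put_object(Bucket="some-test-bucket", Key="line1\nline2", Body=b"test")
```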
> Do you mean how the `path` component (corresponding to `{key}/`) in an `s3://` URL should be treated when mapping to the `{key}` location on S3 and subsequently on the filesystem? My guess: since it is a URL, it should be `urllib.parse.unquote`-ed.
Yes, that's what I mean, and it's a reasonable guess, but can you confirm that's how it works? Try creating a test file with a space in its name on S3 and see (a) how `aws s3 ls` displays it and (b) how `aws s3` requires you to specify the file's S3 URL as an argument.
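(The same check can also be scripted at the SDK level; a sketch using `boto3` with a made-up bucket name. `boto3` takes and returns keys verbatim, with any wire-level encoding handled internally:)

```python
import boto3  # assumes configured AWS credentials

s3 = boto3.client("s3")
bucket = "some-test-bucket"  # hypothetical

# Upload an object whose key contains a literal space:
s3.put_object(Bucket=bucket, Key="file with space.txt", Body=b"test")

# List it back; keys come back as literal characters, not percent-encoded:
resp = s3.list_objects_v2(Bucket=bucket, Prefix="file with")
print([obj["Key"] for obj in resp.get("Contents", [])])
```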
> BTW, since the tool is to be used in a cron job, would it make sense to allow for a config file, like you made for https://github.com/con/tinuous? Then the input could be just a separate `bucket_name` and an optional `bucket_prefix` (since there are no keys for folders on S3, really), and thus not mess with S3 URLs altogether?
That seems unnecessary.
Why do we care how the `aws` CLI works -- or are we to interface with it?
(note: there is no "s3" protocol, and AFAIK all implementations of handling s3:// URLs are pretty much ad-hoc, so I would not be surprised if they vary in their approaches)
I used this to produce an even more obscure name and upload the script itself into S3:
```bash
#!/bin/bash
#
# Script to produce S3 key with obscure name
#
aws s3api put-object --bucket dandiarchive-embargo \
    --key '!#$%&'"'"'()*+,-./:;<=>?@[\\]^_\`{|}~azAZ09␣ unicode测试\r' \
    --body "$0"
```
I used `dandiarchive-embargo` since it is no longer used, AFAIK. Ended up with:
```console
dandi@drogon:~/bin$ aws s3 ls 's3://dandiarchive-embargo/!#$%&'"'"'()*+,-./' | head
2024-11-18 17:08:38        199 :;<=>?@[\\]^_\`{|}~azAZ09␣ unicode测试\r
dandi@drogon:~/bin$ aws s3 ls 's3://dandiarchive-embargo/!#$%&'"'"'()*+,-./:;<=>?@[\\]^_\`{|}~azAZ09␣ unicode测试\r' | head
2024-11-18 17:08:38        199 :;<=>?@[\\]^_\`{|}~azAZ09␣ unicode测试\r
```
So it takes the characters as-is, and does not URL-unquote anything, since percent-encoded paths do not work:
```python
In [1]: import urllib.parse

In [2]: urllib.parse.quote("!#$%&'()*+,-./")
Out[2]: '%21%23%24%25%26%27%28%29%2A%2B%2C-./'
```
```console
dandi@drogon:~/bin$ aws s3 ls "s3://dandiarchive-embargo/%21%23%24%25%26%27%28%29%2A%2B%2C-./"
dandi@drogon:~/bin$ aws s3 ls "s3://dandiarchive-embargo/%21"
```
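(Going by this behavior, a parser for such URLs would take the key characters literally; a minimal sketch under that assumption, not `s3invsync`'s actual code. Note it deliberately avoids `urllib.parse.urlsplit`, which would treat everything after a literal `#` in a key as a URL fragment:)

```python
def parse_s3_url(url: str) -> tuple[str, str]:
    """Split an s3:// URL into (bucket, key), taking the key as-is."""
    if not url.startswith("s3://"):
        raise ValueError(f"not an S3 URL: {url!r}")
    rest = url[len("s3://"):]
    bucket, _, key = rest.partition("/")  # split only at the first slash
    return bucket, key                    # no unquoting: key characters are literal

print(parse_s3_url("s3://mybucket/!#$%&'()*+,-./weird key"))
# -> ('mybucket', "!#$%&'()*+,-./weird key")
```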