Closed tdrwenski closed 1 year ago
I wonder if
cdms3:myBucket
should also be considered a directory?
That's a good question. I had thought of it like this in terms of a mapping to file systems.
cdms3:myBucket#delimiter=/
was essentially saying that the objects in the bucket were laid out such that they could be interpreted as filesystem paths, and the bucket was considered to be the filesystem root "/". Performing a list-objects-v2 request against the bucket with a delimiter of "/" would then be like doing ls /. In the JSON returned from AWS, "Contents": [...] would then, essentially, contain a list of the "files" directly under "/", while "CommonPrefixes": [...] would contain a list of directories directly under "/". So in that sense, everything in the bucket is accessible as if it were filesystem paths without too many surprises. We can then browse the bucket using a combination of delimiter="/" and prefix (selected from the common prefixes at a given "level" of the bucket).
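To make the "Contents" versus "CommonPrefixes" split concrete, here is a minimal sketch that models the grouping locally, without calling AWS. The class name DelimitedListing, its list method, and the sample keys are all made up for illustration; it only mimics how a delimiter/prefix listing rolls keys up, and is not the netCDF-Java or AWS SDK implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper: a local model of a delimiter/prefix listing.
// It never talks to S3; the keys passed in stand in for a bucket's objects.
class DelimitedListing {
  final List<String> contents = new ArrayList<>();        // "files" directly under the prefix
  final TreeSet<String> commonPrefixes = new TreeSet<>(); // "directories" directly under the prefix

  static DelimitedListing list(List<String> keys, String prefix, String delimiter) {
    DelimitedListing result = new DelimitedListing();
    for (String key : keys) {
      if (!key.startsWith(prefix)) {
        continue; // outside the "directory" being listed
      }
      String rest = key.substring(prefix.length());
      // An empty delimiter models a flat listing: every key lands in contents.
      int idx = delimiter.isEmpty() ? -1 : rest.indexOf(delimiter);
      if (idx < 0) {
        result.contents.add(key);
      } else {
        // Roll everything up to and including the first delimiter into one entry.
        result.commonPrefixes.add(prefix + rest.substring(0, idx + delimiter.length()));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> keys = Arrays.asList("README.txt", "a/1.nc", "a/b/2.nc", "c/3.nc");
    DelimitedListing root = list(keys, "", "/");
    System.out.println(root.contents + " " + root.commonPrefixes); // [README.txt] [a/, c/]
    DelimitedListing a = list(keys, "a/", "/");
    System.out.println(a.contents + " " + a.commonPrefixes);       // [a/1.nc] [a/b/]
  }
}
```

Browsing one "level" down is then just a matter of reusing one of the returned common prefixes as the next prefix, as the second call in main does.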
Now let's say the objects in the bucket are not laid out such that they could be interpreted as filesystem paths, but you'd still like to treat the bucket as if it were a filesystem (so you could do things like a "scan" operation of some sort... maybe a datasetScan?). In this case there would be one top-level directory with, potentially, a ton of "files", and no subdirectories. The JSON returned from AWS would have "Contents": [...] and that's it. The root directory would exist and would map to cdms3:myBucket, so maybe that should be included as a directory "existing"?
Looking further down the road a bit, treating the bucket as a directory that exists would enable a user to trigger filesystem-like operations that would bring their system to a screeching halt, since listing objects in a bucket without a delimiter and/or prefix can become very expensive (time-wise). For example, using the aws cli, compare
aws s3api list-objects-v2 --region us-east-1 --bucket noaa-goes18 --delimiter="/"
with
aws s3api list-objects-v2 --region us-east-1 --bucket noaa-goes18
Just something to think about.
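A rough way to see where the cost comes from: a list-objects-v2 response carries at most 1000 entries, and each rolled-up common prefix counts as a single entry, so the number of round trips scales with the number of entries returned. The sketch below uses a synthetic bucket layout (3 top-level "directories" with 5000 objects each); the class name ListingCost and these numbers are invented for illustration and say nothing about the real size of noaa-goes18.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical back-of-the-envelope model of listing cost; it makes no AWS calls.
class ListingCost {
  // A single list-objects-v2 response returns at most 1000 entries.
  static final int MAX_KEYS_PER_PAGE = 1000;

  // Number of paged requests needed to return the given number of entries.
  static int pages(int entries) {
    return Math.max(1, (entries + MAX_KEYS_PER_PAGE - 1) / MAX_KEYS_PER_PAGE);
  }

  public static void main(String[] args) {
    int dirs = 3;       // assumed number of top-level "directories"
    int perDir = 5000;  // assumed number of objects under each one
    Set<String> topLevel = new HashSet<>();
    int total = 0;
    for (int d = 0; d < dirs; d++) {
      for (int i = 0; i < perDir; i++) {
        String key = "product" + d + "/file" + i + ".nc";
        total++;
        topLevel.add(key.substring(0, key.indexOf('/') + 1)); // rolled-up common prefix
      }
    }
    System.out.println("flat listing requests:      " + pages(total));           // 15
    System.out.println("delimited listing requests: " + pages(topLevel.size())); // 1
  }
}
```

In this toy layout the delimited listing needs a single request no matter how many objects sit below each prefix, while the flat listing grows linearly with the total object count.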
That makes sense-- way too many objects would be listed if a bucket was considered a directory without a delimiter. Thanks @lesserwhirls :)
Description of Changes
Part of the fixes needed for S3 datasetScans.
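As a rough picture of the check this PR extends, the request type can be chosen from the shape of the URI, using the object / bucket / "directory" distinction spelled out just below. This is only a sketch: the class S3UriKind and its methods are hypothetical, the parser handles only the simple cdms3:bucket?key#delimiter=... form used in this PR, and treating a bucket URI that carries a delimiter fragment as a "directory" is an assumption drawn from the examples given here, not the actual MFileS3 code.

```java
// Hypothetical sketch: pick the existence check from the URI shape.
// Only the simple "cdms3:bucket?key#delimiter=X" form is handled.
class S3UriKind {
  enum Kind { OBJECT, BUCKET, DIRECTORY }

  static Kind classify(String uri) {
    String rest = uri.substring(uri.indexOf(':') + 1); // drop the "cdms3:" scheme
    String delimiter = null;
    int hash = rest.indexOf('#');
    if (hash >= 0) {
      String fragment = rest.substring(hash + 1);
      if (fragment.startsWith("delimiter=")) {
        delimiter = fragment.substring("delimiter=".length());
        if (delimiter.isEmpty()) {
          delimiter = null; // an empty delimiter is treated as no delimiter
        }
      }
      rest = rest.substring(0, hash);
    }
    int q = rest.indexOf('?');
    String key = q >= 0 ? rest.substring(q + 1) : "";
    if (key.isEmpty()) {
      // Assumption: a bucket with a delimiter fragment counts as a "directory".
      return delimiter != null ? Kind.DIRECTORY : Kind.BUCKET;
    }
    if (delimiter != null && key.endsWith(delimiter)) {
      return Kind.DIRECTORY;
    }
    return Kind.OBJECT;
  }

  // Name of the S3 request that the description pairs with each kind.
  static String requestFor(Kind kind) {
    switch (kind) {
      case BUCKET:
        return "headBucket";
      case DIRECTORY:
        return "listObjects";
      default:
        return "headObject";
    }
  }

  public static void main(String[] args) {
    String[] uris = {
      "cdms3:myBucket", "cdms3:myBucket#delimiter=/",
      "cdms3:myBucket?myKey/#delimiter=/", "cdms3:myBucket?myKey"
    };
    for (String uri : uris) {
      System.out.println(uri + " -> " + requestFor(classify(uri)));
    }
  }
}
```

Under this reading, cdms3:myBucket without a fragment stays a plain bucket, which is the case the question at the end of the description is about.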
The current MFileS3::exists only works for objects, and uses a headObject request. This PR would extend this to work for buckets (need to do a headBucket) and "directories" (by doing a listObjects and checking it has objects).
Note that "directories" are defined in our code as URIs that have a delimiter fragment and also end with that delimiter unless it's a bucket (e.g. cdms3:myBucket#delimiter=/ or cdms3:myBucket?myKey/#delimiter=/).
I wonder if cdms3:myBucket should also be considered a directory?
PR Checklist