arunaengine / proxy

DEPRECATED: See https://github.com/ArunaStorage/aruna/tree/main/components/data_proxy for latest release

Issues with ListObjectsV2 #55

Closed mzur closed 11 months ago

mzur commented 11 months ago

I see different behavior when interacting with the ListObjectsV2 endpoint of AOS than with a regular S3 service. Usually, I call ListObjectsV2 to show the "directories" and files with a certain prefix, using / as delimiter, so that only the objects at the current "directory level" are shown. With a regular S3 endpoint, I get the current "directories" as CommonPrefixes and the current objects as Contents. With AOS there are two issues: hierarchy resources such as "directories" are also returned as Contents, and the CommonPrefixes are not terminated with the delimiter.

So to fix this, I would expect the ListObjectsV2 endpoint to only return actual objects as Contents and to provide CommonPrefixes terminated with a slash.

Here is an example query with the PHP S3 SDK:

// $client is an Aws\S3\S3Client pointing at the AOS endpoint (setup not shown).
$options = ['Bucket' => 'biigletest', 'Prefix' => '', 'Delimiter' => '/'];
$paginator = $client->getPaginator('ListObjectsV2', $options);
$result = $paginator->current();
$result->get('Contents');
// [
//   [
//     "Key" => "biigletest",   // the hierarchy resource shows up as an object
//     "LastModified" => Aws\Api\DateTimeResult @1701703435 {#8474
//       date: 2023-12-04 15:23:55.0 +00:00,
//     },
//     "ETag" => "01HGTPVH6VGAC4ADJQEWHRMP50",
//     "Size" => "0",
//   ],
// ]
$result->get('CommonPrefixes');
// [
//   [
//     "Prefix" => "biigletest",   // not terminated with the delimiter
//   ],
// ]
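
For comparison, here is a sketch of what the same query would return from a spec-compliant S3 endpoint, assuming the bucket only contains keys below the biigletest/ prefix (the values are illustrative, not actual output):

$result->get('Contents');
// [] (no objects live directly at the top level in this example)
$result->get('CommonPrefixes');
// [
//   [
//     "Prefix" => "biigletest/",   // terminated with the delimiter
//   ],
// ]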
St4NNi commented 11 months ago

Thanks for reporting, yes, this is a bug. The trailing slashes should be an easy fix. Regarding the "duplication" of hierarchy resources, I am actually not sure how we should handle this: without the duplication it would be hard or almost impossible to query the ID of a hierarchy resource.

This ID can be used to download a .tar.gz bundle of the whole resource sub-tree below it via the special objects bucket.
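
A minimal sketch of what such a bundle download could look like with the PHP SDK from the example above; the bucket name objects and the use of the resource ID as the key are assumptions, not confirmed details of the proxy's API:

// Hypothetical: bucket name and key format are assumptions.
$client->getObject([
    'Bucket' => 'objects',
    'Key'    => '01HGTPVH6VGAC4ADJQEWHRMP50', // resource ID of the hierarchy object from the listing above
    'SaveAs' => 'biigletest.tar.gz',          // stream the bundled sub-tree to a local file
]);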

mzur commented 11 months ago

This ID can be used to download a .tar.gz bundle of the whole resource sub-tree below it via the special objects bucket.

So this can be done with S3? Maybe it's also fine to only offer this feature via the regular API to keep the S3 API compliant with the specs. Or only show this behavior with the objects bucket.

St4NNi commented 11 months ago

Yes, sub-trees can be bundled into .tar.gz archives. But thinking about it, I agree that it is not the best user experience to repeat these entries in ListObjectsV2, so I will put this on the to-do list for future updates.

Unfortunately, the spec doesn't really help here, because hierarchy objects have a different meaning for us that goes well beyond the spec's perspective of buckets and keys as plain strings with some arbitrary separators.

mzur commented 11 months ago

OK, so from my perspective it would be fine if I had to use the regular API to download an archive of a sub-tree. With S3, I can easily download a sub-tree in the usual way (maybe even faster, because the load can be distributed). But that's only my opinion.
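
A minimal sketch of that "usual way" with the PHP SDK, assuming the sub-tree lives under an illustrative images/ prefix: paginate over the prefix and fetch each key.

$paginator = $client->getPaginator('ListObjectsV2', [
    'Bucket' => 'biigletest',
    'Prefix' => 'images/', // illustrative sub-tree prefix
]);
foreach ($paginator as $result) {
    foreach ($result->get('Contents') ?? [] as $object) {
        // Fetch each object under the prefix; these requests could also run in parallel.
        $client->getObject([
            'Bucket' => 'biigletest',
            'Key'    => $object['Key'],
            'SaveAs' => basename($object['Key']), // write to the current directory; adjust as needed
        ]);
    }
}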

So right now, if I want to support AOSv2 in my BIIGLE service, I still have to handle object listing in a special way via the regular API (just as with AOSv1, which had no ListObjectsV2). So I'd very much appreciate it if this could be changed/fixed :slightly_smiling_face:

das-Abroxas commented 11 months ago

We have fixed the ListObjectsV2 implementation (2719f53db0b66cd888218e98dd8b9475f0768507), which should already be available in the dev instance.

This means that a correct distinction is now made between Contents and CommonPrefixes, depending on the delimiter. The CommonPrefixes now also end correctly with a slash (or with the delimiter specified in the request). The only "duplicates" that should still appear in a response are Objects that can be accessed via multiple hierarchies in a Project, i.e. Objects that exist in multiple Collections and/or Datasets.

It would be great if you could test the functionality again and give us feedback.

Have a nice weekend :v:

mzur commented 11 months ago

Works perfectly now, thanks!

The next thing I would like to do is set CORS rules for the bucket. s3cmd tells me that PutBucketCors is not implemented in the data proxy. It's also not mentioned here: https://github.com/ArunaStorage/DataProxy/issues/19#issue-1499909705. Maybe you could put that on your roadmap, too (unless there is a way via the web UI or the regular API)?
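
For reference, the equivalent call with the PHP SDK would look roughly like this (the CORS rule itself is only an illustrative example):

// Hypothetical CORS rule; the data proxy currently rejects PutBucketCors.
$client->putBucketCors([
    'Bucket' => 'biigletest',
    'CORSConfiguration' => [
        'CORSRules' => [
            [
                'AllowedOrigins' => ['https://biigle.example.com'], // illustrative origin
                'AllowedMethods' => ['GET', 'HEAD'],
                'AllowedHeaders' => ['*'],
                'MaxAgeSeconds'  => 3600,
            ],
        ],
    ],
]);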

St4NNi commented 11 months ago

Nice, you're welcome!

We had a CORS implementation in V1 and need to port it to V2; I have updated the corresponding issue #29. I will close this issue for now. If you have any problems regarding ListObjectsV2, feel free to re-open it any time.