NVIDIA / aistore

AIStore: scalable storage for AI applications
https://aistore.nvidia.com
MIT License

S3 compatibility with AWS bucket backend #68

Closed Timshel closed 3 years ago

Timshel commented 3 years ago

Hi,

Even though I'm pausing my exploration of aistore for now, I wanted to give some feedback on the issues I encountered when testing (using https://github.com/NVIDIA/aistore/commit/9ec804e2a15992d2813ca07268d48fb88cca44d0).

After activating the aws cloud backend, I started with direct access. I was able to list the bucket content (after getting trolled a bit: my bucket policy was too restrictive, and I believe aistore needs more than just List* to read the bucket metadata).

ais ls s3://aistore-test --props all
NAME          SIZE      CHECKSUM                          ATIME                VERSION  CACHED  TARGET URL              STATUS  COPIES
Chart-88.png  44.50KiB  a8c44bbfa1d649872a05a3ef1cea7bc6                                no      http://10.10.1.21:5354  ok      0
Chart-89.png  44.50KiB  ee901cdd0bea09c2                  08 Dec 20 09:53 UTC           yes     http://10.10.1.21:5354  ok      1

Two things were worrisome: the missing ATIME when the file is not cached, and the checksum differing depending on whether the file is cached (it's the same file under a different name). What I did not realize at first was that the bucket is not accessible through the s3 endpoint: when listing with s3cmd the bucket is not visible, and if I try to list it directly the request fails with a 404 on ais://aistore-test.

So I switched to creating an ais bucket with an aws backend:

ais create bucket test
ais set props ais://test backend_bck=aws://aistore-test
ais set props ais://test checksum.type=md5

The first issue I had was with the default max page size: the target was outputting errors such as "page size exceeds the maximum value (got: 10000, max expected: 1000)". I believe it's due to the default for ais being higher than the one for aws. There is probably a clean way to do it, but I just changed it here https://github.com/NVIDIA/aistore/blob/master/cmn/api_const.go#L280 :).
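To illustrate the mismatch, here is a minimal Python sketch (not aistore code) of capping the requested page size at the backend's limit and then paging through results with a continuation token. The 1000-key limit matches what S3 list calls return per page. The names `clamp_page_size`, `list_all`, and `make_fake_backend` are made up for this example, and the fake backend only imitates the error quoted above.

```python
from typing import Callable, Iterator, List, Optional, Tuple

S3_MAX_KEYS = 1000  # S3 list calls return at most 1000 keys per page

# Hypothetical signature of a listing call: (page_size, token) -> (names, next_token)
FetchPage = Callable[[int, Optional[str]], Tuple[List[str], Optional[str]]]


def clamp_page_size(requested: int, backend_max: int = S3_MAX_KEYS) -> int:
    """Return a page size the backend will accept."""
    if requested <= 0:
        return backend_max
    return min(requested, backend_max)


def list_all(fetch_page: FetchPage, page_size: int) -> Iterator[str]:
    """Yield every object name, following continuation tokens."""
    size = clamp_page_size(page_size)
    token: Optional[str] = None
    while True:
        names, token = fetch_page(size, token)
        yield from names
        if not token:
            break


def make_fake_backend(total: int) -> FetchPage:
    """Stand-in backend (an assumption for this sketch) that rejects oversized pages."""
    names = [f"obj-{i:04d}" for i in range(total)]

    def fetch_page(page_size: int, token: Optional[str]):
        if page_size > S3_MAX_KEYS:
            raise ValueError(
                f"page size exceeds the maximum value "
                f"(got: {page_size}, max expected: {S3_MAX_KEYS})"
            )
        start = int(token) if token else 0
        end = min(start + page_size, total)
        next_token = str(end) if end < total else None
        return names[start:end], next_token

    return fetch_page
```

With the cap in place, asking for 10000 entries per page against the fake backend simply results in several 1000-entry pages instead of an error.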

After this I encountered the same missing ATIME when the file is not cached, which prevented s3cmd from listing the bucket. After loading each file into the cache, the ls returned, but all files were listed twice.

When trying to push a file with s3cmd, it failed after multiple retries due to an invalid checksum (the file is still uploaded). Setting the checksum.type prop appears to have no effect on the file checksum (I tried setting it both before and after setting backend_bck; I did not check with tcpdump whether the ETag was present).
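For anyone reproducing this: as far as I understand, the check that fails is a comparison between the MD5 of the payload and the ETag returned by the server. Below is a minimal Python sketch of that comparison. It assumes a single-part upload where the ETag is the plain hex MD5 of the content; `md5_hex` and `etag_matches` are names invented for this example, not part of s3cmd or aistore.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex-encoded MD5 digest of the payload."""
    return hashlib.md5(data).hexdigest()


def etag_matches(etag: str, data: bytes) -> bool:
    """Compare an S3-style ETag with the MD5 of the payload.

    Assumption: single-part upload, so the ETag is a plain MD5 digest.
    Multipart ETags contain a '-' suffix and cannot be checked this way.
    """
    cleaned = etag.strip().strip('"').lower()
    if "-" in cleaned:
        raise ValueError("multipart ETag is not a plain MD5 digest")
    return cleaned == md5_hex(data)
```

A 16-hex-character value like the cached checksum in the listing above can never match here, since an MD5 digest is 32 hex characters long.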

When fetching a file with s3cmd, if it's not cached it will first fail due to the missing ATIME; once it's loaded in the cache, s3cmd will download the file but output an error due to an invalid checksum.

Edit: Additionally I did two upload tests on a standard ais bucket.

Thank you and have a nice day.

VladimirMarkelov commented 3 years ago

Thank you for the valuable feedback! The new examples also helped me catch a bug you'd mentioned in the previous issue (about the empty time). I reproduced all the issues - let me go through them one by one (master branch at commit a3b35da08 includes all the fixes):

This behavior is expected - our S3-compatibility layer provides access only to AIS buckets that may (or may not) be configured with a Cloud backend.

AIStore currently does not track object creation times. We track (or rather, cache) access times, and only for the purposes of optimizing storage capacity when running out of space. In object-list responses, AIS returns the last access time.

That is why never-accessed objects have an empty LastModification time. To support clients like s3cmd, AIS will now return a zero Unix time indicating that the object exists in the Cloud but AIS has never downloaded it (please see an example in the updated documentation).
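On the client side, the zero-time convention could be handled roughly like the Python sketch below. It assumes the timestamp has already been converted to Unix seconds (the actual wire format is not shown in this thread), and `parse_access_time` and `describe` are hypothetical helper names.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_access_time(unix_seconds: int) -> Optional[datetime]:
    """Return None for the zero-time placeholder, else a UTC datetime."""
    if unix_seconds == 0:
        return None  # object exists in the Cloud but was never downloaded
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


def describe(name: str, unix_seconds: int) -> str:
    """Human-readable summary of an object's access state."""
    atime = parse_access_time(unix_seconds)
    if atime is None:
        return f"{name}: in Cloud, never downloaded"
    return f"{name}: last accessed {atime.isoformat()}"
```

Treating zero as "never accessed" rather than as a real date avoids showing 1 January 1970 to users for objects that are simply not cached yet.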

Good catch, thank you! The bug, already fixed in the master, was related to having AIS bucket configured with a backend Cloud bucket.

Here again, an AIS bucket with a Cloud backend went a different path returning MD5 checksum in the response body and leaving the response header empty. Fixed in the master and must work fine when ais set props ais://test checksum.type=md5 is set for the bucket.