NVIDIA / aistore

AIStore: scalable storage for AI applications
https://aistore.nvidia.com
MIT License

Ability to get last version from backend storage #114

Closed and-1 closed 1 year ago

and-1 commented 1 year ago

I want to use AIS as a cache tier with another AIS cluster as the provider. How can I validate that an object in the cache is not obsolete? What about extending the GET object API and adding the object version as a query parameter? If that version is not found in the cache, AIS would then initiate a cold GET from the backend.

alex-aizman commented 1 year ago

    $ ais bucket props set s3://<BUCKET-NAME> versioning.validate_warm_get=true

Example:

1. Initially, we have the same version stored in both AIS and AWS S3:

```console
$ ais ls s3://ais-aa --cached --bytes
NAME     SIZE
file-x   11319

$ s3cmd ls s3://ais-aa
2022-11-29 14:47        11319  s3://ais-aa/file-x
```

2. Let's overwrite `file-x` with `s3cmd` configured to use standard AWS S3 endpoint (and _not_ AIS S3 endpoint):

```console
$ echo abc > /tmp/abc
$ s3cmd put /tmp/abc s3://ais-aa/file-x
$ s3cmd ls s3://ais-aa
2022-11-29 15:01            4  s3://ais-aa/file-x

# AIS still shows the original size:
$ ais ls s3://ais-aa --cached --bytes
NAME     SIZE
file-x   11319
```

3. Set bucket property to validate versioning:

```console
$ ais bucket props set s3://ais-aa versioning.validate_warm_get=true
"versioning.validate_warm_get" set to: "true" (was: "false")
```

4. This (above) forces AIS to do an extra step of checking (and trading off GET performance):

```console
$ ais get s3://ais-aa/file-x /tmp/file-x
GET "file-x" from s3://ais-aa as "/tmp/file-x" [4B]
$ ll /tmp/file-x
-rw-r--r-- 1 root root 4 Nov 29 10:15 /tmp/file-x

$ ais ls s3://ais-aa --cached --bytes
NAME     SIZE
file-x   4
```
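
The behavior shown in the steps above can be modeled roughly as follows. This is a minimal, self-contained sketch and not AIS code: the `object`, `backend`, and `cache` types and the `validateWarm` field are hypothetical names that only mimic the effect of `versioning.validate_warm_get`. With validation off, a cached copy is returned as-is; with validation on, every warm GET first asks the backend for the current version, which costs one remote round trip per request.

```go
package main

import "fmt"

// object is a hypothetical stand-in for stored content plus its version tag.
type object struct {
	data    string
	version string
}

// backend is a hypothetical remote store that counts version lookups.
type backend struct {
	objects map[string]object
	heads   int // number of remote version checks performed
}

// head returns only the version of the remote object (a metadata lookup).
func (b *backend) head(name string) (string, bool) {
	b.heads++
	o, ok := b.objects[name]
	return o.version, ok
}

// get returns the full remote object.
func (b *backend) get(name string) (object, bool) {
	o, ok := b.objects[name]
	return o, ok
}

// cache is a hypothetical caching tier in front of the backend.
type cache struct {
	remote       *backend
	local        map[string]object
	validateWarm bool // models the effect of versioning.validate_warm_get
}

// get returns the object and reports whether a cold GET was needed.
func (c *cache) get(name string) (object, bool, error) {
	cached, hit := c.local[name]
	if hit && !c.validateWarm {
		return cached, false, nil // warm GET, no remote check at all
	}
	if hit {
		// Warm GET with validation: one extra remote round trip.
		if v, ok := c.remote.head(name); ok && v == cached.version {
			return cached, false, nil
		}
	}
	// Cache miss or stale copy: cold GET from the backend.
	fresh, ok := c.remote.get(name)
	if !ok {
		return object{}, false, fmt.Errorf("object %q not found in backend", name)
	}
	c.local[name] = fresh
	return fresh, true, nil
}

func main() {
	remote := &backend{objects: map[string]object{"file-x": {data: "abc\n", version: "v2"}}}
	c := &cache{remote: remote, local: map[string]object{"file-x": {data: "old contents", version: "v1"}}}

	o, cold, _ := c.get("file-x") // validation off: the stale copy is served
	fmt.Println(len(o.data), cold, remote.heads)

	c.validateWarm = true
	o, cold, _ = c.get("file-x") // validation on: the stale copy is replaced
	fmt.Println(len(o.data), cold, remote.heads)
}
```

The `heads` counter makes the trade-off visible: with validation enabled it grows by one on every GET of a cached object, whether or not the object changed.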

and-1 commented 1 year ago

versioning.validate_warm_get is too expensive in terms of latency (my two clusters are located in different regions). Only 5-10% of objects may change, and the objects are quite small (50-150 KB). Adding a version/hash as a query parameter to the GET request is a more granular way to invalidate a cached object.
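
To make the proposal concrete, here is a minimal sketch of a GET that carries the expected version. It is an illustration only and does not reflect an existing AIS API: `tier`, `entry`, `getVersion`, and the `fetch` callback are hypothetical, and it assumes the client already knows the expected version from some external source such as an index. Returning an error when the backend copy also differs from the requested version is likewise an assumption, since the thread does not specify that case.

```go
package main

import (
	"errors"
	"fmt"
)

// entry is a hypothetical cached object: payload plus version (or hash).
type entry struct {
	data    []byte
	version string
}

// tier is a hypothetical cache tier; fetch stands in for a cold GET
// from the remote (cross-region) cluster.
type tier struct {
	cached   map[string]entry
	fetch    func(name string) (entry, error)
	coldGets int // number of remote round trips performed
}

var (
	errNotFound        = errors.New("object not found in backend")
	errVersionMismatch = errors.New("backend does not have the requested version")
)

// getVersion serves the cached copy when its version equals want and goes to
// the backend only on a mismatch or a cache miss.
func (t *tier) getVersion(name, want string) (entry, error) {
	if e, ok := t.cached[name]; ok && e.version == want {
		return e, nil // warm GET, no remote round trip
	}
	t.coldGets++
	e, err := t.fetch(name)
	if err != nil {
		return entry{}, err
	}
	t.cached[name] = e // refresh the cache with whatever the backend holds
	if e.version != want {
		return entry{}, errVersionMismatch // assumption: report rather than serve
	}
	return e, nil
}

func main() {
	remote := map[string]entry{}
	ct := &tier{cached: map[string]entry{}}
	for i := 0; i < 10; i++ {
		name := fmt.Sprintf("obj-%d", i)
		ct.cached[name] = entry{data: []byte("old"), version: "v1"}
		remote[name] = entry{data: []byte("old"), version: "v1"}
	}
	// One object out of ten changed in the backend.
	remote["obj-3"] = entry{data: []byte("new"), version: "v2"}
	ct.fetch = func(name string) (entry, error) {
		e, ok := remote[name]
		if !ok {
			return entry{}, errNotFound
		}
		return e, nil
	}
	// The client requests each object at its current version.
	for name, e := range remote {
		if _, err := ct.getVersion(name, e.version); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Printf("cold GETs: %d of %d requests\n", ct.coldGets, len(remote))
}
```

In this model only the changed object costs a remote round trip, whereas validating every warm GET would contact the backend on all ten requests.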

alex-aizman commented 1 year ago

Update: api.GetObject now returns object attributes, as it should:

// Returns `ObjAttrs` that can be further used to get the size and other object metadata.
func GetObject(bp BaseParams, bck cmn.Bck, object string, args *GetArgs) (oah ObjAttrs, err error) {