NVIDIA / aistore

AIStore: scalable storage for AI applications
https://aistore.nvidia.com
MIT License

Bucket Copy REST API Docs #104

Closed mike-gee closed 2 years ago

mike-gee commented 2 years ago

I noticed that the bucket API docs might be a bit outdated. I managed to work out the command by looking at the Python SDK source.

https://github.com/NVIDIA/aistore/blob/68689e92bd0f12a42b6b8cbb218e276e129da385/docs/http_api.md?plain=1#L207

I think it should look something like

POST {"action": "copy-bck"} /v1/buckets/<bck> | `curl -i -X POST -H 'Content-Type: application/json' -d '{"action": "copy-bck"}' 'http://G/v1/buckets/from-name?provider=<from-provider>&bck_to=<to-provider>//<to-bck>/'`

Happy to put in PR later if preferred.

Also, is there a way to pass in --list parameter to REST API like in CLI? An &list= param didn't seem to do the trick.

https://github.com/NVIDIA/aistore/blob/68689e92bd0f12a42b6b8cbb218e276e129da385/docs/cli/bucket.md?plain=1#L445

gaikwadabhishek commented 2 years ago

Hi @mike-gee, you are almost there. The correct URL should be - http://localhost:8080/v1/buckets/<from-bck>?provider=<from-provider>&bck_to=<to-provider>/@#/<to-bucket>/
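
For anyone scripting this, here is a minimal Python sketch of how the URL above could be assembled. `build_copy_bucket_request` is a hypothetical helper (not part of the AIS SDK), and `localhost:8080` is just the gateway address from the example. The query string is percent-encoded because a raw `#` would otherwise be read as the start of a URL fragment; that the gateway accepts the encoded form is an assumption.

```python
from urllib.parse import urlencode


def build_copy_bucket_request(endpoint, from_bck, from_provider, to_bck, to_provider):
    """Return (url, body) for a copy-bck POST.

    Hypothetical helper for illustration; not an AIS SDK function.
    """
    # Destination format follows the URL given above: <to-provider>/@#/<to-bucket>/
    bck_to = f"{to_provider}/@#/{to_bck}/"
    # urlencode percent-encodes "/", "@" and "#" so the value survives as one parameter.
    query = urlencode({"provider": from_provider, "bck_to": bck_to})
    url = f"{endpoint}/v1/buckets/{from_bck}?{query}"
    body = {"action": "copy-bck"}
    return url, body


if __name__ == "__main__":
    url, body = build_copy_bucket_request(
        "http://localhost:8080", "src-bck", "gcp", "dst-bck", "ais"
    )
    print(url)
    print(body)
```

The returned body would be sent as JSON with `Content-Type: application/json`, as in the curl command earlier in the thread.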

I am not sure what you are trying to do with the list here. Does this list contain objects you want to copy? We do have a prefix for sure.

Yes, please go ahead and raise a PR. Thank you so much for finding this out. Our API docs page is quite old, and we could really use some help with it.

Thanks in advance!

VladimirMarkelov commented 2 years ago

> Also, is there a way to pass in --list parameter to REST API like in CLI? An &list= param didn't seem to do the trick.

Copying a list of objects is a multi-object action. It is done by sending a POST request to v1/buckets/<from-bucket-name> with a JSON body that describes the destination bucket and what to copy (see struct TCObjsMsg in aistore/cmn/api_multiobj.go).

Edit: sorry, TCObjsMsg is only part of the request body. The whole request is {"action": "copy-listrange", "value": <TCObjsMsg content>}

mike-gee commented 2 years ago

Thank you both for your support. I really appreciate it!

However, it seems that copying from buckets that do not have an ais backend creates idle jobs and blocks future requests using that bucket due to duplicate trnames. I presume this is an issue connecting to the bucket on my end, although interacting with these buckets with ais get seems to work fine.

To reproduce:

curl -i -X POST -H 'Content-Type: application/json' -d @my_json.json 'http://G/v1/buckets/my-gcs-bucket?provider=gcs'

my_json.json

{"action": "copy-listrange", "value": {"tobck": {"name": "my-ais-bucket", "provider": "ais"}}, "objnames": ["folder-in-gcs-bucket/shard00000000.tar"]}

Produces:

[
    {
        "DaemonID": "xxxxxx",
        "XactSnaps": [
            {
                "id": "yyyyyy",
                "kind": "copy-listrange",
                "bck": {
                    "name": "my-gcs-bucket",
                    "provider": "gcp",
                    "namespace": {
                        "uuid": "",
                        "name": ""
                    }
                },
                "start-time": "x",
                "end-time": "xx",
                "stats": {
                    "loc-objs": "0",
                    "loc-bytes": "0",
                    "out-objs": "0",
                    "out-bytes": "0",
                    "in-objs": "0",
                    "in-bytes": "0"
                },
                "aborted": false,
                "ext": {
                    "is_idle": true
                }
            }
        ]
    }
]
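
To spot these jobs programmatically, a response like the one above can be scanned for snapshots flagged idle. This is only a sketch that assumes the response keeps the shape shown here (`DaemonID`, `XactSnaps`, `ext.is_idle`); `idle_xactions` is a hypothetical helper name, not an AIS API.

```python
import json


def idle_xactions(response_text):
    """Yield (daemon_id, xaction_id, kind) for every snapshot flagged idle.

    Assumes the JSON layout shown in the response above.
    """
    for node in json.loads(response_text):
        for snap in node.get("XactSnaps") or []:
            # "ext" may be missing or null, so fall back to an empty dict.
            if (snap.get("ext") or {}).get("is_idle"):
                yield node.get("DaemonID"), snap.get("id"), snap.get("kind")


if __name__ == "__main__":
    # Trimmed copy of the response above, with the same placeholder IDs.
    sample = """
    [{"DaemonID": "xxxxxx",
      "XactSnaps": [{"id": "yyyyyy", "kind": "copy-listrange",
                     "aborted": false, "ext": {"is_idle": true}}]}]
    """
    for daemon_id, xaction_id, kind in idle_xactions(sample):
        print(daemon_id, xaction_id, kind)
```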

alex-aizman commented 2 years ago

Not reproducing the "create idle jobs and block future requests" scenario:

$ ais bucket cp s3://my-s3-bucket gs://my-gs-bucket --list f1,f2,f3
To check the status, run: ais show job xaction copy-bck gcp://my-gs-bucket

$ ais show job xaction copy-listrange
NODE             ID              KIND            BUCKET  OBJECTS         BYTES           START           END     STATE
mjNt8081         N5U2E5znj_d     copy-listrange  my-s3-bucket  3         11.02KiB        07-21 14:42:07  -       Idle

$ ais show job xaction copy-listrange
NODE             ID              KIND            BUCKET  OBJECTS         BYTES           START           END     STATE
mjNt8081         N5U2E5znj_d     copy-listrange  my-s3-bucket  3         11.02KiB        07-21 14:42:07  -       Idle

$ ais show job xaction copy-listrange
NODE     ID      KIND    BUCKET  OBJECTS         BYTES   START   END     STATE
# date
Thu Jul 21 14:44:37 EDT 2022

(Took it a couple minutes to disappear)

alex-aizman commented 2 years ago

As for the REST API documentation versus the current implementation, the simplest way to see what's going on is to print the incoming message. For instance:

@@ -1140,20 +1141,21 @@ func (p *proxy) hpostBucket(w http.ResponseWriter, r *http.Request, msg *apc.Act
        case apc.ActCopyObjects, apc.ActETLObjects:
                var (
                        xactID string
                        tcoMsg = &cmn.TCObjsMsg{}
                        bckTo  *cluster.Bck
                )
                if err = cos.MorphMarshal(msg.Value, tcoMsg); err != nil {
                        p.writeErrf(w, r, cmn.FmtErrMorphUnmarshal, p.si, msg.Action, msg.Value, err)
                        return
                }
+               glog.Errorln(msg)

I first looked up the API constant apc.ActCopyObjects in api/multiobj.go.

This (above) will print something like:

E 14:42:07.385100 proxy.go:1151 amsg[copy-listrange, val={"objnames":["f1", "f2", "f3"], "template":"", "ext":null, "prefix":"", "dry_run":false, "force":false, "tobck":{"provider":"gcp", "namespace":{"uuid":"", "name":""}, "name":"my-gs-bucket"}, "coer":false}]
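
Going by the printed message, both `objnames` and `tobck` sit inside `value`. In the reproduction JSON earlier in the thread, `objnames` appears next to `value` rather than inside it, which may be worth checking. A minimal Python sketch of a body that mirrors the logged layout follows; `build_copy_listrange_body` is a hypothetical helper, and it assumes the remaining logged fields (`template`, `prefix`, `dry_run`, and so on) can be left at their defaults.

```python
import json


def build_copy_listrange_body(objnames, to_bck, to_provider):
    """Return a JSON string shaped like the message logged above.

    Hypothetical helper for illustration; not an AIS SDK function.
    """
    # TCObjsMsg content: the object list and destination bucket live side by side.
    tco_msg = {
        "objnames": list(objnames),
        "tobck": {"name": to_bck, "provider": to_provider},
    }
    # The whole request wraps TCObjsMsg under "value".
    return json.dumps({"action": "copy-listrange", "value": tco_msg})


if __name__ == "__main__":
    print(build_copy_listrange_body(["f1", "f2", "f3"], "my-gs-bucket", "gcp"))
```

The resulting string could be saved to a file and passed to curl with `-d @file`, as in the reproduction steps above.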