NVIDIA / aistore

AIStore: scalable storage for AI applications
https://aistore.nvidia.com
MIT License

`ais download gs://src ais://dst` requires setting `ais://dst backend_bck=gs://src` #147

Closed: yingca1 closed this issue 11 months ago

yingca1 commented 1 year ago
> ais start download "gs://ais_video/dataset1/{00000..20000}.tar" ais://ais_video

Warning: 834 of 20001 download jobs failed. For details, run 'ais show job dnl-SDuNm685R -v'

> ais show job dnl-SDuNm685R -v
        ...
        10338.tar: nil error w/ bad status
        10378.tar: nil error w/ bad status
        10406.tar: nil error w/ bad status
        10423.tar: nil error w/ bad status
        10443.tar: nil error w/ bad status
        10454.tar: nil error w/ bad status
        10457.tar: nil error w/ bad status
        10478.tar: nil error w/ bad status
        10488.tar: nil error w/ bad status
        10511.tar: nil error w/ bad status
        10540.tar: nil error w/ bad status
        10600.tar: nil error w/ bad status
        10652.tar: nil error w/ bad status
        10676.tar: nil error w/ bad status
        10723.tar: nil error w/ bad status
        10758.tar: nil error w/ bad status
        10777.tar: nil error w/ bad status
        ...
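To retry only the failed objects, you can first pull their names out of the verbose job output. The Python sketch below is minimal and assumes each failure is printed as an indented `NAME: ERROR` line, as in the excerpt above. The helper `failed_objects` is hypothetical and not part of AIStore.

```python
import re
from typing import Dict

# Assumption: each failed object appears on its own indented line, e.g.
# "        10338.tar: nil error w/ bad status". Requiring leading whitespace
# skips summary lines such as "Done: 0 files downloaded, error: 1".
_ERROR_LINE = re.compile(r"^\s+(?P<name>[^\s:]+):\s+(?P<err>.*\S)\s*$")


def failed_objects(verbose_output: str) -> Dict[str, str]:
    """Map each failed object name to its reported error message."""
    failures = {}
    for line in verbose_output.splitlines():
        m = _ERROR_LINE.match(line)
        if m:
            failures[m.group("name")] = m.group("err")
    return failures


sample = """\
        ...
        10338.tar: nil error w/ bad status
        10378.tar: nil error w/ bad status
        ...
"""
print(sorted(failed_objects(sample)))  # ['10338.tar', '10378.tar']
```

The resulting names could then feed a narrower retry. The error text itself ("nil error w/ bad status") still needs the cluster logs to explain.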

> ais create ais://bucket_1

> ais start download "ais://ais_video/dataset1/00000.tar" ais://bucket_1
Started download job dnl-zIaAhNr25
Warning: 1 of 1 download jobs failed. For details, run 'ais show job dnl-zIaAhNr25 -v'

> ais show job dnl-zIaAhNr25 -v
Done: 0 files downloaded, error: 1
Errors:
        00000.tar: nil error w/ bad status

https://github.com/NVIDIA/aistore/blob/master/docs/cli/download.md#start-download-job

The download examples in the documentation above fail with essentially this same error.

yingca1 commented 1 year ago

Can ais start download only download the public gs:// datasets?

alex-aizman commented 1 year ago

As an aside, I wonder what the "bad status" is...

The primary motivation behind the downloader was downloading raw HTTP sources. Here, though, you have two regular buckets, so we can use our own APIs or the vendor-documented ones. The first thing that comes to mind is something like:

ais cp gs://ais_video ais://ais_video --template="dataset1/{00000..20000}.tar"

See ais cp --help for details.
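The `--template` value uses the same brace-range syntax as the original download source. The Python sketch below illustrates how a single `{START..END}` range expands. It assumes that both bounds are inclusive and that, as the zero-padded names suggest, each name keeps the digit width of START. The helper `expand_range_template` is hypothetical and is not AIStore's own template parser. The inclusive range also explains the "20001" in the first warning.

```python
import re
from typing import List

# Hypothetical helper: expands one "{START..END}" numeric range.
# Assumption: both bounds are inclusive and names are zero-padded to the
# width of START (e.g. "00000" -> 5 digits).
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_range_template(template: str) -> List[str]:
    m = _RANGE.search(template)
    if m is None:
        return [template]  # no range: the template names a single object
    start, end = m.group(1), m.group(2)
    width = len(start)
    prefix, suffix = template[: m.start()], template[m.end():]
    return [f"{prefix}{i:0{width}d}{suffix}" for i in range(int(start), int(end) + 1)]


names = expand_range_template("dataset1/{00000..20000}.tar")
print(len(names))           # 20001
print(names[0], names[-1])  # dataset1/00000.tar dataset1/20000.tar
```

The sketch handles only the first range in a template. Templates with several ranges or other features would need a fuller parser.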

Second, the logs will provide more information (including the "bad status"), especially if the downloader module is configured for verbose logging:

$ ais config cluster log.modules downloader
$ ais cluster download-logs

PS. Generally, it'll help us if you use the latest bits from GitHub master and send us the logs. If that's not possible, then at least send the output of ais show cluster.

alex-aizman commented 1 year ago

PPS: quick experiment with downloader and a google bucket:


$ ais start download "gs://abcdef-imagenet/imagenet-val-{000037..000045}.tar" ais://dst
Warning: destination bucket ais://dst doesn't exist. Bucket with default properties will be created.
Started download job dnl-KxNDoJt9L
To monitor the progress, run 'ais show job dnl-KxNDoJt9L --progress'

$ ais show job dnl-KxNDoJt9L --progress
Files downloaded:                     0/9 [--------------------------------------------------------------] 0 %
imagenet-val-000043.tar  26.6MiB/132.8MiB [===========>--------------------------------------------------| 00:00:00 ]    0.0 b/s
imagenet-val-000038.tar 116.5MiB/129.6MiB [=======================================================>------| 00:00:00 ]    0.0 b/s
imagenet-val-000040.tar  87.3MiB/128.0MiB [=========================================>--------------------| 00:00:00 ]    0.0 b/s
...
imagenet-val-000043.tar 132.8MiB/132.8MiB [==============================================================| 00:00:00 ]    0.0 b/s
All files successfully downloaded.

alex-aizman commented 1 year ago

addressed in 2628029e3cc014723769f7529b94ae0bc02fcfb6

yingca1 commented 1 year ago

  1. Do you use aistorage/aisnode:3.18?
  2. The ais cluster download-logs command doesn't work.

alex-aizman commented 1 year ago

do you use aistorage/aisnode:3.18

The aisnode Docker image usually lags somewhat behind. I triggered a rebuild and push; it'll show up in a few minutes.

ais cluster download-logs

Assuming you cloned (or ran go get on) https://github.com/NVIDIA/aistore, run make cli from the repository root. It'll work.

yingca1 commented 1 year ago
ais bucket props set ais://lpr-vision-copy backend_bck=gcp://lpr-vision

ais start download gs://lpr-vision/dir/prefix- ais://lpr-vision-copy

After some testing, I found that the download succeeds only when the destination AIS bucket has been configured with the source bucket as its backend (backend_bck).

alex-aizman commented 1 year ago

reopening

alex-aizman commented 1 year ago

No, it actually works as intended. Here's the full story; note the templated download source below.

$ ais ls
NAME             PRESENT
ais://dst        yes

NAME             PRESENT
gs://imagenet      yes
Total: [GCP bucket: 1] ========

$ ais ls ais://dst
NAME     SIZE

and

$ ais show bucket ais://dst | grep backend
backend_bck.name
backend_bck.provider

and also

$ ais ls gs://imagenet
...
imagenet-val-000039.tar          23.87MiB        no
imagenet-val-000040.tar          22.79MiB        no
imagenet-val-000041.tar          22.40MiB        no
imagenet-val-000042.tar          22.99MiB        no
imagenet-val-000043.tar          24.11MiB        no
imagenet-val-000044.tar          23.31MiB        no
imagenet-val-000045.tar          24.03MiB        no
imagenet-val-000046.tar          24.22MiB        no
imagenet-val-000047.tar          23.12MiB        no
...

Now do it:

$ ais start download "gs://imagenet/imagenet-val-{000042..000045}.tar" ais://dst
Started download job dnl-jkhnNM1xp

and done:

$ ais ls ais://dst
NAME                     SIZE
imagenet-val-000042.tar  22.99MiB
imagenet-val-000043.tar  24.11MiB
imagenet-val-000044.tar  23.31MiB
imagenet-val-000045.tar  24.03MiB
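To check that every object selected by a template actually reached the destination, you can compare the expected names with the NAME column of `ais ls`. The Python sketch below is minimal and assumes the plain two-column listing shown above: a `NAME SIZE` header followed by one object per line. For brevity it builds the expected names directly instead of parsing the template string. The helpers `listed_names` and `missing_objects` are hypothetical.

```python
from typing import Iterable, List


def listed_names(ls_output: str) -> List[str]:
    """Return the NAME column of a plain `ais ls BUCKET` listing (header skipped)."""
    names = []
    for line in ls_output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "NAME":
            continue  # skip blank lines and the header row
        names.append(fields[0])
    return names


def missing_objects(expected: Iterable[str], ls_output: str) -> List[str]:
    """Expected object names absent from the listing, in expected order."""
    present = set(listed_names(ls_output))
    return [name for name in expected if name not in present]


# Stand-in for the template "imagenet-val-{000042..000045}.tar" (inclusive range).
expected = [f"imagenet-val-{i:06d}.tar" for i in range(42, 46)]

listing = """\
NAME                     SIZE
imagenet-val-000042.tar  22.99MiB
imagenet-val-000043.tar  24.11MiB
imagenet-val-000044.tar  23.31MiB
imagenet-val-000045.tar  24.03MiB
"""
print(missing_objects(expected, listing))  # []
```

An empty result means the listing matches the template. Any names returned are objects the download job did not deliver.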