Closed: yingca1 closed this issue 1 year ago
It seems like ais etl can only process files that have already been cached in the bucket.
Hey @yingca1!
Can you try processing directly from the remote bucket? e.g. ais etl bucket <etl-name> gs://dataset_raw ais://dst
Also, check if there are any logs/errors from your previous transformation - ais etl logs <etl-name>
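The two suggestions above can be sketched as a small script. This is a dry-run sketch, not a verified fix: the ETL name "transformer-etl" and the bucket names are taken from later in this thread and are assumptions about the setup.

```shell
#!/bin/sh
# Dry-run sketch of the suggested commands; set DRYRUN=0 to actually run them
# against a live AIStore cluster (the ETL must already be initialized).
DRYRUN=${DRYRUN:-1}
run() {
  if [ "$DRYRUN" = "1" ]; then
    echo "+ $*"          # just print the command
  else
    "$@"                 # execute it for real
  fi
}

# Transform directly from the remote (GCS) bucket into an AIS bucket.
run ais etl bucket transformer-etl gs://dataset_raw ais://dst
# Inspect the ETL logs for errors from the previous run.
run ais etl logs transformer-etl
```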
ais ls gs://dataset_raw
NAME SIZE CACHED
00001.tar 139.97MiB yes
00002.tar 150.64MiB no
00003.tar 155.62MiB no
00004.tar 164.40MiB no
00005.tar 148.05MiB no
00006.tar 167.46MiB no
00007.tar 155.25MiB no
00008.tar 148.16MiB no
00009.tar 148.16MiB no
00010.tar 174.16MiB no
00011.tar 157.26MiB no
00012.tar 126.54MiB no
00013.tar 140.54MiB no
00014.tar 166.49MiB no
00015.tar 155.99MiB no
00016.tar 139.45MiB no
00017.tar 166.50MiB no
00018.tar 141.96MiB no
00019.tar 143.59MiB no
00020.tar 150.62MiB no
00021.tar 152.93MiB no
00022.tar 152.81MiB no
00023.tar 128.04MiB no
00024.tar 146.64MiB no
00025.tar 157.93MiB no
00026.tar 150.59MiB no
00027.tar 136.73MiB no
00028.tar 151.62MiB no
00029.tar 151.45MiB no
00030.tar 156.35MiB no
00031.tar 138.45MiB no
00032.tar 136.20MiB no
00033.tar 143.92MiB no
00034.tar 159.80MiB no
00035.tar 134.40MiB no
00036.tar 177.63MiB no
00037.tar 151.78MiB no
00038.tar 153.73MiB no
00039.tar 160.19MiB no
00040.tar 139.48MiB no
00041.tar 136.02MiB no
00042.tar 150.70MiB no
00043.tar 131.01MiB no
00044.tar 140.57MiB no
00045.tar 151.36MiB no
00046.tar 153.03MiB no
00047.tar 142.15MiB no
00048.tar 149.41MiB no
00049.tar 138.68MiB no
00050.tar 157.70MiB no
00051.tar 135.21MiB no
00052.tar 157.94MiB no
00053.tar 148.85MiB no
00054.tar 165.08MiB no
00055.tar 146.65MiB no
00056.tar 159.91MiB no
00057.tar 123.22MiB no
00058.tar 139.02MiB no
00059.tar 153.07MiB no
00060.tar 150.39MiB no
00061.tar 141.47MiB no
00062.tar 162.76MiB no
00063.tar 137.81MiB no
00064.tar 144.43MiB no
00065.tar 165.58MiB no
00066.tar 148.15MiB no
00067.tar 144.23MiB no
00068.tar 151.54MiB no
00069.tar 151.61MiB no
00070.tar 146.01MiB no
00071.tar 134.46MiB no
00072.tar 145.56MiB no
00073.tar 137.06MiB no
00074.tar 144.52MiB no
00075.tar 151.15MiB no
00076.tar 146.14MiB no
00077.tar 136.53MiB no
00078.tar 145.85MiB no
00079.tar 149.72MiB no
00080.tar 146.18MiB no
00081.tar 150.58MiB no
00082.tar 164.97MiB no
00083.tar 145.10MiB no
00084.tar 145.37MiB no
00085.tar 141.30MiB no
00086.tar 143.17MiB no
00087.tar 143.07MiB no
00088.tar 139.75MiB no
00089.tar 155.99MiB no
00090.tar 151.40MiB no
00091.tar 142.84MiB no
00092.tar 189.78MiB no
00093.tar 190.97MiB no
00094.tar 192.47MiB no
00095.tar 183.54MiB no
00096.tar 207.72MiB no
00097.tar 209.75MiB no
00098.tar 214.41MiB no
00099.tar 208.27MiB no
00100.tar 176.06MiB no
ais etl bucket transformer-etl gs://dataset_raw ais://out1
ais ls ais://out1
NAME SIZE
00001.tar 75.78MiB
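Note that ais://out1 contains only 00001.tar, which is the only object marked CACHED in the listing above. One way to pull the remaining objects into the cluster first is prefetch. A dry-run sketch (bucket and ETL names are taken from this thread; prefetch behavior is as described in the AIStore CLI help shown later in this thread):

```shell
#!/bin/sh
# Dry-run sketch: prefetch every object of the remote bucket into the cluster,
# then run the ETL over the (now cached) objects. Set DRYRUN=0 to execute.
DRYRUN=${DRYRUN:-1}
run() { [ "$DRYRUN" = "1" ] && echo "+ $*" || "$@"; }

run ais start prefetch gs://dataset_raw          # cache all 100 tars
run ais etl bucket transformer-etl gs://dataset_raw ais://out1
```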
@gaikwadabhishek The result looks the same when directly reading files from GCS.
update:
ais start download --sync gs://dataset_raw ais://data_bucket_1
https://github.com/NVIDIA/aistore/blob/e13261ec748d80d2fbea8797ce5974e2e6f325e1/docs/downloader.md#example
This might help you in your current task (answers 1 and 3). For 2, I suppose we don't have it through aistore yet, but you can maintain the files through scripts?
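The download --sync approach above can be sketched as follows. A dry-run sketch (bucket names are from this thread; I'm assuming the job can be monitored via ais show job, per recent AIStore CLI versions):

```shell
#!/bin/sh
# Dry-run sketch: download (and keep in sync) the whole GCS bucket into a
# regular AIS bucket, then check the job's progress. Set DRYRUN=0 to execute.
DRYRUN=${DRYRUN:-1}
run() { [ "$DRYRUN" = "1" ] && echo "+ $*" || "$@"; }

run ais start download --sync gs://dataset_raw ais://data_bucket_1
# The download runs asynchronously; monitor it before starting the ETL.
run ais show job download
```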
Is there any way to quickly cache all the files in the bucket?
"cache" and "files" may mean different things in different circumstances. But if that's what I think it is then - some easy CLI pointers:
$ ais etl bucket --help | grep all
--all transform all objects from a remote bucket including those that are not present (not "cached") in the cluster
and specifically for files (not objects):
$ ais object promote --help
and also:
$ ais start download --help
and more:
$ ais start prefetch --help
Those are some of the supported ways. But the most popular way is - just start running.
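Of the options listed above, the --all flag is likely the most direct fix here: per the help text quoted above, it makes the offline transform include objects that are not yet present ("cached") in the cluster. A dry-run sketch using the names from this thread:

```shell
#!/bin/sh
# Dry-run sketch: transform ALL objects of the remote bucket, including the
# 99 tars that are not yet cached in the cluster. Set DRYRUN=0 to execute.
DRYRUN=${DRYRUN:-1}
run() { [ "$DRYRUN" = "1" ] && echo "+ $*" || "$@"; }

run ais etl bucket transformer-etl gs://dataset_raw ais://out1 --all
```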
ais ls ais://data_bucket_1
can successfully view the data.
ais etl bucket transformer-etl ais://data_bucket_1 ais://data_bucket_1_out
Note: source ais://data_bucket_1 is empty, nothing to do
update:
if I do
ais get ais://data_bucket_1/00081.tar
then
ais etl bucket transformer-etl ais://data_bucket_1 ais://data_bucket_1_out
will successfully process 00081, but I can't get the other (non-cached) files to work.
ais ls ais://data_bucket_1_out
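Generalizing the ais get experiment above: every object could be pulled into the cluster in a loop before running the ETL. A dry-run sketch that just generates one get per tar, mirroring the 00001..00100 names listed earlier in this thread (a workaround sketch, not a recommended fix; --all or prefetch are cheaper):

```shell
#!/bin/sh
# Dry-run sketch: GET each tar so it becomes cached, then transform.
# Set DRYRUN=0 to execute against a live cluster.
DRYRUN=${DRYRUN:-1}
run() { [ "$DRYRUN" = "1" ] && echo "+ $*" || "$@"; }

i=1
while [ "$i" -le 100 ]; do
  name=$(printf '%05d.tar' "$i")
  # Getting the object pulls it into the cluster; discard the local copy.
  run ais get "ais://data_bucket_1/$name" /dev/null
  i=$((i + 1))
done
run ais etl bucket transformer-etl ais://data_bucket_1 ais://data_bucket_1_out
```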