Describe the bug
I am unable to use StorageManager to download and cache data from mounted NFS storage.
My use case:
I have a lot of data stored on fairly slow NFS storage mounted under /mnt/xyz.
I use NFS to store datasets because I manage them with our in-house tools and need them accessible on a per-file basis (I can't use ClearML Datasets because it stores files in chunks).
I would like to leverage local dataset caching by using StorageManager.download_folder; however, it doesn't seem to download anything, even though it returns the path to the local cache where the files should have been downloaded.
When I use StorageManager.download_files(), it just returns the NFS path, because it treats the files as local and skips the download.
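The symptom described above can be checked programmatically: if the path handed back by StorageManager still resolves to a location inside the NFS mount, no local copy was made. The following is a minimal sketch using only the standard library; `is_under` is a hypothetical helper name, not part of ClearML, and `/mnt/xyz` is the mount point from this report.

```python
from pathlib import Path


def is_under(path, mount="/mnt/xyz"):
    """Return True if `path` resolves to `mount` or to a location inside it.

    Hypothetical helper, not part of ClearML.
    """
    p = Path(path).expanduser().resolve()
    m = Path(mount).expanduser().resolve()
    # Compare whole path components so "/mnt/xyzzy" is not mistaken for "/mnt/xyz".
    return p == m or m in p.parents
```

Passing the return value of the StorageManager call to `is_under(...)` would then show whether the result still points at the NFS share rather than at a local cache.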
To reproduce
In my clearml.conf, remove or comment out the following line under sdk.storage.direct_access:
{ url: "file://*" } # file-urls are always directly referenced
Then open a Python terminal and try to download the directory (about 700 MiB):
from clearml import StorageManager
StorageManager.download_folder("/mnt/xyz/dataset_y")
download_folder() returns my local cache path `~/.clearml/cache/storage_manager/global`, but no data is there; nothing was downloaded.
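To confirm whether anything landed in the returned folder, the files under it can be counted. This is a small sketch with a hypothetical helper (`folder_stats` is not a ClearML function); it treats a missing folder or a `None` return value as empty.

```python
import os
from pathlib import Path


def folder_stats(path):
    """Return (file_count, total_bytes) for everything under `path`.

    Hypothetical helper, not part of ClearML.
    A missing folder or a None path is reported as (0, 0).
    """
    if path is None:
        return 0, 0
    root = Path(path).expanduser()
    if not root.is_dir():
        return 0, 0
    count = 0
    size = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                size += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # e.g. a broken symlink; skip it
            count += 1
    return count, size
```

For example, `folder_stats(StorageManager.download_folder("/mnt/xyz/dataset_y"))` reports how many files and bytes exist under the returned path. The returned folder may be shared with other cached items, so a non-zero count does not by itself prove that this dataset was copied.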
Expected behaviour
I expected the files to be copied from the NFS share and cached locally.
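As an illustration of the expected behaviour, here is a rough stand-in written with the standard library only. It is not how ClearML implements caching; `cache_folder_locally` and the default cache location `~/.cache/nfs_mirror` are assumptions made up for this sketch. It mirrors a folder into a local directory and re-copies a file only when its size or modification time differs from the cached copy.

```python
import hashlib
import shutil
from pathlib import Path


def cache_folder_locally(src, cache_root="~/.cache/nfs_mirror"):
    """Mirror `src` into a local cache folder and return the local path.

    Hypothetical stand-in for the expected behaviour; not ClearML code.
    """
    src = Path(src).resolve()
    if not src.is_dir():
        raise NotADirectoryError(str(src))
    # Key the cache entry by the full source path so that two folders
    # with the same name do not collide.
    key = hashlib.sha256(str(src).encode()).hexdigest()[:16]
    dst = Path(cache_root).expanduser() / f"{key}.{src.name}"
    dst.mkdir(parents=True, exist_ok=True)
    for item in src.rglob("*"):
        target = dst / item.relative_to(src)
        if item.is_dir():
            target.mkdir(parents=True, exist_ok=True)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        s = item.stat()
        if target.exists():
            t = target.stat()
            if t.st_size == s.st_size and int(t.st_mtime) == int(s.st_mtime):
                continue  # cached copy looks current, skip the slow read
        shutil.copy2(item, target)  # copy2 also preserves the mtime
    return str(dst)
```

Calling `cache_folder_locally("/mnt/xyz/dataset_y")` would read each file from the NFS share once and serve later runs from local disk, while still keeping the data accessible on a per-file basis.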
Environment
Server type - self hosted
ClearML SDK Version - 1.14.4
ClearML Server Version - 1.15.0-472
Python Version - 3.11.8
OS - Linux (ubuntu 22.04)
Related Discussion
If this continues a slack thread, please provide a link to the original slack thread.