Describe the bug
When the cache_file_limit is set to a large value, e.g. 10k, calls to StorageManager.get_local_copy get extremely slow, even if all the files are already available in the cache.
By profiling, it seems that this call to iterdir() is the main bottleneck. If there are a lot of small files in the cache, and get_local_copy is called for each of them, iterating over all the files on every call is too slow.
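For a sense of scale, here is a rough, self-contained illustration of what a single full scan of a cache folder with many small files costs; the folder path and file count below are made up, and the sort-by-mtime only approximates the kind of work an eviction pass needs to do:

```python
import time
from pathlib import Path

# Stand-in for the ClearML cache folder (hypothetical path, ~10k tiny files).
cache_dir = Path("/tmp/fake_cache")
cache_dir.mkdir(parents=True, exist_ok=True)
for i in range(10_000):
    (cache_dir / f"file_{i}.bin").write_bytes(b"x")

start = time.perf_counter()
# One scan: list every entry and stat it, roughly what an eviction pass does.
entries = sorted(cache_dir.iterdir(), key=lambda p: p.stat().st_mtime)
elapsed = time.perf_counter() - start
print(f"single scan of {len(entries)} files: {elapsed:.3f}s")

# get_local_copy repeats a scan like this on every call, so fetching 10k
# already-cached files ends up doing on the order of 10k * 10k stat calls.
```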
To reproduce
Call StorageManager.set_cache_file_limit(10_000)
Download multiple files with StorageManager.get_local_copy to fill up the cache
Run the same script again (see the sketch below)
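A minimal reproduction sketch of these steps; the bucket and file names are hypothetical, any storage holding a few thousand small objects should do:

```python
from clearml import StorageManager

# Allow a large cache so nothing needs to be evicted between runs.
StorageManager.set_cache_file_limit(10_000)

# Hypothetical remote objects; substitute any bucket with many small files.
urls = [f"s3://my-bucket/dataset/part_{i:05d}.bin" for i in range(5_000)]

# First run: downloads fill the cache. Second run of the same script:
# every file is already cached, yet each call still takes noticeable time.
for url in urls:
    local_path = StorageManager.get_local_copy(remote_url=url)
```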
Expected behaviour
If all the files are already available in the cache, the second run should be almost immediate. Instead, it can take minutes.
Since iterating over the files seems to be needed only to delete old files when the cache is full, maybe there could be a parameter to disable this logic, plus a separate method to trigger the cleanup manually.
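For example, the API could look something like the sketch below; cleanup_on_get and clean_cache are hypothetical names used only to illustrate the suggestion and do not exist in ClearML today:

```python
from clearml import StorageManager

# Hypothetical parameter: skip the per-call eviction scan.
StorageManager.set_cache_file_limit(10_000, cleanup_on_get=False)

for url in urls:
    StorageManager.get_local_copy(remote_url=url)

# Hypothetical method: run the (expensive) eviction pass once, on demand.
StorageManager.clean_cache()
```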
Environment