isaac-chung opened this issue 6 days ago
An option in the CLI might simply be to do:

```shell
mteb run ... --disable-datasets-caching
```

Using the following:

```python
from datasets import disable_caching

disable_caching()
```
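Here is a minimal sketch of how such a flag could be wired up. It assumes an `argparse`-based `mteb run` subcommand. `build_parser` and `apply_caching_flag` are hypothetical names and are not part of mteb's actual CLI. One caveat, as we read the `datasets` documentation: `disable_caching()` governs the cache files written by transforms such as `map`, and the docs state that it does not affect `load_dataset()`. On its own, it may therefore not prevent downloaded data from filling the cache directory.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring `mteb run ... --disable-datasets-caching`."""
    parser = argparse.ArgumentParser(prog="mteb")
    subcommands = parser.add_subparsers(dest="command", required=True)
    run = subcommands.add_parser("run", help="Run an evaluation")
    run.add_argument(
        "--disable-datasets-caching",
        action="store_true",
        help="Call datasets.disable_caching() before loading any task data",
    )
    return parser


def apply_caching_flag(args: argparse.Namespace) -> bool:
    """Disable `datasets` caching if the flag was passed; return whether it was."""
    if args.disable_datasets_caching:
        # Imported lazily so the parser can be built without `datasets` installed.
        from datasets import disable_caching

        disable_caching()
        return True
    return False
```

The CLI entry point would call `apply_caching_flag(build_parser().parse_args())` once, before any task loads its data, because the setting is global for the process.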
We might additionally add an argument:

```python
import mteb

evaluation = mteb.MTEB(...)
evaluation.run(..., automatically_clean_up_cache=True)  # on or off by default? On would be more stable but also more invasive
```

This would automatically clean up the cache if there is not enough disk space.
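One way the proposed `automatically_clean_up_cache` option could decide when to act is to compare free disk space against a threshold before loading each task. Below is a minimal sketch that uses only the standard library. `clean_cache_if_low_on_space` and its `min_free_bytes` threshold are hypothetical. Emptying the whole cache directory is a deliberately blunt choice, because any dataset needed later has to be downloaded again.

```python
import shutil
from pathlib import Path
from typing import Union


def clean_cache_if_low_on_space(
    cache_dir: Union[str, Path], min_free_bytes: int
) -> bool:
    """Empty `cache_dir` if the disk holding it has less than `min_free_bytes` free.

    Hypothetical helper; returns True if a cleanup was performed.
    """
    cache_dir = Path(cache_dir).expanduser()
    if not cache_dir.is_dir():
        return False  # nothing cached yet, nothing to clean

    free_bytes = shutil.disk_usage(cache_dir).free
    if free_bytes >= min_free_bytes:
        return False  # enough space left, keep the cache

    # Remove the cache's contents but keep the directory itself.
    for entry in cache_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    return True
```

For example, calling it with `"~/.cache/huggingface/datasets"` and `min_free_bytes=50 * 1024**3` before each task would clear the cache whenever less than 50 GiB remains. A gentler variant could delete only the datasets of tasks that have already finished.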
I would go for a CLI option as well!
When running all retrieval tasks, a machine can easily run out of disk space, because loading a dataset stores its files in a cache directory (usually `~/.cache/huggingface/datasets`).

Suggestion:
- Have `evaluate` call the dataset's `cleanup_cache_files` method, or
- Add `__exit__()` (together with `__enter__()`) to `AbsTask`, calling `cleanup_cache_files`, so that the task can be used as a context manager.

CC @imenelydiaker (related to the script we have)
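Below is a minimal sketch of both suggestions. It uses a hypothetical `CacheCleaningTask` class as a stand-in for `AbsTask`, and the real class's attributes and `evaluate` signature may differ. It assumes `self.dataset` is a Hugging Face `Dataset` or `DatasetDict`, both of which provide `cleanup_cache_files()`. As far as we can tell, that method only removes the `cache-*.arrow` files written by transforms such as `map` and keeps any file currently in use. It may therefore free less space than deleting the cache directory outright.

```python
from types import TracebackType
from typing import Any, Dict, Optional, Type


class CacheCleaningTask:
    """Hypothetical stand-in for mteb's AbsTask, showing both suggestions."""

    def __init__(self, dataset: Any = None) -> None:
        # Assumed to be a datasets.Dataset/DatasetDict (anything with
        # a `cleanup_cache_files()` method), or None if not loaded yet.
        self.dataset = dataset

    def cleanup_cache(self) -> None:
        """Remove the dataset's cache files, if a dataset is loaded."""
        if self.dataset is not None:
            self.dataset.cleanup_cache_files()

    # Suggestion 1: `evaluate` cleans up after itself.
    def evaluate(self, model: Any) -> Dict[str, float]:
        try:
            return self._evaluate(model)
        finally:
            # Runs whether the evaluation succeeds or raises.
            self.cleanup_cache()

    def _evaluate(self, model: Any) -> Dict[str, float]:
        raise NotImplementedError("Subclasses implement the actual evaluation")

    # Suggestion 2: the task is usable as a context manager.
    def __enter__(self) -> "CacheCleaningTask":
        return self

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> bool:
        self.cleanup_cache()
        return False  # do not swallow exceptions raised inside the `with` block
```

With the context-manager variant, a script could write `with task: task.evaluate(model)`, and the cache files would be removed even if the evaluation raises. The two options are alternatives. Combining them simply runs the cleanup twice, and the second run finds nothing left to delete.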