allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0

Set a different cache directory for the Predictor.from_path API #5548

Closed — ytiam closed this issue 2 years ago

ytiam commented 2 years ago

Hi all, I am using the Dataiku platform for my project development, and I need allennlp in my pipeline. While using the Predictor.from_path API, I am hitting a Permission Denied error, because Dataiku does not allow creating the CACHE_ROOT directory ".allennlp" under its root folder. Please see the error below.

```
PermissionError                           Traceback (most recent call last)
<ipython-input> in <module>
----> 1 predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/bidaf-elmo.2021-02-11.tar.gz")

~/code-env/lib/python3.7/site-packages/allennlp/predictors/predictor.py in from_path(cls, archive_path, predictor_name, cuda_device, dataset_reader_to_load, frozen, import_plugins, overrides, **kwargs)
    364         plugins.import_plugins()
    365     return Predictor.from_archive(
--> 366         load_archive(archive_path, cuda_device=cuda_device, overrides=overrides),
    367         predictor_name,
    368         dataset_reader_to_load=dataset_reader_to_load,

~/code-env/lib/python3.7/site-packages/allennlp/models/archival.py in load_archive(archive_file, cuda_device, overrides, weights_file)
    204     """
    205     # redirect to the cache, if necessary
--> 206     resolved_archive_file = cached_path(archive_file)
    207
    208     if resolved_archive_file == archive_file:

~/code-env/lib/python3.7/site-packages/allennlp/common/file_utils.py in cached_path(url_or_filename, cache_dir, extract_archive, force_extract)
    135         cache_dir=cache_dir or CACHE_DIRECTORY,
    136         extract_archive=extract_archive,
--> 137         force_extract=force_extract,
    138     )
    139

~/code-env/lib/python3.7/site-packages/cached_path/_cached_path.py in cached_path(url_or_filename, cache_dir, extract_archive, force_extract)
    119     cache_dir = cache_dir if cache_dir else get_cache_dir()
    120     cache_dir = os.path.expanduser(cache_dir)
--> 121     os.makedirs(cache_dir, exist_ok=True)
    122
    123     if not isinstance(url_or_filename, str):

~/code-env/lib/python3.7/os.py in makedirs(name, mode, exist_ok)
    211     if head and tail and not path.exists(head):
    212         try:
--> 213             makedirs(head, exist_ok=exist_ok)
    214         except FileExistsError:
    215             # Defeats race condition when another thread created the path

~/code-env/lib/python3.7/os.py in makedirs(name, mode, exist_ok)
    221         return
    222     try:
--> 223         mkdir(name, mode)
    224     except OSError:
    225         # Cannot rely on checking for EEXIST, since the operating system

PermissionError: [Errno 13] Permission denied: '/opt/dataiku/.allennlp'
```

So my question is: instead of the root folder, how can I set some other folder as the CACHE_ROOT, for example by declaring it through the Predictor.from_path API? Please help me.
AkshitaB commented 2 years ago

@ytiam You can override the default cache directory by setting ALLENNLP_CACHE_ROOT in your environment to a different path.
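For context, allennlp resolves its cache root from this variable once, when allennlp.common.file_utils is first imported. Roughly (a paraphrase of that module; exact details may vary by version):

```python
import os
from pathlib import Path

# Read once at import time; falls back to ~/.allennlp when the variable is unset.
CACHE_ROOT = Path(os.getenv("ALLENNLP_CACHE_ROOT", Path.home() / ".allennlp"))
CACHE_DIRECTORY = str(CACHE_ROOT / "cache")
```

Because the value is captured at import time, the variable has to be set before the first allennlp import, which becomes relevant below.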

ytiam commented 2 years ago

Thanks @AkshitaB. I need to do the entire operation from inside a notebook; as a user I have no command-line access, per organizational security constraints. Can you please help me apply the same setting from a notebook? Thanks again.

ytiam commented 2 years ago

@AkshitaB I just ran %env ALLENNLP_CACHE_ROOT=/home/dataiku/allennlp_model_path from a notebook cell, and then checked os.getenv("ALLENNLP_CACHE_ROOT"); it returns the path "/home/dataiku/allennlp_model_path".

But I am still getting the same error from the Predictor.from_path API.

AkshitaB commented 2 years ago

@ytiam So it still uses the old default directory path? Can you confirm that you set the environment variable before importing any allennlp modules?
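In notebook terms, the working order looks like this (a minimal sketch; the cache path is the one from earlier in this thread, and any directory the notebook user can write to should do):

```python
import os

# Set the cache root BEFORE any allennlp import: the variable is read once,
# when allennlp.common.file_utils is first imported.
os.environ["ALLENNLP_CACHE_ROOT"] = "/home/dataiku/allennlp_model_path"

from allennlp.predictors.predictor import Predictor

# The archive is now downloaded and cached under the new root.
predictor = Predictor.from_path(
    "https://storage.googleapis.com/allennlp-public-models/bidaf-elmo.2021-02-11.tar.gz"
)
```

If the kernel has already imported allennlp, restart it first so the module-level cache constants are recomputed.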

ytiam commented 2 years ago

@AkshitaB As you suggested, I set the environment variable before importing the module, and that way it worked. Thanks for all the help.

AkshitaB commented 2 years ago

@ytiam Great!