[Closed] ShuzhouYuan closed this issue 2 years ago
Hi!
Could you tell me where this issue arises?
training_dataset = tp.fit(text_for_contextual=unpreprocessed_corpus, text_for_bow=preprocessed_documents)
Thanks, could you also share the stack trace?
---------------------------------------------------------------------------
PermissionError Traceback (most recent call last)
<ipython-input-8-e866ba0b7c0c> in <module>
----> 1 training_dataset = qt.fit(text_for_contextual=unpreprocessed_documents, text_for_bow=preprocessed_documents)
~/.local/lib/python3.6/site-packages/contextualized_topic_models/utils/data_preparation.py in fit(self, text_for_contextual, text_for_bow, labels)
67
68 train_bow_embeddings = self.vectorizer.fit_transform(text_for_bow)
---> 69 train_contextualized_embeddings = bert_embeddings_from_list(text_for_contextual, self.contextualized_model)
70 self.vocab = self.vectorizer.get_feature_names()
71 self.id2token = {k: v for k, v in zip(range(0, len(self.vocab)), self.vocab)}
~/.local/lib/python3.6/site-packages/contextualized_topic_models/utils/data_preparation.py in bert_embeddings_from_list(texts, sbert_model_to_load, batch_size)
33 Creates SBERT Embeddings from a list
34 """
---> 35 model = SentenceTransformer(sbert_model_to_load)
36 return np.array(model.encode(texts, show_progress_bar=True, batch_size=batch_size))
37
~/.local/lib/python3.6/site-packages/sentence_transformers/SentenceTransformer.py in __init__(self, model_name_or_path, modules, device, cache_folder)
82 library_name='sentence-transformers',
83 library_version=__version__,
---> 84 ignore_files=['flax_model.msgpack', 'rust_model.ot', 'tf_model.h5'])
85
86 if os.path.exists(os.path.join(model_path, 'modules.json')): #Load as SentenceTransformer model
~/.local/lib/python3.6/site-packages/sentence_transformers/util.py in snapshot_download(repo_id, revision, cache_dir, library_name, library_version, user_agent, ignore_files)
450 os.path.join(storage_folder, relative_filepath)
451 )
--> 452 os.makedirs(nested_dirname, exist_ok=True)
453
454 path = cached_download(
/usr/lib/python3.6/os.py in makedirs(name, mode, exist_ok)
208 if head and tail and not path.exists(head):
209 try:
--> 210 makedirs(head, mode, exist_ok)
211 except FileExistsError:
212 # Defeats race condition when another thread created the path
/usr/lib/python3.6/os.py in makedirs(name, mode, exist_ok)
208 if head and tail and not path.exists(head):
209 try:
--> 210 makedirs(head, mode, exist_ok)
211 except FileExistsError:
212 # Defeats race condition when another thread created the path
/usr/lib/python3.6/os.py in makedirs(name, mode, exist_ok)
208 if head and tail and not path.exists(head):
209 try:
--> 210 makedirs(head, mode, exist_ok)
211 except FileExistsError:
212 # Defeats race condition when another thread created the path
/usr/lib/python3.6/os.py in makedirs(name, mode, exist_ok)
218 return
219 try:
--> 220 mkdir(name, mode)
221 except OSError:
222 # Cannot rely on checking for EEXIST, since the operating system
PermissionError: [Errno 13] Permission denied: '/.cache'
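As a side note, the failing path `/.cache` (rather than something like `/home/user/.cache`) suggests that `HOME` resolved to `/` in that environment, so the download cache degenerated to a root-owned directory. A minimal sketch of that degeneration, using a hypothetical helper `cache_root` (not part of any library):

```python
import os

def cache_root(env):
    """Return the '~/.cache'-style path a downloader would fall back to,
    given an environment mapping. Hypothetical helper for diagnosis only."""
    home = env.get("HOME", "/")
    return os.path.join(home, ".cache")

# With an unset or root HOME, the cache path degenerates to '/.cache',
# which ordinary users cannot write on most servers.
print(cache_root({}))                     # '/.cache'
print(cache_root({"HOME": "/home/alice"}))  # '/home/alice/.cache'
```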
I think the problem is that I don't have permission to write to the default cache directory on the server. I had the same error before with transformer models; what I did then was to customize the cache directory:
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', cache_dir='your/cache/directory')
Is there a way to change the cache directory path here as well? Thanks!
I've found a solution:
import os
os.environ['TORCH_HOME'] = 'your/cache/path'
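Putting the pieces together, a minimal sketch of the workaround: set the cache-related environment variables to a writable directory *before* importing/instantiating the models, since the libraries read them at download time. The directory name below is just an example; `SENTENCE_TRANSFORMERS_HOME` is the cache variable that recent sentence-transformers versions honor, in addition to PyTorch's `TORCH_HOME`.

```python
import os

# Any directory you have write permission for on the server.
cache_path = "/tmp/my_model_cache"  # example path
os.makedirs(cache_path, exist_ok=True)

# Redirect the model caches away from the unwritable default (~/.cache).
os.environ["TORCH_HOME"] = cache_path
os.environ["SENTENCE_TRANSFORMERS_HOME"] = cache_path

# ...now import contextualized_topic_models / sentence_transformers
# and call tp.fit(...) as before; downloads land under cache_path.
```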
Wow! nice! :)
Happy you solved the problem :)
Hello! Since I'm working on a server where I don't have write permission for the default cache directory, I always get a Permission Denied error for it. Is there a way to customize the cache directory, like other transformer models do with cache_dir='your/cache/path'? I tried this, but it doesn't seem to be a parameter of your model. Thank you very much!