Closed: vrmer closed this issue 2 years ago.
Hi @vrmer, as far as I am aware, we used transformers==2.3.0. Did you try running the install_tools.sh script? It should install the correct transformers version (see this line).
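To confirm which transformers version the active environment actually resolves, a small standard-library check like the one below can help. It is a sketch, not part of the XTREME repo; installed_version and check_pinned_version are hypothetical helper names, and 2.3.0 is simply the version mentioned in the comment above.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pinned_version(package, expected):
    """Print a warning and return False unless `package` is installed at `expected`."""
    found = installed_version(package)
    if found != expected:
        print(f"warning: {package} expected {expected}, found {found or 'nothing installed'}")
        return False
    return True


if __name__ == "__main__":
    # Hypothetical usage: 2.3.0 is the version named in this thread.
    check_pinned_version("transformers", "2.3.0")
```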
Thanks for the response! I ran into issues running the install_tools.sh script when I first started using the library, but I don't have the output for that at the moment. Nevertheless, I followed the lines you pointed to and installed transformers==2.3.0. However, I still get the following error:
Traceback (most recent call last):
File "/Users/marcellfekete/PycharmProjects/xtreme/third_party/evaluate_retrieval.py", line 57, in <module>
"xlmr": (XLMRobertaConfig, XLMRobertaModel, XLMRobertaTokenizer),
NameError: name 'XLMRobertaTokenizer' is not defined
Hmm, that's strange. I just checked the HuggingFace Transformers repo, and XLMRobertaTokenizer should be available in v2.3.0 (see here). Could you double-check that you have the correct version and that the file tokenization_xlm_roberta.py is available in the transformers version you are using?
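One way to check whether a module such as tokenization_xlm_roberta is present, without running the whole script, is importlib.util.find_spec. The sketch below is illustrative only; module_available is a hypothetical helper, and it assumes the 2.x layout where the tokenizer lives at transformers.tokenization_xlm_roberta.

```python
import importlib.util


def module_available(dotted_name):
    """Return True if `dotted_name` can be located by the import system.

    find_spec imports parent packages first, so a missing parent raises
    ModuleNotFoundError instead of returning None; both mean "not available".
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        return False


# Hypothetical usage (not run here, since it imports transformers):
#   module_available("transformers.tokenization_xlm_roberta")
```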
Apologies, apparently I commented out the import statement while I was trying to make the code run and forgot to restore it!
Now the code starts running with this message:
03/16/2022 15:57:00 - INFO - root - Input args: Namespace(batch_size=100, cache_dir='', candidate_prefix='candidates', concate_layers=False, config_name='', data_dir='/Users/marcellfekete/PycharmProjects/xtreme/download//tatoeba/', dist='cosine', do_lower_case=False, embed_size=768, encoding='utf-8', extract_embeds=False, gold=None, init_checkpoint=None, local_rank=-1, log_file='embed-cosine', max_answer_length=92, max_query_length=64, max_seq_length=512, mine_bitext=False, model_name_or_path='/mnt/disk-1/models/squad/xlm-roberta-large_LR3e-5_EPOCH2.0_maxlen384_batchsize2_gradacc16', model_type='bert', no_cuda=False, num_layers=12, output_dir='/Users/marcellfekete/PycharmProjects/xtreme/outputs-temp//tatoeba//mnt/disk-1/models/squad/xlm-roberta-large_LR3e-5_EPOCH2.0_maxlen384_batchsize2_gradacc16_512/', overwrite_cache=False, overwrite_output_dir=False, pool_skip_special_token=False, pool_type='mean', predict_dir=None, specific_layer=7, split='training', src_embed_file=None, src_file=None, src_id_file=None, src_language='ar', src_text_file=None, src_tok_file=None, task_name='tatoeba', tgt_embed_file=None, tgt_file=None, tgt_id_file=None, tgt_language='en', tgt_text_file=None, tgt_tok_file=None, threshold=-1, tokenizer_name='', unify=False, use_shift_embeds=False)
But then it gives me this error message:
Traceback (most recent call last):
File "/Users/marcellfekete/miniforge3/envs/rosetta/lib/python3.8/site-packages/transformers/configuration_utils.py", line 204, in get_config_dict
raise EnvironmentError
OSError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/marcellfekete/PycharmProjects/xtreme/third_party/evaluate_retrieval.py", line 748, in <module>
main()
File "/Users/marcellfekete/PycharmProjects/xtreme/third_party/evaluate_retrieval.py", line 733, in main
all_src_embeds = extract_embeddings(args, src_text_file, src_tok_file, None, lang=src_lang2, pool_type=args.pool_type)
File "/Users/marcellfekete/PycharmProjects/xtreme/third_party/evaluate_retrieval.py", line 173, in extract_embeddings
config, model, tokenizer, langid = load_model(args, lang,
File "/Users/marcellfekete/PycharmProjects/xtreme/third_party/evaluate_retrieval.py", line 150, in load_model
config = config_class.from_pretrained(args.model_name_or_path)
File "/Users/marcellfekete/miniforge3/envs/rosetta/lib/python3.8/site-packages/transformers/configuration_utils.py", line 160, in from_pretrained
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/Users/marcellfekete/miniforge3/envs/rosetta/lib/python3.8/site-packages/transformers/configuration_utils.py", line 220, in get_config_dict
raise EnvironmentError(msg)
OSError: Model name '/mnt/disk-1/models/squad/xlm-roberta-large_LR3e-5_EPOCH2.0_maxlen384_batchsize2_gradacc16' was not found in model name list. We assumed 'https://s3.amazonaws.com/models.huggingface.co/bert//mnt/disk-1/models/squad/xlm-roberta-large_LR3e-5_EPOCH2.0_maxlen384_batchsize2_gradacc16/config.json' was a path, a model identifier, or url to a configuration file named config.json or a directory containing such a file but couldn't find any such file at this path or url.
I'm not even sure why it is trying to use XLM-RoBERTa when I explicitly tried using multilingual BERT.
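The OSError is raised by config_class.from_pretrained(args.model_name_or_path). As the message shows, when the given path is not found locally, transformers treats it as a model identifier and tries an S3 URL instead, which hides the simpler cause: the hard-coded /mnt/disk-1 model directory most likely does not exist on this machine. A pre-flight check along these lines could surface that earlier. check_local_model_dir is a hypothetical helper, and it only checks for config.json, the file named in the error.

```python
import os


def check_local_model_dir(path):
    """Return a list of problems with a local model directory (empty if OK).

    Only config.json is checked, because that is the file the OSError
    reports as missing; weights and vocabulary files are not checked.
    """
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} is not an existing directory")
    elif not os.path.isfile(os.path.join(path, "config.json")):
        problems.append(f"no config.json in {path}")
    return problems


if __name__ == "__main__":
    # Path copied from the traceback above.
    model_path = "/mnt/disk-1/models/squad/xlm-roberta-large_LR3e-5_EPOCH2.0_maxlen384_batchsize2_gradacc16"
    for problem in check_local_model_dir(model_path):
        print("warning:", problem)
```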
I assume you're running the run_tatoeba.sh script? We now recommend using a model fine-tuned on SQuAD for retrieval rather than using the representations of the pre-trained model directly.
In the run_tatoeba.sh script, you can replace the path to the fine-tuned model here. If you prefer not to use a fine-tuned model, you can simply comment out that line and things should run as expected.
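The Input args line earlier in the thread shows the mismatch directly: model_type='bert' paired with an xlm-roberta-large model path. A small heuristic check could flag that combination before any model is loaded. This is a hypothetical sketch: the helper names and substring rules are assumptions, and only the 'xlmr' and 'bert' keys are taken from the thread (the MODEL_CLASSES entry in the traceback and the logged model_type).

```python
import os


def guess_model_type(name_or_path):
    """Guess a model family from a checkpoint name or path (heuristic only)."""
    name = os.path.basename(name_or_path.rstrip("/")).lower()
    # Check the more specific family first: "xlm-roberta" also contains "bert".
    if "xlm-roberta" in name:
        return "xlmr"
    if "roberta" in name:
        return None  # plain RoBERTa is not covered by this sketch
    if "bert" in name:
        return "bert"
    return None


def warn_on_mismatch(model_type, name_or_path):
    """Print a warning and return True if model_type disagrees with the guess."""
    guessed = guess_model_type(name_or_path)
    if guessed is not None and guessed != model_type:
        print(f"warning: model_type={model_type!r} but {name_or_path!r} looks like {guessed!r}")
        return True
    return False


if __name__ == "__main__":
    # Values copied from the Input args log line in this thread.
    warn_on_mismatch("bert", "/mnt/disk-1/models/squad/xlm-roberta-large_LR3e-5_EPOCH2.0_maxlen384_batchsize2_gradacc16")
```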
Edit: Running scripts/train.sh "bert-base-multilingual-cased" tatoeba calls the run_tatoeba.sh script.
Oh, thank you, that was actually really helpful! Now the code seems to be running without issues.
I am closing the issue because it has been sorted.
I installed all the necessary dependencies and tried running the Tatoeba task using bash scripts/train.sh "bert-base-multilingual-cased" tatoeba. However, I immediately ran into an ImportError:
Traceback (most recent call last):
File "/Users/marcellfekete/PycharmProjects/xtreme/third_party/evaluate_retrieval.py", line 39, in <module>
from bert import BertForRetrieval
File "/Users/marcellfekete/PycharmProjects/xtreme/third_party/bert.py", line 4, in <module>
from transformers.modeling_bert import BertModel, BertPreTrainedModel
ModuleNotFoundError: No module named 'transformers.modeling_bert'
This is with transformers-4.17.0.
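This first error comes from a layout change: transformers 4.x moved the per-model modules into a transformers.models subpackage, so BertModel lives in transformers.models.bert.modeling_bert there. Trying import paths in order is one way to tolerate both layouts. This is a sketch with a hypothetical helper name; it only fixes the import location, and later releases change other APIs too (the BertLayerNorm error is one example), which is why pinning transformers==2.3.0 remains the safer route.

```python
import importlib


def import_first(*module_names):
    """Import and return the first module in `module_names` that exists.

    Note: a ModuleNotFoundError raised *inside* a candidate (for example a
    missing torch) is also skipped, so the final error lists every name tried.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError:
            continue
    raise ImportError(f"none of these modules could be imported: {module_names}")


# Hypothetical usage in third_party/bert.py (not run here):
#   modeling_bert = import_first(
#       "transformers.modeling_bert",               # 2.x / 3.x layout
#       "transformers.models.bert.modeling_bert",   # 4.x layout
#   )
#   BertModel = modeling_bert.BertModel
#   BertPreTrainedModel = modeling_bert.BertPreTrainedModel
```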
I tried downgrading transformers to versions 3.5 and 2.0, but then I ran into other issues:
Traceback (most recent call last):
File "/Users/marcellfekete/PycharmProjects/xtreme/third_party/evaluate_retrieval.py", line 43, in <module>
from xlm_roberta import XLMRobertaConfig, XLMRobertaForRetrieval, XLMRobertaModel
File "/Users/marcellfekete/PycharmProjects/xtreme/third_party/xlm_roberta.py", line 24, in <module>
from roberta import (
File "/Users/marcellfekete/PycharmProjects/xtreme/third_party/roberta.py", line 27, in <module>
from transformers.modeling_bert import BertEmbeddings, BertLayerNorm, BertModel, BertPreTrainedModel, gelu
ImportError: cannot import name 'BertLayerNorm' from 'transformers.modeling_bert' (/Users/marcellfekete/miniforge3/envs/rosetta/lib/python3.8/site-packages/transformers/modeling_bert.py)
This is with transformers 3.5.
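BertLayerNorm is a name that exists in the older modeling_bert but not in 3.5. In the 2.x releases it was defined in terms of torch.nn.LayerNorm, so one possible workaround is to fall back to torch.nn.LayerNorm when the name is missing. Treat that equivalence as an assumption to verify against the release you pin. The helper below is hypothetical and only shows the fallback mechanism.

```python
import importlib


def import_attr(module_name, attr, fallback=None):
    """Return `module_name.attr` if the module imports and has it, else `fallback`."""
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError:
        return fallback
    return getattr(module, attr, fallback)


# Hypothetical usage in third_party/roberta.py (assumes torch is installed;
# not run here):
#   import torch
#   BertLayerNorm = import_attr(
#       "transformers.modeling_bert", "BertLayerNorm", torch.nn.LayerNorm
#   )
```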
Traceback (most recent call last):
File "/Users/marcellfekete/PycharmProjects/xtreme/third_party/evaluate_retrieval.py", line 57, in <module>
"xlmr": (XLMRobertaConfig, XLMRobertaModel, XLMRobertaTokenizer),
NameError: name 'XLMRobertaTokenizer' is not defined
This is with transformers 2.0.
Do you have any advice? Which transformers version is recommended to run the tests?
I don't know if it matters, but I am trying to run this on Apple Silicon under the Rosetta translation layer (because faiss does not install natively). Thank you!
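When mixing Rosetta and native environments, it can help to confirm which architecture the Python interpreter actually runs as, so that faiss, torch and transformers all come from matching x86_64 builds. The sketch below relies on macOS's sysctl.proc_translated key, which reports 1 for a Rosetta-translated process and 0 for a native one; running_under_rosetta is a hypothetical helper, not something the XTREME scripts provide.

```python
import platform
import subprocess


def running_under_rosetta():
    """Return True if this Python process is being translated by Rosetta 2.

    The sysctl key is absent on Intel Macs and other systems, so any failure
    to read it is treated as "not translated".
    """
    if platform.system() != "Darwin":
        return False
    try:
        out = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return False
    return out == "1"


if __name__ == "__main__":
    print("machine:", platform.machine())  # "x86_64" under Rosetta, "arm64" natively
    print("rosetta:", running_under_rosetta())
```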