UChicago-Computational-Content-Analysis / Frequently-Asked-Questions

Issue running BERT #19

Open facundosuenzo opened 2 years ago

facundosuenzo commented 2 years ago

Hi,

I'm having trouble running the RoBERTa script (for the US dataset).

I ran this line:

!python run_language_modeling.py --output_dir=output_roberta_US --model_type=roberta --model_name_or_path=roberta-base --do_train --train_data_file=us_blog_train --do_eval --eval_data_file=us_blog_test --mlm

I got the following error, which points to a problem with one of the arguments:

(cut output)
Traceback (most recent call last):
  File "run_language_modeling.py", line 545, in <module>
    main()
  File "run_language_modeling.py", line 497, in main
    global_step, tr_loss = train(args, train_dataset, model, tokenizer)
  File "run_language_modeling.py", line 228, in train
    outputs = model(inputs, masked_lm_labels=labels) if args.mlm else model(inputs, labels=labels)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'masked_lm_labels'

Also, after Exercise 3, where the visualizations are introduced, the function word_vector does not seem to be defined, although it is used within visualise_diffs.

Thanks in advance for your help!

JunsolKim commented 2 years ago

Hi @facundosuenzo, could you share your entire code (notebook) through GitHub or email? Also, what is the version of torch (run torch.__version__) and environment (e.g., colab) that you use?
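
For anyone reporting a similar problem, here is a minimal sketch for collecting the requested version information in one go. It assumes Python 3.8+ (for importlib.metadata) and assumes that transformers is the other package worth reporting, since run_language_modeling.py appears to be a Hugging Face example script. The helper name installed_versions is hypothetical.

```python
from importlib import metadata


def installed_versions(packages=("torch", "transformers")):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting this output alongside the traceback usually makes a version mismatch easier to spot.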

jacyanthis commented 2 years ago

> Hi @facundosuenzo, could you share your entire code (notebook) through GitHub or email? Also, what is the version of torch (run torch.__version__) and environment (e.g., colab) that you use?

Yeah, an "unexpected keyword argument" error is usually a version issue when you're running someone else's code. These codebases change rapidly, and argument names get deprecated and replaced (unfortunately) very frequently. Our aim is to make all the notebooks run on the latest stable version of each package, though we don't always succeed!
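
To illustrate the kind of rename involved: to my understanding, the masked-language-model classes in Hugging Face transformers used to accept masked_lm_labels and later dropped it in favor of labels, which would explain the TypeError above. The simplest fix is probably to pass labels=labels at the failing call. If a script has to run against both old and new versions, one option is to inspect the signature first. The sketch below uses plain stand-in functions rather than a real model, and the helper name label_keyword is hypothetical.

```python
import inspect


def label_keyword(forward, candidates=("labels", "masked_lm_labels")):
    """Return the first candidate keyword that `forward` accepts by name."""
    params = inspect.signature(forward).parameters
    for name in candidates:
        if name in params:
            return name
    raise TypeError(f"{forward!r} accepts none of {candidates}")


# Stand-ins for an older and a newer forward(); hypothetical, for illustration.
def old_forward(input_ids, masked_lm_labels=None):
    return "old"


def new_forward(input_ids, labels=None):
    return "new"


print(label_keyword(old_forward))  # masked_lm_labels
print(label_keyword(new_forward))  # labels
```

With a real model, the same idea would look roughly like model(inputs, **{label_keyword(model.forward): labels}), assuming the model's forward method declares the label argument by name.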

> Then, after exercise 3, when visualizations are introduced, the following function word_vector does not seem to be defined (and it is used within the visualise_diffs)

Whoops! That function is defined in Homework 7, but I forgot to copy it into Homework 8 during recent edits. I've added it now, and here is the code. I will test it asap.

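import torch

def word_vector(text, word_id, model, tokenizer):
    """Return the model's output vector for the token at position word_id."""
    # Add BERT-style special tokens; word_id counts them, so index 0 is [CLS].
    marked_text = "[CLS] " + text + " [SEP]"
    tokenized_text = tokenizer.tokenize(marked_text)
    indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
    tokens_tensor = torch.tensor([indexed_tokens])
    # Run the model once, without tracking gradients; output [0] is taken
    # as the per-token embeddings for the single sentence in the batch.
    with torch.no_grad():
        word_embeddings = model(tokens_tensor)[0]
    return word_embeddings[0][word_id].numpy()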
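
Note that word_id is a position in the token list after [CLS] has been prepended, so the first real token sits at index 1. Here is a minimal sketch of looking up that position, using whitespace splitting as a stand-in for the real tokenizer. The helper name token_position is hypothetical, and a subword tokenizer may split a word into several pieces, so check the actual output of tokenizer.tokenize before relying on an index.

```python
def token_position(tokens, target, occurrence=0):
    """Return the index of the nth occurrence of `target` in `tokens`."""
    matches = [i for i, tok in enumerate(tokens) if tok == target]
    if occurrence >= len(matches):
        raise ValueError(f"{target!r} occurs {len(matches)} time(s) in {tokens}")
    return matches[occurrence]


# Whitespace split stands in for tokenizer.tokenize(); real subword
# tokenizers can produce different pieces.
tokens = ("[CLS] " + "the bank of the river" + " [SEP]").split()
print(token_position(tokens, "bank"))    # 2
print(token_position(tokens, "the", 1))  # 4
```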

facundosuenzo commented 2 years ago

Thank you both @jacyanthis and @JunsolKim!

I'm using Colab with torch version 1.10.0+cu111, and I'm sending my code by email too.

So if it's deprecated, does that mean I won't be able to run BERT, or is there a workaround?

Re: word_vector thank you!!

jacyanthis commented 2 years ago

For others reading this, the notebook should now work with the latest torch version (which Colab loads automatically) because we have split run_language_modeling.py into two files, one to work with GPT-2 (run_language_modeling_gpt.py) and one to work with RoBERTa (run_language_modeling_roberta.py).