Open arashashari opened 4 years ago
Any comment on this?
Creating this checkpoint is a long process that involves a lot of data and pretraining. The notebook you referenced is the closest we have to reproducing this process. That said, the error you are seeing is not related to this. Can you place a breakpoint before the line `logits = self.qa_outputs(sequence_output)` and check the shapes of `sequence_output`, `self.qa_outputs.weight`, and `self.qa_outputs.bias` to see which one has a shape different from the expected one?
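Something like this, right before that line (a rough sketch; the variable and attribute names are taken from the error you posted, so adjust as needed):

```python
# Debug sketch: print the shapes right before the line that raises the size
# mismatch, so the offending tensor is obvious.
print("sequence_output:", tuple(sequence_output.shape))           # expect (batch, seq_len, hidden_size)
print("qa_outputs.weight:", tuple(self.qa_outputs.weight.shape))  # expect (2, hidden_size)
print("qa_outputs.bias:", tuple(self.qa_outputs.bias.shape))      # expect (2,)
import pdb; pdb.set_trace()                                       # optional: drop into a debugger here
logits = self.qa_outputs(sequence_output)
```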
Thanks @ibeltagy, I was able to get around that issue and fix it. However, performance-wise the result is not good, as I am only using a small amount of data (following the notebook) while testing on TriviaQA.
{'exact_match': 5.079444513949706, 'f1': 13.769592296107348, 'common': 7993, 'denominator': 7993, 'pred_len': 7993, 'gold_len': 7993}
I wonder if you can share the long process you mentioned. Currently I am comparing against a very similar model, which gives better performance; being able to redo the process you went through could change that.
Thanks
The process is described in detail in the paper, but the lack of careful pretraining shouldn't affect the result that much; check Table 11, Longformer (no MLM pretraining) --> 73.2. An F1 score of 13.7 indicates a bug, not a lack of pretraining. My guess is that the TriviaQA script is starting from a randomly initialized model rather than the pretrained one, maybe because of a wrong path or something.
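One quick way to check this (a rough sketch; the path is a placeholder for whatever your script actually passes as the model path):

```python
# Sketch: verify that the checkpoint loads pretrained weights rather than
# silently falling back to a random initialization. The path is a placeholder.
from transformers import AutoModel

model = AutoModel.from_pretrained("path/to/your/extended-roberta-4096")

# A freshly initialized layer has roughly zero mean and std close to
# initializer_range (0.02 in your config); pretrained weights usually look
# noticeably different. Also watch the from_pretrained warnings about weights
# that were "newly initialized".
w = model.encoder.layer[0].attention.self.query.weight
print("query.weight mean=%.4f std=%.4f" % (w.mean().item(), w.std().item()))
```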
Thanks @ibeltagy, I tend to agree that there is an issue with loading the model for the QA part. Here is the config I get at the end of the notebook:

```json
{
  "architectures": ["RobertaLongForMaskedLM"],
  "attention_probs_dropout_prob": 0.1,
  "attention_window": [512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512],
  "bos_token_id": 0,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 4098,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "type_vocab_size": 1,
  "vocab_size": 50265
}
```

Even when I reload the saved model and evaluate it for MLM, it gives me the correct BPC (1.635, the same value I get before saving).
I can see that in your TriviaQA code you load the Longformer model, which has a different config. But based on the paper, that model is also pretrained with MLM, which I expect to be similar to the notebook's. I have attached the TriviaQA script (triviaqa_lf.txt) that I updated to load the MLM model. Could you have a quick look and comment on what may be causing the issue in loading the model? (The only difference is the new model loading, a few lines changed.)
Meanwhile, the code works fine with the data you have provided; however, I am trying to reproduce the pretrained model.
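The relevant change is roughly the following (a simplified sketch, not the exact code in triviaqa_lf.txt; the path and the plain RobertaForMaskedLM class are assumptions, since the real checkpoint uses the long-attention variant from the notebook):

```python
# Simplified sketch: load the notebook's saved MLM checkpoint and reuse its
# encoder under a fresh QA head. Path and class are placeholders.
import torch
from transformers import RobertaForMaskedLM

ckpt_dir = "path/to/roberta-base-4096"                       # directory saved by the notebook
mlm_model = RobertaForMaskedLM.from_pretrained(ckpt_dir)

encoder = mlm_model.roberta                                  # shared encoder, hidden_size = 768
qa_outputs = torch.nn.Linear(encoder.config.hidden_size, 2)  # start/end logits

input_ids = torch.tensor([[0, 713, 16, 10, 1296, 2]])        # tiny dummy input
sequence_output = encoder(input_ids)[0]                      # shape [1, 6, 768]
logits = qa_outputs(sequence_output)                         # shape [1, 6, 2]
print(sequence_output.shape, logits.shape)
```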
Hello, I wonder if you have the code that generated this pre-trained model (or longformer-large-4096-finetuned-triviaqa).
I followed the steps in your notebook and updated the triviaqa.py file; it is attached.
However, I get a dimension mismatch in a linear operation, as follows:

```
Traceback (most recent call last):
  File "/home/arashari/longformer_original/scripts/triviaqa.py", line 760, in <module>
    main(args)
  File "/home/arashari/longformer_original/scripts/triviaqa.py", line 752, in main
    trainer.fit(model)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 979, in fit
    self.single_gpu_train(model)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/distrib_parts.py", line 185, in single_gpu_train
    self.run_pretrain_routine(model)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 1139, in run_pretrain_routine
    False)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/evaluation_loop.py", line 293, in _evaluate
    output = self.evaluation_forward(model, batch, batch_idx, dataloader_idx, test_mode)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/evaluation_loop.py", line 485, in evaluation_forward
    output = model.validation_step(*args)
  File "/home/arashari/longformer_original/scripts/triviaqa.py", line 440, in validation_step
    output = self.forward(input_ids, input_mask, segment_ids, subword_starts, subword_ends)
  File "/usr/local/lib/python3.6/dist-packages/apex/amp/_initialize.py", line 197, in new_fwd
    **applier(kwargs, input_caster))
  File "/home/arashari/longformer_original/scripts/triviaqa.py", line 362, in forward
    logits = self.qa_outputs(sequence_output)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1610, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [1 x 2376], m2: [768 x 2] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:283
```
Runtime arguments are:

```
python -m triviaqa \
    --train_dataset output/squad-wikipedia-train-4096.json \
    --dev_dataset squad-wikipedia-dev-4096.json \
    --gpus 0,1,2,3,4,5,6,7 --batch_size 1 --num_workers 4 \
    --lr 0.00003 --warmup 1000 --epochs 4 --max_seq_len 4096 --doc_stride -1 \
    --save_prefix data/output/ \
    --seed 4321 \
    --attention_mode 'sliding_chunks' \
    --model_path roberta-base
```
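For reference, one sanity check I can run is to compare the config of whatever `--model_path` resolves to against these arguments (a quick sketch; stock roberta-base reports max_position_embeddings of 514, well below the 4096 sequence length here, while the extended checkpoint from the notebook reports 4098):

```python
# Sketch: print the config behind --model_path so it can be compared with the
# run arguments (e.g. --max_seq_len 4096).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("roberta-base")  # value passed as --model_path
print("hidden_size:", config.hidden_size)                           # 768
print("max_position_embeddings:", config.max_position_embeddings)   # 514 for stock roberta-base
```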
Any comment/suggestion on this?
Further, I tried run_squad from Hugging Face, and it seems the converter (TriviaQA to SQuAD) does not add a title, so its output is not compatible with that example; it complains about missing fields such as "title". Are you aware of any other converter? triviaqa.txt
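One workaround I am considering (a sketch, assuming the converted file follows the usual SQuAD layout with a top-level "data" list) is to post-process the converted JSON and add a placeholder title to each entry, since the Hugging Face SQuAD processor reads that field:

```python
# Sketch: add a placeholder "title" to every entry of the converted
# TriviaQA-to-SQuAD file so run_squad stops complaining about the missing field.
import json

in_path = "squad-wikipedia-dev-4096.json"            # converted file (placeholder name)
out_path = "squad-wikipedia-dev-4096-titled.json"

with open(in_path) as f:
    squad = json.load(f)

for i, entry in enumerate(squad["data"]):
    entry.setdefault("title", "doc_%d" % i)          # any non-empty string works

with open(out_path, "w") as f:
    json.dump(squad, f)
```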