abertsch72 / unlimiformer

Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
MIT License

Error while evaluating #20

Closed · MonliH closed this issue 1 year ago

MonliH commented 1 year ago

Hello, I am running the bart_base_sled configuration on the contract_nli dataset, with the following command:

```
python run.py configs/training/base_training_args.json configs/model/bart_base_sled.json configs/data/contract_nli.json \
    --output_dir checkpoints \
    --per_device_train_batch_size 1 \
    --test_unlimiformer \
    --model_name_or_path facebook/bart-base \
    --unlimiformer_training \
    --max_source_length 16384 \
    --learning_rate 1e-5 \
    --eval_max_source_length 16384 \
    --do_eval=True \
    --eval_steps 16 \
    --save_steps 16 \
    --extra_metrics bertscore
```

(I have set `eval_steps` low for debugging purposes.) Training seems to work fine, but once evaluation starts I get the following error:

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 4 but got size 1 for tensor number 1 in the list.
Full log:

```
Traceback (most recent call last):

/home/aaa/unlimiformer/src/run.py:1213 in <module>
    1212 if __name__ == "__main__":
  ❱ 1213     main()

/home/aaa/unlimiformer/src/run.py:822 in main
     819     elif last_checkpoint is not None:
     820         checkpoint = last_checkpoint  # look for checkpoints in the outdir
  ❱  822     train_result = trainer.train(resume_from_checkpoint=checkpoint)
     823     logger.info('Done training')
     824     trainer.save_model()  # Saves the tokenizer too for easy upload

/home/aaa/anaconda3/envs/hf-latest/lib/python3.10/site-packages/transformers/trainer.py:1521 in train
    1518     inner_training_loop = find_executable_batch_size(
    1519         self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size
    1520     )
  ❱ 1521     return inner_training_loop(
    1522         args=args,
    1523         resume_from_checkpoint=resume_from_checkpoint,
    1524         trial=trial,

/home/aaa/anaconda3/envs/hf-latest/lib/python3.10/site-packages/transformers/trainer.py:1840 in _inner_training_loop
    1837                 self.state.epoch = epoch + (step + 1) / steps_in_epoch
    1838                 self.control = self.callback_handler.on_step_end(args, self.state, s…
  ❱ 1840                 self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_k…
    1841             else:
    1842                 self.control = self.callback_handler.on_substep_end(args, self.state…

/home/aaa/anaconda3/envs/hf-latest/lib/python3.10/site-packages/transformers/trainer.py:2065 in _maybe_log_save_evaluate
    2063         metrics = None
    2064         if self.control.should_evaluate:
  ❱ 2065             metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
    2066             self._report_to_hp_search(trial, self.state.global_step, metrics)
    2068         if self.control.should_save:

/home/aaa/unlimiformer/src/utils/custom_seq2seq_trainer.py:267 in evaluate
     265         eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else se…
     266         try:
  ❱  267             output = eval_loop(
     268                 eval_dataloader,
     269                 description="Evaluation",
     270                 # No point gathering the predictions if there are no metrics, otherwise…

/home/aaa/anaconda3/envs/hf-latest/lib/python3.10/site-packages/transformers/trainer.py:2965 in evaluation_loop
    2962                     batch_size = observed_batch_size
    2964             # Prediction step
  ❱ 2965             loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_o…
    2966             inputs_decode = self._prepare_input(inputs["input_ids"]) if args.include_inp…
    2968             if is_torch_tpu_available():

/home/aaa/unlimiformer/src/utils/custom_seq2seq_trainer.py:140 in prediction_step
     137         if has_labels:  # changed the order of the if's here because there is no point g…
     138             with torch.no_grad():
     139                 with self.compute_loss_context_manager():
  ❱  140                     outputs = model(**inputs)
     141                     if self.label_smoother is not None:
     142                         loss = self.label_smoother(outputs, inputs["labels"]).mean().det…
     143                     else:

/home/aaa/anaconda3/envs/hf-latest/lib/python3.10/site-packages/torch/nn/modules/module.py:1130 in _call_impl
    1127         # this function, and just call forward.
    1128         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o…
    1129                 or _global_forward_hooks or _global_forward_pre_hooks):
  ❱ 1130             return forward_call(*input, **kwargs)
    1131         # Do not call functions when jit is used
    1132         full_backward_hooks, non_full_backward_hooks = [], []
    1133         if self._backward_hooks or _global_backward_hooks:

/home/aaa/unlimiformer/src/unlimiformer.py:499 in pre_forward_hook
     496             if input_ids is not None:
     497                 self.input_ids = torch.cat([self.input_ids, input_ids[0]])
     498             if kwargs.get('decoder_input_ids') is not None:
  ❱  499                 self.generated_input_ids = torch.cat([self.generated_input_ids, kwar…
     501         result = self.original_forward_func(input_ids=input_ids, labels=labels, attentio…
     502         self.is_first_test_decoding_step = False

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 4 but got size 1
for tensor number 1 in the list.
```
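For readers who land here from the same message: this is `torch.cat`'s generic shape check, which requires all tensors to agree in every dimension except the concatenation dimension. A minimal, hypothetical snippet (unrelated to Unlimiformer's actual tensors) that reproduces the exact message:

```python
import torch

# Concatenating along dim=1 requires all other dimensions to match.
a = torch.zeros(4, 2)  # e.g., a batch expanded to 4 beams during generation
b = torch.zeros(1, 3)  # e.g., a tensor cached with batch size 1
torch.cat([a, b], dim=1)
# RuntimeError: Sizes of tensors must match except in dimension 1.
# Expected size 4 but got size 1 for tensor number 1 in the list.
```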
urialon commented 1 year ago

Hi @MonliH, thank you for your interest in our work!

Sorry, our current implementation supports only seq2seq tasks (such as summarization) and does not support QA datasets, because QA requires encoding the question together with each of the input chunks, which works slightly differently.
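To illustrate the distinction, here is a rough, hypothetical sketch (not the repository's actual code) of what QA-style chunked encoding would involve: the question has to ride along with every chunk of the document, whereas summarization-style chunking encodes each chunk on its own.

```python
# Hypothetical sketch of QA-style chunked encoding; NOT the Unlimiformer
# implementation. It only illustrates why QA differs from summarization.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

question = "Does the contract permit sublicensing?"  # illustrative
document = "..."  # a long contract, far beyond BART's 1024-token window

q_ids = tokenizer(question, add_special_tokens=False).input_ids
doc_ids = tokenizer(document, add_special_tokens=False).input_ids

chunk_len = 1024 - len(q_ids)  # reserve room for the question in every chunk
encoded_chunks = []
with torch.no_grad():
    for i in range(0, len(doc_ids), chunk_len):
        # Summarization-style chunking would encode doc_ids[i:i+chunk_len]
        # alone; for QA, the question must be prepended to each chunk.
        chunk = torch.tensor([q_ids + doc_ids[i:i + chunk_len]])
        encoded_chunks.append(model.get_encoder()(chunk).last_hidden_state)
```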

Best, Uri

urialon commented 1 year ago

Hi @MonliH ,

Running with bart_base_sled.json activates the SLED model; our code is built on top of SLED's codebase, but SLED itself is an unrelated approach.

Also: we recently found that you can get significant improvements on QA tasks by concatenating the question + document and feeding the result as the input. We will soon update the results in the paper as well.
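If you want to try that workaround yourself, a hypothetical preprocessing step (the field names are illustrative, not taken from the contract_nli config) could look like this:

```python
# Hypothetical preprocessing: turn a QA example into a plain seq2seq example
# by concatenating question + document into a single source string, so the
# standard summarization-style pipeline applies unchanged.
def qa_to_seq2seq(example):
    # "question", "document", and "answer" are illustrative field names;
    # adapt them to the dataset's actual schema.
    return {
        "source": example["question"] + "\n\n" + example["document"],
        "target": example["answer"],
    }

# e.g., with a Hugging Face datasets.Dataset:
# dataset = dataset.map(qa_to_seq2seq)
```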

I'm closing this issue, but feel free to re-open if you have any questions!

Best, Uri