a-rios / longmbart

Apache License 2.0

float_mask.repeat(1, 1, repeat_size, 1) causes RuntimeError: Number of dimensions of repeat dims can not be smaller than number of dimensions of tensor #13

Closed: tschomacker closed this issue 2 years ago

tschomacker commented 2 years ago

I am trying to fine-tune my own longmbart on text simplification, but I am a little stuck. The conversion worked, but I got an error when starting to fine-tune. I would really appreciate any hints on how to fix the problem.

What I did previously:

  1. pip install -q -r requirements.txt
  2. converted the model:
    python ./scripts/convert_mbart_to_longformerencoderdecoder.py \
    --save_model_to ./output/converted-longmbart \
    --attention_window 512 \
    --cache_dir ./output/mbart-large-cc25 \
    --base_model facebook/mbart-large-cc25 \
    --tokenizer_name_or_path facebook/mbart-large-cc25\
    --add_language_tags de_OR de_SI \
    --initialize_tags de_DE de_DE \
    --max_pos 1024 \
    --verbose 1
  3. started the fine-tuning:
    python -m longformer.simplification \
    --from_pretrained ./output/converted-longmbart \
    --tokenizer ./output/converted-longmbart \
    --save_dir ./output/longmbart-fine-tuned \
    --save_prefix "w512" \
    --train_source ./data/train-source.txt \
    --train_target ./data/train-target.txt \
    --val_source ./data/val-source.txt \
    --val_target ./data/val-target.txt \
    --test_source ./data/test-source.txt \
    --test_target ./data/test-target.txt \
    --max_output_len 1024 \
    --max_input_len 1024 \
    --batch_size 1 \
    --grad_accum 60 \
    --num_workers 5 \
    --gpus 1 \
    --seed 222 \
    --attention_dropout 0.1 \
    --dropout 0.3 \
    --attention_mode sliding_chunks \
    --attention_window 512 \
    --label_smoothing 0.2 \
    --lr 0.00003 \
    --val_every 1.0 \
    --val_percent_check 1.0 \
    --test_percent_check 1.0 \
    --early_stopping_metric 'rougeL' \
    --patience 10 \
    --lr_reduce_patience 8 \
    --lr_reduce_factor 0.5 \
    --grad_ckpt \
    --progress_bar_refresh_rate 10 \
    --tags_included

    This threw the following RuntimeError:

    Current Behavior: RuntimeError

    Epoch 0:   0%|                                            | 0/2 [00:00<?, ?it/s]
    Traceback (most recent call last):
    File "/opt/conda/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
    File "/opt/conda/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
    File "/home/jovyan/git/longmbart/longformer/simplification.py", line 527, in <module>
    main(args)
    File "/home/jovyan/git/longmbart/longformer/simplification.py", line 518, in main
    trainer.fit(model)
    File "/opt/conda/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 510, in fit
    results = self.accelerator_backend.train()
    File "/opt/conda/lib/python3.9/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 158, in train
    results = self.ddp_train(process_idx=self.task_idx, model=model)
    File "/opt/conda/lib/python3.9/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 307, in ddp_train
    results = self.train_or_test()
    File "/opt/conda/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 74, in train_or_test
    results = self.trainer.train()
    File "/opt/conda/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 561, in train
    self.train_loop.run_training_epoch()
    File "/opt/conda/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 549, in run_training_epoch
    batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
    File "/opt/conda/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 704, in run_training_batch
    self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
    File "/opt/conda/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 482, in optimizer_step
    model_ref.optimizer_step(
    File "/opt/conda/lib/python3.9/site-packages/pytorch_lightning/core/lightning.py", line 1296, in optimizer_step
    optimizer.step(closure=optimizer_closure)
    File "/opt/conda/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py", line 286, in step
    self.__optimizer_step(*args, closure=closure, profiler_name=profiler_name, **kwargs)
    File "/opt/conda/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py", line 140, in __optimizer_step
    trainer.precision_connector.backend.optimizer_step(trainer, optimizer, closure)
    File "/opt/conda/lib/python3.9/site-packages/pytorch_lightning/plugins/native_amp.py", line 75, in optimizer_step
    closure()
    File "/opt/conda/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 694, in train_step_and_backward_closure
    result = self.training_step_and_backward(
    File "/opt/conda/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 792, in training_step_and_backward
    result = self.training_step(split_batch, batch_idx, opt_idx, hiddens)
    File "/opt/conda/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 316, in training_step
    training_step_output = self.trainer.accelerator_backend.training_step(args)
    File "/opt/conda/lib/python3.9/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 164, in training_step
    return self._step(args)
    File "/opt/conda/lib/python3.9/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 176, in _step
    output = self.trainer.model(*args)
    File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
    File "/opt/conda/lib/python3.9/site-packages/pytorch_lightning/overrides/data_parallel.py", line 179, in forward
    output = self.module.training_step(*inputs[0], **kwargs[0])
    File "/home/jovyan/git/longmbart/longformer/simplification.py", line 251, in training_step
    output = self.forward(*batch)
    File "/home/jovyan/git/longmbart/longformer/simplification.py", line 231, in forward
    outputs = self.model(
    File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
    File "/opt/conda/lib/python3.9/site-packages/transformers/models/mbart/modeling_mbart.py", line 1346, in forward
    outputs = self.model(
    File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
    File "/opt/conda/lib/python3.9/site-packages/transformers/models/mbart/modeling_mbart.py", line 1211, in forward
    encoder_outputs = self.encoder(
    File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
    File "/opt/conda/lib/python3.9/site-packages/transformers/models/mbart/modeling_mbart.py", line 840, in forward
    layer_outputs = encoder_layer(
    File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
    File "/opt/conda/lib/python3.9/site-packages/transformers/models/mbart/modeling_mbart.py", line 331, in forward
    hidden_states, attn_weights, _ = self.self_attn(
    File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
    File "/home/jovyan/git/longmbart/longformer/longformer_encoder_decoder.py", line 66, in forward
    outputs = self.longformer_self_attn(
    File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
    File "/home/jovyan/git/longmbart/longformer/longformer.py", line 184, in forward
    float_mask = float_mask.repeat(1, 1, repeat_size, 1)
    RuntimeError: Number of dimensions of repeat dims can not be smaller than number of dimensions of tensor

I have checked float_mask and its size is torch.Size([1, 1, 1024, 1024, 1, 1]), which looks odd to me.
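For reference, the underlying PyTorch error can be reproduced in isolation: Tensor.repeat() requires at least as many repeat factors as the tensor has dimensions, so the 4-factor pattern applied to a 6-D mask fails exactly like this (a minimal sketch, independent of longmbart):

    import torch

    # Tensor.repeat() needs at least as many repeat factors as the tensor has dimensions.
    mask_4d = torch.ones(1, 1, 1024, 1)           # the 4-D shape the repeat(1, 1, repeat_size, 1) call expects
    print(mask_4d.repeat(1, 1, 2, 1).shape)       # works: torch.Size([1, 1, 2048, 1])

    mask_6d = torch.ones(1, 1, 1024, 1024, 1, 1)  # a 6-D mask like the one observed above
    try:
        mask_6d.repeat(1, 1, 2, 1)                # 4 repeat factors < 6 tensor dimensions
    except RuntimeError as err:
        print(err)  # "Number of dimensions of repeat dims can not be smaller than number of dimensions of tensor"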

a-rios commented 2 years ago

Hi, the shape should be torch.Size([1, 1024, 1, 1]). I cannot reproduce this, so could you send me a minimal sample of your data/script that produces this error? And just to make sure: is the transformers version you have installed the one linked in the requirements file? It is not the default huggingface code; we had to make some changes to mbart.
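(A quick, unofficial way to check which transformers build is active, assuming a pip-based install; the check relies on the fact that the patched fork's modeling_mbart.py references the longformer package, which the stock release does not:)

    import inspect
    import transformers
    import transformers.models.mbart.modeling_mbart as modeling_mbart

    # The patched ZurichNLP fork references the longformer package inside modeling_mbart.py;
    # the stock PyPI release does not, so this prints False for 'normal' transformers.
    # (With the fork installed but the longmbart repo missing from the environment, the
    # import above itself fails with "No module named 'longformer'", as in the traceback below.)
    print(transformers.__version__)
    print("longformer" in inspect.getsource(modeling_mbart))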

tschomacker commented 2 years ago

Hi, thanks for the very quick response. I actually changed the requirements and installed the 'normal' transformers package. I changed it because running the conversion (same call as above) with transformers @ git+https://github.com/ZurichNLP/transformers.git@longmbart#egg=transformers installed resulted in:

Traceback (most recent call last):
  File "/home/jovyan/git/longmbart/./scripts/convert_mbart_to_longformerencoderdecoder.py", line 11, in <module>
    from transformers import MBartForConditionalGeneration
  File "/opt/conda/lib/python3.9/site-packages/transformers/__init__.py", line 2162, in __getattr__
    return super().__getattr__(name)
  File "/opt/conda/lib/python3.9/site-packages/transformers/file_utils.py", line 1479, in __getattr__
    value = getattr(module, name)
  File "/opt/conda/lib/python3.9/site-packages/transformers/file_utils.py", line 1478, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/opt/conda/lib/python3.9/site-packages/transformers/models/mbart/__init__.py", line 89, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/opt/conda/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/opt/conda/lib/python3.9/site-packages/transformers/models/mbart/modeling_mbart.py", line 47, in <module>
    from longformer.longformer_encoder_decoder import LongformerSelfAttentionForBart
ModuleNotFoundError: No module named 'longformer'

This issue was resolved after switching to the 'normal' transformers package.

a-rios commented 2 years ago

Ok, longmbart will not run with the standard transformers library, because longmbart uses attention masks with 3 values (0, 1, 2) instead of the standard (0, 1); this is to distinguish local and global attention. You need the transformers repo linked in the requirements file. The conversion error looks like your longmbart repo wasn't installed in your Python environment; you can install it (from within the longmbart directory) with: pip install -e .
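(Purely illustrative sketch of the 3-valued mask described above; the value convention used here, 0 = padding, 1 = local sliding-window attention, 2 = global attention, is an assumption, and the exact semantics are defined in the patched fork:)

    import torch

    # Hypothetical example of a 3-valued longmbart attention mask for a 7-token sequence.
    # Assumed convention: 0 = padding, 1 = local (sliding-window) attention, 2 = global attention.
    attention_mask = torch.tensor([[2, 1, 1, 1, 1, 0, 0]])

    # Stock transformers mbart only expects 0/1 masks, which is why the fork from the
    # requirements file is needed.
    print(attention_mask.unique())  # tensor([0, 1, 2])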