facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

TypeError: forward() missing 1 required positional argument: 'prev_output_tokens' #4871

Open theamato opened 1 year ago

theamato commented 1 year ago

❓ Questions and Help

What is your question?

When trying to fine-tune the cross-attention parameters for mBART, I get this error at the beginning of epoch 1: TypeError: forward() missing 1 required positional argument: 'prev_output_tokens'. I checked the **kwargs argument that is fed to the forward function, and 'prev_output_tokens' is indeed not there. Interestingly, I got this to run in Colab (Python 3.7.15 and fairseq 0.9.0) but it was interrupted in epoch 2 because I ran out of resources. I recreated the environment with conda on a remote server and installed everything exactly as before in order to get access to the GPU, but then I got this error. I'm using the code from this GitHub repo for the fine-tuning: https://github.com/MGheini/xattn-transfer-for-mt Is there any way to solve this? All input is much appreciated.

These are my parameters:

python3 $FAIRSEQ/fairseq_cli/train.py data-bin \
  --langs $langs \
  --source-lang $SRC --target-lang $TGT \
  --log-format simple \
  --log-interval 20 \
  --seed 222 \
  --criterion label_smoothed_cross_entropy \
  --label-smoothing 0.2 \
  --optimizer adam \
  --adam-eps 1e-06 \
  --adam-betas "(0.9, 0.98)" \
  --weight-decay 0.0 \
  --lr-scheduler polynomial_decay \
  --task translation_from_pretrained_bart \
  --eval-bleu --eval-bleu-detok moses \
  --num-workers 8 \
  --max-tokens 512 \
  --validate-interval 1 \
  --arch mbart_large \
  --max-update 150000 \
  --update-freq 8 \
  --lr 3e-05 \
  --min-lr -1 \
  --restore-file checkpoint_last.pt \
  --save-interval 1 \
  --save-interval-updates 500 \
  --keep-interval-updates 1 \
  --no-epoch-checkpoints \
  --warmup-updates 2500 \
  --dropout 0.3 \
  --attention-dropout 0.1 \
  --relu-dropout 0.0 \
  --layernorm-embedding \
  --encoder-learned-pos \
  --decoder-learned-pos \
  --encoder-normalize-before \
  --decoder-normalize-before \
  --skip-invalid-size-inputs-valid-test \
  --share-all-embeddings \
  --finetune-from-mbart-at $MBART \
  --only-finetune-cross-attn \
  --patience 25

The whole error:

Traceback (most recent call last):
  File "/proj/uppmax2022-2-18/cross_attn/cross_attn/xattn-transfer-for-mt/fairseq-modified/fairseq_cli/train.py", line 541, in <module>
    cli_main()
  File "/proj/uppmax2022-2-18/cross_attn/cross_attn/xattn-transfer-for-mt/fairseq-modified/fairseq_cli/train.py", line 537, in cli_main
    distributed_utils.call_main(args, main)
  File "/crex/proj/uppmax2022-2-18/cross_attn/cross_attn/xattn-transfer-for-mt/fairseq-modified/fairseq/distributed_utils.py", line 255, in call_main
    main(args, **kwargs)
  File "/proj/uppmax2022-2-18/cross_attn/cross_attn/xattn-transfer-for-mt/fairseq-modified/fairseq_cli/train.py", line 309, in main
    valid_losses, should_stop = train(args, trainer, task, epoch_itr)
  File "/proj/uppmax2022-2-18/cross_attn/cross_attn/conda_env/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/proj/uppmax2022-2-18/cross_attn/cross_attn/xattn-transfer-for-mt/fairseq-modified/fairseq_cli/train.py", line 392, in train
    log_output = trainer.train_step(samples)
  File "/proj/uppmax2022-2-18/cross_attn/cross_attn/conda_env/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/crex/proj/uppmax2022-2-18/cross_attn/cross_attn/xattn-transfer-for-mt/fairseq-modified/fairseq/trainer.py", line 479, in train_step
    ignore_grad=is_dummy_batch,
  File "/crex/proj/uppmax2022-2-18/cross_attn/cross_attn/xattn-transfer-for-mt/fairseq-modified/fairseq/tasks/fairseq_task.py", line 412, in train_step
    loss, sample_size, logging_output = criterion(model, sample)
  File "/proj/uppmax2022-2-18/cross_attn/cross_attn/conda_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 551, in __call__
    result = self.forward(*input, **kwargs)
  File "/crex/proj/uppmax2022-2-18/cross_attn/cross_attn/xattn-transfer-for-mt/fairseq-modified/fairseq/criterions/label_smoothed_cross_entropy.py", line 56, in forward
    net_output = model(**sample['net_input'])
  File "/proj/uppmax2022-2-18/cross_attn/cross_attn/conda_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 551, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'prev_output_tokens'

The sample whose 'net_input' supplies the **kwargs (and which, I believe, is what forward is complaining about, since 'prev_output_tokens' is missing):

[{'id': tensor([67186, 27642, 9526, 57293, 27958, 19522, 48434, 26559]), 'nsentences': 8, 'ntokens': 312, 'net_input': {'src_tokens': tensor([[ 91, 1176, 598, 276, 3504, 1322, 902, 17, 419, 18, 1074, 2246, 4, 1074, 77, 332, 63, 83, 4787, 85, 769, 1393, 25, 662, 769, 2368, 102, 4, 83, 47, 1088, 1205, 85, 47, 6217, 650, 5, 2, 3], [1290, 15, 2490, 100, 297, 218, 119, 889, 4, 294, 12, 3290, 493, 3, 1789, 3414, 1069, 155, 2780, 2142, 173, 78, 251, 1553, 5473, 145, 9, 415, 9, 630, 12, 9767, 9, 4997, 137, 25, 5, 2, 3], [ 34, 802, 44, 460, 6847, 1558, 4, 9, 44, 201, 522, 1208, 4583, 2219, 323, 9, 941, 4, 9, 44, 201, 1975, 265, 1126, 507, 9, 888, 25, 9, 930, 16, 46, 1666, 905, 289, 145, 4, 2, 3], [2520, 88, 29, 5256, 2813, 210, 1611, 7949, 551, 727, 3041, 980, 4, 247, 1368, 1088, 8195, 4563, 447, 1414, 3322, 2209, 102, 926, 1072, 2571, 4, 4102, 15, 340, 3, 131, 45, 6850, 203, 1049, 5, 2, 3], [ 189, 369, 22, 428, 11, 385, 191, 47, 254, 9, 567, 5406, 4, 9, 44, 16, 9773, 1650, 1891, 19, 9, 414, 3634, 4433, 63, 964, 93, 782, 25, 9, 81, 1650, 1891, 16, 1317, 4332, 5, 2, 3], [3709, 3, 151, 554, 3745, 205, 98, 3, 9, 8756, 4, 5273, 107, 2782, 12, 5546, 1158, 5, 2532, 893, 4275, 107, 12, 5546, 191, 151, 8328, 253, 9, 111, 6920, 12, 468, 384, 9989, 889, 5, 2, 3], [ 466, 1592, 178, 868, 8147, 29, 2953, 8609, 4, 4504, 2249, 2131, 4504, 3844, 63, 557, 4, 106, 701, 16, 6478, 102, 44, 5728, 297, 65, 350, 3476, 71, 1592, 9, 16, 93, 25, 882, 3385, 5, 2, 3], [ 482, 562, 495, 5325, 22, 69, 4222, 12, 1931, 4571, 76, 11, 1023, 417, 11, 2927, 7987, 4, 253, 2280, 2644, 2247, 261, 4, 373, 232, 4, 722, 1529, 841, 286, 2642, 5920, 102, 15, 1753, 5, 2, 3]]), 'src_lengths': tensor([39, 39, 39, 39, 39, 39, 39, 39])}, 'target': None}
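For reference, here is a minimal sketch (my own simplification, not fairseq's actual code) of why this particular TypeError appears: the criterion unpacks sample['net_input'] as keyword arguments into the model, and an encoder-decoder forward() requires prev_output_tokens, which never gets built here.

```python
# Minimal sketch (assumed simplification, not fairseq's exact code): the
# criterion calls model(**sample['net_input']), and an encoder-decoder
# forward() requires prev_output_tokens as one of those keyword arguments.

def forward(src_tokens, src_lengths, prev_output_tokens):
    """Stand-in for an encoder-decoder model's forward()."""
    return None

net_input = {
    "src_tokens": "tensor of source token ids",  # present in the sample above
    "src_lengths": "tensor of source lengths",   # present in the sample above
    # "prev_output_tokens" is never added, because 'target' is None
}

try:
    forward(**net_input)
except TypeError as e:
    print(e)  # forward() missing 1 required positional argument: 'prev_output_tokens'
```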

What have you tried?

What's your environment?

theamato commented 1 year ago

Okay, I've looked at it some more, and to my understanding, if 'prev_output_tokens' is None, input feeding is supposed to be invoked, which creates a shifted version of the targets for feeding the previous output tokens into the next decoder step. But I guess this can't happen if 'target' is also None? What is 'target' supposed to contain? And if it is None, does that mean I've made a mistake somewhere, for instance by passing in the wrong input files?
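For illustration, here is a rough sketch (plain PyTorch, not fairseq's collater) of what input feeding builds, assuming each target sentence ends with an EOS token: prev_output_tokens is the target rotated so that EOS comes first, which is why a None target makes it impossible to construct.

```python
# Rough sketch (plain PyTorch, not fairseq's collater) of what input feeding
# builds, assuming each target sentence ends with an EOS token.
import torch

eos = 2
target = torch.tensor([15, 27, 98, 5, eos])        # gold target tokens
prev_output_tokens = torch.roll(target, shifts=1)  # move EOS to the front

print(target)              # tensor([15, 27, 98,  5,  2])
print(prev_output_tokens)  # tensor([ 2, 15, 27, 98,  5])

# If 'target' is None there is nothing to shift, so prev_output_tokens can
# never be created and the model's forward() is missing an argument.
```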

XueMoonLit commented 1 year ago

I think it's caused by the data preprocessing. Check whether your preprocessing step produced any errors, and also trace how the "sample" object changes on its way to the line net_output = model(**sample['net_input']). If the key 'target' is None from the start, something probably went wrong while preprocessing the data.
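One way to follow this advice is a temporary debug check (a hypothetical helper, not part of fairseq) dropped into the criterion's forward() right before it calls model(**sample['net_input']), so a missing target fails with a clearer message:

```python
# Hypothetical debug helper (not part of fairseq): call it at the top of the
# criterion's forward() to fail early with a clearer message when the target
# side of the batch was never loaded.
def debug_check_sample(sample):
    assert sample.get("target") is not None, (
        "sample['target'] is None -- the target side of this batch was not "
        "loaded; check the preprocessed data for the target language."
    )
    assert "prev_output_tokens" in sample.get("net_input", {}), (
        "net_input has no 'prev_output_tokens'; with input feeding it is "
        "built from the target, so a missing target also breaks this."
    )
```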

theamato commented 1 year ago

Thank you for your response. It turned out I was missing one of the train.bin files from the preprocessing, and that was the cause of the error. It runs fine now.
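For anyone hitting the same thing, a quick sanity check (plain Python, no fairseq APIs; the directory name and language codes below are placeholders) that reports whether the expected binarized source and target files exist in data-bin:

```python
# Quick sanity check (plain Python, no fairseq APIs). The directory name and
# language codes are placeholders -- adjust them to your own setup; the
# file-naming pattern assumes the usual fairseq-preprocess output layout.
import os

data_bin = "data-bin"
src, tgt = "en_XX", "de_DE"  # replace with your language pair

for split in ("train", "valid"):
    for lang in (src, tgt):
        for ext in ("bin", "idx"):
            path = os.path.join(data_bin, f"{split}.{src}-{tgt}.{lang}.{ext}")
            print(path, "OK" if os.path.exists(path) else "MISSING")
```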


layerkugou commented 1 year ago

ok