facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

"Only right padding is supported" Error triggered while training MMA-Hard model #3889

Closed: ereday closed this issue 3 years ago

ereday commented 3 years ago

🐛 Bug

I was trying to train an MMA-Hard model (a simultaneous translation model) on the WMT15 de-en data. After training started and roughly 100-120 iterations had completed, I got an "Only right padding is supported" error. You can find the complete error message below.

To Reproduce

I installed the fairseq library today (September 17, 2021; commit id: f6abcc2a6732) using the following commands:

git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./

I obtained the data using the bash script provided in the fairseq repository. Preprocessing/binarization was also done with the commands from the README:

# Preprocess/binarize the data
TEXT=./wmt15.tokenized.de-en
fairseq-preprocess --source-lang de --target-lang en \
    --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
    --destdir data-bin/wmt15.tokenized.de-en \
    --workers 20

The training command is as follows:

fairseq-train \
    data-bin/wmt15.tokenized.de-en \
    --simul-type hard_aligned --mass-preservation \
    --criterion latency_augmented_label_smoothed_cross_entropy \
    --latency-var-weight 0.1 --max-update 50000 \
    --arch transformer_monotonic_iwslt_de_en \
    --optimizer adam --adam-betas '(0.9, 0.98)' --lr-scheduler 'inverse_sqrt' \
    --warmup-init-lr 1e-7 --warmup-updates 4000 \
    --lr 5e-4 --stop-min-lr 1e-9 --clip-norm 0.0 --weight-decay 0.0001 \
    --dropout 0.3 --label-smoothing 0.1 --max-tokens 3584

As mentioned above, after a few iterations, I got the following error message:

Traceback (most recent call last):                                                                                             
  File "/home/anaconda3/envs/py37/bin/fairseq-train", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
  File "/home/fairseq/fairseq_cli/train.py", line 507, in cli_main
    distributed_utils.call_main(cfg, main)
  File "/home/fairseq/fairseq/distributed/utils.py", line 369, in call_main
    main(cfg, **kwargs)
  File "/home/fairseq/fairseq_cli/train.py", line 180, in main
    valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
  File "/home/anaconda3/envs/py37/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/home/fairseq/fairseq_cli/train.py", line 291, in train
    log_output = trainer.train_step(samples)
  File "/home/anaconda3/envs/py37/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/home/fairseq/fairseq/trainer.py", line 761, in train_step
    **extra_kwargs,
  File "/home/fairseq/fairseq/tasks/fairseq_task.py", line 492, in train_step
    loss, sample_size, logging_output = criterion(model, sample)
  File "/home/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fairseq/fairseq/criterions/label_smoothed_cross_entropy_latency_augmented.py", line 93, in forward
    net_output = model(**sample["net_input"])
  File "/home/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fairseq/fairseq/models/transformer/transformer_base.py", line 157, in forward
    return_all_hiddens=return_all_hiddens,
  File "/home/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fairseq/fairseq/models/transformer/transformer_decoder.py", line 222, in forward
    alignment_heads=alignment_heads,
  File "/home/fairseq/examples/simultaneous_translation/models/transformer_monotonic_attention.py", line 222, in extract_features
    else None,
  File "/home/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fairseq/examples/simultaneous_translation/modules/monotonic_transformer_layer.py", line 152, in forward
    need_head_weights=need_head_weights,
  File "/home/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fairseq/examples/simultaneous_translation/modules/monotonic_multihead_attention.py", line 351, in forward
    "Only right padding is supported."
AssertionError: Only right padding is supported.
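
The assertion above appears to fire because the source batch is left-padded (fairseq left-pads source sequences by default), while the monotonic attention code expects right padding. A minimal sketch, not fairseq's own code, for checking which side a batch of token ids is padded on, assuming the standard fairseq pad index of 1:

import torch

def padding_side(tokens, pad_idx=1):
    # tokens: LongTensor of shape (batch, seq_len)
    # pad_idx: the pad symbol index (1 by default for fairseq dictionaries)
    pad_mask = tokens.eq(pad_idx)
    if not pad_mask.any():
        return "unpadded"
    starts_with_pad = pad_mask[:, 0].any().item()   # padding touches the first column
    ends_with_pad = pad_mask[:, -1].any().item()    # padding touches the last column
    if starts_with_pad and not ends_with_pad:
        return "left-padded"
    if ends_with_pad and not starts_with_pad:
        return "right-padded"
    return "mixed"

# Toy batch padded on the left, the layout that trips the assertion.
batch = torch.tensor([[1, 1, 5, 6, 7],
                      [1, 4, 5, 6, 7]])
print(padding_side(batch))  # left-padded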

Here is some information about my environment:

ereday commented 3 years ago

@xutaima Following older issues on the MMA code, I understand that you are the contact person for MMA-related issues. I think this error is triggered by a bug in the implementation, since it only appears after a couple of iterations have passed. Could you have a look at it?

xutaima commented 3 years ago

Hi @ereday, thanks for reaching out. Sorry that we haven't updated the documentation yet. As for this error, please add the --left-pad-source option to your training command.
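
Assuming the flag takes a boolean value and the goal is to make the source right-padded (fairseq left-pads the source by default), the training command from above would be adjusted along these lines, with the remaining flags unchanged:

fairseq-train \
    data-bin/wmt15.tokenized.de-en \
    --left-pad-source False \
    --simul-type hard_aligned --mass-preservation \
    (remaining flags as in the original command)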