Open alphadl opened 4 years ago
CC @NonvolatileMemory
Got the same bug when setting `left-pad-target` to True.
This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!
@alphadl @NonvolatileMemory Have you managed to solve this issue?
❓ Questions and Help
Before asking:
What is your question?
When I train the vanilla Transformer base and big models with `left-pad-target` set to True, fairseq reports an error: `FloatingPointError: Minimum loss scale reached (0.0001). Your loss is probably exploding. Try lowering the learning rate, using gradient clipping or increasing the batch size.`
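For context on what this error means: fp16 training uses a dynamic loss scaler that halves its scale each time gradients overflow and aborts once the scale falls below a minimum. A minimal sketch of that back-off logic (not fairseq's actual implementation; the class name and defaults here are illustrative):

```python
class DynamicLossScaler:
    """Minimal sketch of dynamic fp16 loss scaling (illustrative, not fairseq's code)."""

    def __init__(self, init_scale=2.0 ** 15, scale_factor=2.0, min_scale=1e-4):
        self.scale = init_scale          # current loss scale
        self.scale_factor = scale_factor # divisor applied on each overflow
        self.min_scale = min_scale       # abort threshold (0.0001 in the error above)

    def backoff(self):
        """Called when an overflow (inf/NaN gradient) is detected: shrink the scale."""
        self.scale /= self.scale_factor
        if self.scale < self.min_scale:
            # Repeated overflows drove the scale below the floor -> training aborts.
            raise FloatingPointError(
                f"Minimum loss scale reached ({self.min_scale}). "
                "Your loss is probably exploding."
            )
```

So the error is not a crash in the loss itself: it means the gradients overflowed on many consecutive steps, which with `--left-pad-target True` suggests the padded positions are corrupting the loss rather than a genuinely too-high learning rate.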
Code

The training script of the base model I used is:

```
python train.py databin/ende/wmt14/ -a transformer --share-all-embeddings \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --lr 1e-3 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --optimizer adam --adam-betas '(0.9, 0.98)' --task translation \
    --max-tokens 8192 --update-freq 2 --dropout 0.3 \
    --encoder-layers 6 --encoder-embed-dim 512 \
    --decoder-layers 6 --decoder-embed-dim 512 \
    --fp16 --ddp-backend=no_c10d \
    --max-source-positions 10000 --max-target-positions 10000 \
    --max-update 100000 --seed 1 \
    --save-dir checkpoint/ende/wmt14/ --left-pad-target True
```

What have you tried?

I also met this issue in big-model training and in large-batch (458k tokens) training.

What's your environment?

- fairseq Version (e.g., 1.0 or master): 0.9
- PyTorch Version (e.g., 1.0): 1.4
- OS (e.g., Linux): Linux
- How you installed fairseq (`pip`, source): source
- Build command you used (if compiling from source): pip install --editable $fairseq_path
- Python version: 3.7
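For reference, `--left-pad-target True` changes only how target batches are collated: padding tokens are prepended instead of appended. A minimal sketch of the two conventions on plain token-id lists (the pad index 1 is illustrative, though it matches fairseq's default):

```python
PAD = 1  # illustrative padding index (fairseq's default dictionary pad)

def collate(seqs, left_pad):
    """Pad a batch of token-id lists to equal length, on the left or the right."""
    max_len = max(len(s) for s in seqs)
    batch = []
    for s in seqs:
        padding = [PAD] * (max_len - len(s))
        # left_pad=True prepends padding; left_pad=False (the default) appends it.
        batch.append(padding + s if left_pad else s + padding)
    return batch
```

Any loss computation must mask positions equal to PAD regardless of which side they sit on; if a code path assumes right-padding (e.g. slices the tail off), left-padded batches would feed pad tokens into the loss, which fits the exploding-loss symptom above.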