Open 14H034160212 opened 3 years ago
Did you try removing the --max-tokens param?
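fairseq fills a batch until it hits either cap, so --max-tokens can silently limit the batch well below --max-sentences. A rough sketch of how the two caps interact (illustrative only, not fairseq's actual batching code; the ~200-token average sequence length is just a guess that happens to match your numbers):

```bash
# Illustrative only -- how the two batch caps interact, not fairseq internals.
MAX_SENTENCES=64
MAX_TOKENS=4400
TOKENS_PER_SAMPLE=200   # assumed average length after truncation (a guess)

BY_TOKENS=$(( MAX_TOKENS / TOKENS_PER_SAMPLE ))                         # 4400 / 200 = 22
EFFECTIVE=$(( MAX_SENTENCES < BY_TOKENS ? MAX_SENTENCES : BY_TOKENS ))  # min of the two caps
echo "Effective batch size: $EFFECTIVE"                                 # -> 22
```

Dropping --max-tokens (or raising it to at least --max-sentences times your longest sequence) lets --max-sentences=64 actually take effect, memory permitting.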
It works. Thanks a lot! @hungviet0304
Hi,
The batch size cannot go above 22 when I use fairseq for fine-tuning. I set the batch size to 64, but the actual batch size stays at 22 or below.
Here is the script.
```bash
TOTAL_NUM_UPDATES=7812              # 10 epochs through IMDB for bsz 32
WARMUP_UPDATES=469                  # 6 percent of the number of updates
LR=1e-05                            # Peak LR for polynomial LR scheduler.
HEAD_NAME=PARARULE_head             # Custom name for the classification head.
NUM_CLASSES=2                       # Number of classes for the classification task.
MAX_SENTENCES=64                    # Batch size.
ROBERTA_PATH=../RoBERTa/model.pt    # ../fairseq_checkpoints/checkpoint_best.pt

time CUDA_VISIBLE_DEVICES=6 python ../train.py PARARULE-bin/ \
    --restore-file $ROBERTA_PATH \
    --max-positions 512 \
    --max-sentences $MAX_SENTENCES \
    --max-tokens 4400 \
    --task sentence_prediction \
    --reset-optimizer --reset-dataloader --reset-meters \
    --required-batch-size-multiple 1 \
    --init-token 0 --separator-token 2 \
    --arch roberta_large \
    --criterion sentence_prediction \
    --classification-head-name $HEAD_NAME \
    --num-classes $NUM_CLASSES \
    --dropout 0.1 --attention-dropout 0.1 \
    --weight-decay 0.1 --optimizer adam --adam-betas "(0.9, 0.98)" --adam-eps 1e-06 \
    --clip-norm 0.0 \
    --lr-scheduler polynomial_decay --lr $LR --total-num-update $TOTAL_NUM_UPDATES --warmup-updates $WARMUP_UPDATES \
    --fp16 --fp16-init-scale 4 --threshold-loss-scale 1 --fp16-scale-window 128 \
    --max-epoch 10 \
    --best-checkpoint-metric accuracy --maximize-best-checkpoint-metric \
    --truncate-sequence \
    --find-unused-parameters \
    --tensorboard-logdir tensorboard_logs/gpu_1_batch_64 \
```
Kind regards, Qiming