Hello,
We use the following command for en-indic training:

```bash
fairseq-train <exp_dir folder>/final_bin \
--max-source-positions=210 \
--max-target-positions=210 \
--max-update=1000000 \
--save-interval=1 \
--arch=transformer_4x \
--criterion=label_smoothed_cross_entropy \
--source-lang=SRC \
--lr-scheduler=inverse_sqrt \
--target-lang=TGT \
--label-smoothing=0.1 \
--optimizer adam \
--adam-betas "(0.9, 0.98)" \
--clip-norm 1.0 \
--warmup-init-lr 1e-07 \
--lr 0.0005 \
--warmup-updates 4000 \
--dropout 0.2 \
--tensorboard-logdir <exp_dir folder>/tensorboard-wandb \
--save-dir <exp_dir folder>/model \
--keep-last-epochs 5 \
--patience 5 \
--skip-invalid-size-inputs-valid-test \
--fp16 \
--user-dir model_configs \
--wandb-project <project name> \
--update-freq=1 \
--distributed-world-size 4 \
--max-tokens 16384
```
^ For the results in our paper, we ensured the effective batch size (max_tokens * distributed_world_size * update_freq) was ~64K. We haven't tried training the 4x model only for en-hi.
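For reference, a quick arithmetic check (mine, not part of the original comment) with the values from the command above:

```bash
# effective batch size in target tokens
#   = max_tokens * distributed_world_size * update_freq
echo $(( 16384 * 4 * 1 ))   # 65536 ≈ 64K
```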
@gowtham1997 - thanks for sharing the params. Any specific reason for ensuring max_tokens * distributed_world_size * update_freq = ~64K? Is it for memory constraints?
Sorry, I missed replying to this yesterday.
We observed that larger effective batch sizes utilized the GPUs fully and also gave better results in our initial experiments, hence we chose ~64K. Effective batch sizes > 64K would also help, but with time constraints in mind we chose ~64K for our paper.
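As an illustration (not from the original thread), here is a minimal bash sketch of how one might pick --update-freq to keep the effective batch size near ~64K when fewer GPUs are available; the single-GPU setup and the 16384-token budget are hypothetical assumptions:

```bash
# Hypothetical helper: choose --update-freq so that
# max_tokens * distributed_world_size * update_freq stays close to ~64K.
target=65536        # ~64K target tokens, as used for the paper's runs
max_tokens=16384    # whatever fits in memory on one of your GPUs (assumption)
world_size=1        # number of GPUs (hypothetical single-GPU setup)
update_freq=$(( (target + max_tokens * world_size - 1) / (max_tokens * world_size) ))  # ceiling division
echo "--max-tokens $max_tokens --distributed-world-size $world_size --update-freq $update_freq"
# prints: --max-tokens 16384 --distributed-world-size 1 --update-freq 4
```

With more GPUs, raise world_size and the helper will lower --update-freq accordingly.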
Ok, got it. Thank you for the info.
We are trying to replicate the results from the Samanantar IndicTrans paper. We are training the model for en-hi translation only. We are currently using these params, following the paper:

```bash
fairseq-train ../en_hi_4x/final_bin \
  --max-source-positions=210 \
  --max-target-positions=210 \
  --save-interval-updates=10000 \
  --arch=transformer_4x \
  --criterion=label_smoothed_cross_entropy \
  --source-lang=SRC \
  --lr-scheduler=inverse_sqrt \
  --target-lang=TGT \
  --label-smoothing=0.1 \
  --optimizer adam \
  --adam-betas '(0.9, 0.98)' \
  --clip-norm 1.0 \
  --warmup-init-lr 1e-07 \
  --lr 0.0005 \
  --warmup-updates 4000 \
  --dropout 0.2 \
  --save-dir ../en_hi_4x/model \
  --keep-last-epochs 5 \
  --patience 5 \
  --skip-invalid-size-inputs-valid-test \
  --fp16 \
  --user-dir model_configs \
  --wandb-project 'train_1' \
  --max-tokens 300
```
Can you please share the params you used for training the en-indic model, and specifically whether you have tried en-hi separately?