AI4Bharat / indicTrans

indicTranslate v1 - Machine Translation for 11 Indic languages. For latest v2, check: https://github.com/AI4Bharat/IndicTrans2
https://ai4bharat.iitm.ac.in/indic-trans
MIT License

Which model class(es) are being saved in the different model checkpoints? #17

Closed · ekdnam closed this 3 years ago

ekdnam commented 3 years ago

Hi.

I am trying to load the en-indic model into PyTorch.

After unzipping the folder, I am doing

checkpoint = torch.load("/content/en-indic/en-indic/model/checkpoint_best.pt")
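For reference, fairseq checkpoints are plain Python dicts, so their contents can be inspected directly. A minimal sketch (the key names in the comments are the ones fairseq typically stores; they may vary by fairseq version):

import torch

# Load the raw checkpoint dict on CPU to avoid needing a GPU.
checkpoint = torch.load(
    "/content/en-indic/en-indic/model/checkpoint_best.pt",
    map_location="cpu",
)

# Typical top-level keys: "args" (or "cfg" in newer fairseq), "model"
# (the state_dict), "optimizer_history", "extra_state", "last_optimizer_state".
print(checkpoint.keys())
print(type(checkpoint["model"]))  # an OrderedDict of parameter tensors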

Now, to load the model and optimizer for later use, I am following this.

To follow along with this tutorial, what are TheModelClass and TheOptimizerClass?

gowtham1997 commented 3 years ago

We do not use custom model classes or custom optimizers for training with fairseq, so you could try using fairseq's transformer model or optimizer classes directly.

Our translation model just uses a different config (different embedding dimension and attention heads compared to the transformer_large architecture).
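A minimal sketch of rebuilding the model through fairseq's own utilities (the paths and the model_configs location are assumptions; the repo's custom transformer_4x architecture has to be registered before the checkpoint can be loaded):

from argparse import Namespace

from fairseq import checkpoint_utils, utils

# Register the repo's custom architectures (e.g. transformer_4x); the
# model_configs path assumes you run this from the indicTrans repo root.
utils.import_user_module(Namespace(user_dir="model_configs"))

# Rebuild the model, config, and task straight from the checkpoint. The task
# looks for its dictionaries at the data path recorded in the checkpoint;
# pass arg_overrides={"data": <your binarized data dir>} if your layout differs.
models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(
    ["../en-indic/model/checkpoint_best.pt"]
)
model = models[0]
print(model.__class__.__name__)  # a standard fairseq transformer model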

To load the model with the Python interface for inference (note this cannot be used for training, as the Python wrapper class is written for inference only), you can follow the tutorial here.
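A short sketch of that wrapper (the class and method names follow the repo's README; treat the paths as placeholders for your own checkout):

# Run from the root of a cloned indicTrans repo so that inference.engine
# resolves; expdir points at the unzipped en-indic checkpoint folder.
from inference.engine import Model

en2indic = Model(expdir="../en-indic")

# batch_translate takes a list of sentences plus source/target language codes.
print(en2indic.batch_translate(["This is a test sentence."], "en", "hi"))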

If you want to use the checkpoint and fine-tune the model further on custom data, you can follow the notebook tutorial here. The training command looks like this:

fairseq-train ../dataset/final_bin \
--max-source-positions=210 \
--max-target-positions=210 \
--max-update=1000 \
--save-interval=1 \
--arch=transformer_4x \
--criterion=label_smoothed_cross_entropy \
--source-lang=SRC \
--lr-scheduler=inverse_sqrt \
--target-lang=TGT \
--label-smoothing=0.1 \
--optimizer adam \
--adam-betas "(0.9, 0.98)" \
--clip-norm 1.0 \
--warmup-init-lr 1e-07 \
--warmup-updates 4000 \
--dropout 0.2 \
--tensorboard-logdir ../dataset/tensorboard-wandb \
--save-dir ../dataset/model \
--keep-last-epochs 5 \
--patience 5 \
--skip-invalid-size-inputs-valid-test \
--fp16 \
--user-dir model_configs \
--update-freq=2 \
--distributed-world-size 1 \
--max-tokens 256 \
--lr 3e-5 \
--restore-file ../en-indic/model/checkpoint_best.pt \
--reset-lr-scheduler \
--reset-meters \
--reset-dataloader \
--reset-optimizer

^ Note that the final line, --reset-optimizer, will reset the optimizer state. If you want to reuse the optimizer for further training, do not pass this flag, and set the other reset flags (for the lr scheduler, etc.) accordingly.
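If in doubt about whether a checkpoint actually carries optimizer state to reuse, a quick check (the last_optimizer_state key is the one fairseq normally uses; not verified against this exact checkpoint):

import torch

ckpt = torch.load("../en-indic/model/checkpoint_best.pt", map_location="cpu")

# fairseq stores the optimizer state under "last_optimizer_state"; if that
# key is absent, --reset-optimizer is effectively mandatory when resuming.
print("last_optimizer_state" in ckpt)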