AI4Bharat / indicTrans

indicTranslate v1 - Machine Translation for 11 Indic languages. For latest v2, check: https://github.com/AI4Bharat/IndicTrans2
https://ai4bharat.iitm.ac.in/indic-trans
MIT License

Which model class(es) are being saved in the different model checkpoints? #17

Closed · ekdnam closed this 3 years ago

ekdnam commented 3 years ago

Hi.

I am trying to load the en-indic model into PyTorch.

After unzipping the folder, I am doing

checkpoint = torch.load("/content/en-indic/en-indic/model/checkpoint_best.pt")
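For reference, fairseq checkpoints are plain Python dicts, so their contents can be inspected directly. A minimal sketch (the key names in the comments are the ones fairseq typically stores; they may vary by fairseq version):

import torch

# Load the raw checkpoint dict on CPU to avoid needing a GPU.
checkpoint = torch.load(
    "/content/en-indic/en-indic/model/checkpoint_best.pt",
    map_location="cpu",
)

# Typical top-level keys: "args" (or "cfg" in newer fairseq), "model"
# (the state_dict), "optimizer_history", "extra_state", "last_optimizer_state".
print(checkpoint.keys())
print(type(checkpoint["model"]))  # an OrderedDict of parameter tensors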

Now, to load the model and optimizer for later use, I am following this.

To follow along with this tutorial, what are TheModelClass and TheOptimizerClass?

gowtham1997 commented 3 years ago

We do not use custom model classes or custom optimizers for training with fairseq, so you could try using fairseq's transformer model or optimizer classes directly.

Our translation model just uses a different config (different embedding dimension and attention heads compared to the transformer_large architecture).
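A minimal sketch of rebuilding the model through fairseq's own utilities (the paths and the model_configs location are assumptions; the repo's custom transformer_4x architecture has to be registered before the checkpoint can be loaded):

from argparse import Namespace

from fairseq import checkpoint_utils, utils

# Register the repo's custom architectures (e.g. transformer_4x); the
# model_configs path assumes you run this from the indicTrans repo root.
utils.import_user_module(Namespace(user_dir="model_configs"))

# Rebuild the model, config, and task straight from the checkpoint. The task
# looks for its dictionaries at the data path recorded in the checkpoint;
# pass arg_overrides={"data": <your binarized data dir>} if your layout differs.
models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(
    ["../en-indic/model/checkpoint_best.pt"]
)
model = models[0]
print(model.__class__.__name__)  # a standard fairseq transformer model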

To load the model with the Python interface for inference (note this cannot be used for training, as the Python wrapper class is written for inference only), you can follow the tutorial here.
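A short sketch of that wrapper (the class and method names follow the repo's README; treat the paths as placeholders for your own checkout):

# Run from the root of a cloned indicTrans repo so that inference.engine
# resolves; expdir points at the unzipped en-indic checkpoint folder.
from inference.engine import Model

en2indic = Model(expdir="../en-indic")

# batch_translate takes a list of sentences plus source/target language codes.
print(en2indic.batch_translate(["This is a test sentence."], "en", "hi"))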

If you want to use the checkpoint and fine-tune the model further on custom data, you can follow the notebook tutorial here. The training command looks like this:

fairseq-train ../dataset/final_bin \
--max-source-positions=210 \
--max-target-positions=210 \
--max-update=1000 \
--save-interval=1 \
--arch=transformer_4x \
--criterion=label_smoothed_cross_entropy \
--source-lang=SRC \
--lr-scheduler=inverse_sqrt \
--target-lang=TGT \
--label-smoothing=0.1 \
--optimizer adam \
--adam-betas "(0.9, 0.98)" \
--clip-norm 1.0 \
--warmup-init-lr 1e-07 \
--warmup-updates 4000 \
--dropout 0.2 \
--tensorboard-logdir ../dataset/tensorboard-wandb \
--save-dir ../dataset/model \
--keep-last-epochs 5 \
--patience 5 \
--skip-invalid-size-inputs-valid-test \
--fp16 \
--user-dir model_configs \
--update-freq=2 \
--distributed-world-size 1 \
--max-tokens 256 \
--lr 3e-5 \
--restore-file ../en-indic/model/checkpoint_best.pt \
--reset-lr-scheduler \
--reset-meters \
--reset-dataloader \
--reset-optimizer

^ Note that the final line, --reset-optimizer, will reset the optimizer state. If you want to reuse the optimizer for further training, do not pass this flag, and set the other reset flags (for the lr scheduler, etc.) accordingly.
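If in doubt about whether a checkpoint actually carries optimizer state to reuse, a quick check (the last_optimizer_state key is the one fairseq normally uses; not verified against this exact checkpoint):

import torch

ckpt = torch.load("../en-indic/model/checkpoint_best.pt", map_location="cpu")

# fairseq stores the optimizer state under "last_optimizer_state"; if that
# key is absent, --reset-optimizer is effectively mandatory when resuming.
print("last_optimizer_state" in ckpt)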