cshizhe / VLN-HAMT

Official implementation of History Aware Multimodal Transformer for Vision-and-Language Navigation (NeurIPS'21).

Could you please share the running scripts for IL+RL training from scratch? #1

Open · Jackie-Chou opened this issue 2 years ago

Jackie-Chou commented 2 years ago

Hi, Shizhe. Thanks very much for the great HAMT work! I have recently been using your code to run the VLN experiments myself. I noticed that you provide running scripts for pretraining and fine-tuning, but not for training the model on R2R from scratch. I suspect the configuration for training from scratch should differ from finetuning-after-pretraining: e.g., --fix_lang_embedding and --fix_hist_embedding should not be set, since without pretraining those two embeddings are randomly initialized, right? Could you share the scripts for training HAMT from scratch?

cshizhe commented 2 years ago

Hi, you can simply modify the configurations to train HAMT from scratch: e.g., do not fix the embeddings and do not initialize from existing checkpoints.
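For reference, a minimal sketch of what such a from-scratch run might look like, starting from the finetuning command. Only the two --fix_* flags are confirmed by this thread; the script path and remaining arguments are placeholders for the repo's actual options (check the argument parser under finetune_src for exact names):

```bash
# Minimal sketch: IL+RL training on R2R from scratch.
# Only --fix_lang_embedding / --fix_hist_embedding are confirmed in this thread;
# the script path and remaining arguments are placeholders for the repo's
# actual options.
cd finetune_src

python r2r/main.py \
      --dataset r2r \
      --seed 0
# Key differences from finetuning-after-pretraining:
#   1. do NOT pass --fix_lang_embedding or --fix_hist_embedding, so both
#      (randomly initialized) embeddings are trained;
#   2. do NOT pass any checkpoint-initialization argument, so the transformer
#      weights also start from random initialization.
```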

Jackie-Chou commented 2 years ago

Hi, I understand what you mean. What I would actually like to do is reproduce the training-from-scratch results reported in your paper, so I need to know the exact configurations you used.

Jackie-Chou commented 2 years ago

Specifically: should I remove both the --fix_lang_embedding and --fix_hist_embedding flags? Should I initialize the language part of the transformer with pretrained BERT weights, as in pretraining, or leave it randomly initialized? And are there any other flags set differently from finetuning?
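To make the second question concrete, here is roughly the distinction I mean (the --init_lang_from_bert flag below is made up purely for illustration; I could not find the exact option for this in the repo):

```bash
# (a) language stream fully random: no BERT init, no --fix_lang_embedding
python r2r/main.py --dataset r2r

# (b) language layers initialized from pretrained BERT weights, as in
#     pretraining, but still trainable (again no --fix_lang_embedding)
python r2r/main.py --dataset r2r --init_lang_from_bert   # hypothetical flag
```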