cshizhe / VLN-HAMT

Official implementation of History Aware Multimodal Transformer for Vision-and-Language Navigation (NeurIPS'21).
MIT License

About the pre-training problems #13

Closed: imzhangsheng closed this issue 1 year ago

imzhangsheng commented 1 year ago

As mentioned in the README.md, the first stage of pre-training keeps the ViT fixed, and the second stage trains the ViT in an end-to-end manner. My question is: should the model trained in the first stage be loaded when starting the second stage of training? A sketch of my understanding of the two stages follows below.
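To make the question concrete, here is a minimal PyTorch sketch of the stage-1 setup as I understand it. The `HAMTLike` class, the `vit` attribute, and the learning rate are placeholders for illustration, not the repo's actual names:

```python
import torch
import torch.nn as nn

# Placeholder stand-in for the HAMT model; in the real repo the model and
# its ViT encoder have different names, so `vit` here is an assumption.
class HAMTLike(nn.Module):
    def __init__(self):
        super().__init__()
        self.vit = nn.Linear(768, 768)            # stands in for the ViT encoder
        self.text_encoder = nn.Linear(768, 768)   # stands in for the language branch
        self.cross_encoder = nn.Linear(768, 768)  # stands in for the cross-modal layers

model = HAMTLike()

# Stage 1: freeze the ViT; only the remaining layers receive gradients.
for p in model.vit.parameters():
    p.requires_grad = False

# Optimize only the trainable parameters (learning rate is illustrative).
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=5e-5
)
```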

cshizhe commented 1 year ago

Yes, the model trained in the first stage should be loaded at the start of the second training stage.
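A minimal sketch of what that stage-2 initialization could look like, reusing the placeholder model from the question above; the checkpoint path, attribute names, and learning rate are assumptions, not the repo's actual interface:

```python
import torch
import torch.nn as nn

# Same stand-in model as in the sketch above; `vit` is an assumed attribute name.
class HAMTLike(nn.Module):
    def __init__(self):
        super().__init__()
        self.vit = nn.Linear(768, 768)
        self.text_encoder = nn.Linear(768, 768)
        self.cross_encoder = nn.Linear(768, 768)

model = HAMTLike()

# Load the stage-1 weights before end-to-end training ("stage1_checkpoint.pt"
# is an illustrative path, assuming the checkpoint was saved at the end of
# stage 1 with torch.save(model.state_dict(), path)).
state = torch.load("stage1_checkpoint.pt", map_location="cpu")
model.load_state_dict(state)

# Stage 2: unfreeze the ViT so the whole model is trained end to end.
for p in model.vit.parameters():
    p.requires_grad = True

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # illustrative lr
```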