cshizhe / VLN-HAMT

Official implementation of History Aware Multimodal Transformer for Vision-and-Language Navigation (NeurIPS'21).

Issues with the performance of the two-stage pretraining model #15

Open imzhangsheng opened 1 year ago

imzhangsheng commented 1 year ago

Dear Dr. Chen,

Following the experimental steps, I took the model trained in the first stage and used it to initialize the second stage of pre-training. However, I found that the model's performance got worse as the second stage progressed. For example, when I evaluated the pre-trained model at 2,500 iterations on the val unseen split, the first validation SPL was 41.29; with the model at 25,000 iterations, the first validation SPL dropped to 32.52. It seems that performance keeps degrading during end-to-end training.

Regarding the experimental details: due to a limited number of GPUs, I used only two GPUs for distributed training during pretraining, and all other parameters were consistent with the pretrain_r2r_e2e.json file. I also stored the ViT features extracted from the trained models (at 2,500 and 25,000 iterations) separately for evaluation. I would like to ask whether I have made any mistakes or missed any settings that could cause this problem.
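For reference, this is roughly how I compare the checkpoints: a minimal sketch, not the repo's actual evaluation code. The `eval_utils` module with `build_agent()` and `evaluate()` is hypothetical, standing in for however the agent is constructed and scored on a split; only the load-each-checkpoint-and-report-SPL loop reflects what I actually do.

```python
import glob
import torch

# Hypothetical helpers, assumed for illustration (not part of VLN-HAMT):
# build_agent() constructs the navigation agent, evaluate() runs it on a
# split and returns a metrics dict that includes "spl".
from eval_utils import build_agent, evaluate

def compare_checkpoints(ckpt_paths, split="val_unseen"):
    """Evaluate each pretraining checkpoint on the same split so the SPL
    trend across iterations (e.g. 2,500 vs. 25,000) is directly comparable."""
    results = {}
    for path in sorted(ckpt_paths):
        agent = build_agent()
        state = torch.load(path, map_location="cpu")
        agent.load_state_dict(state["model"], strict=False)
        metrics = evaluate(agent, split=split)
        results[path] = metrics["spl"]
        print(f"{path}: SPL = {metrics['spl']:.2f}")
    return results

if __name__ == "__main__":
    # Checkpoint naming/path is assumed; adjust to the actual save pattern.
    compare_checkpoints(glob.glob("pretrain_ckpts/model_step_*.pt"))
```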

yhl2018 commented 1 year ago

@imzhangsheng Hi, would it be convenient for you to leave an email address? I would like to discuss something with you.

imzhangsheng commented 1 year ago

Hello, my email is jonathan_@tongji.edu.cn