cshizhe / VLN-DUET

Official implementation of Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation (CVPR'22 Oral).

Question about the training scripts #1

Closed MarSaKi closed 2 years ago

MarSaKi commented 2 years ago

Hello, thanks for your great work! May I ask some questions? In the REVERIE fine-tuning script, https://github.com/cshizhe/VLN-DUET/blob/main/map_nav_src/scripts/run_reverie.sh, `bert_ckpt_file` is set to `''`. What does that mean? Without this checkpoint file, how are the VLN-BERT models initialized? See https://github.com/cshizhe/VLN-DUET/blob/main/map_nav_src/models/vlnbert_init.py:

```python
import torch  # needed for torch.load below

def get_vlnbert_models(args, config=None):
    from transformers import PretrainedConfig
    from models.vilmodel import GlocalTextPathNavCMT

    model_name_or_path = args.bert_ckpt_file
    new_ckpt_weights = {}
    if model_name_or_path is not None:
        ckpt_weights = torch.load(model_name_or_path)
        for k, v in ckpt_weights.items():
            # strip the 'module.' prefix left by nn.DataParallel
            if k.startswith('module'):
                k = k[7:]
            # prediction heads and fusion weights go under the 'bert.' namespace
            if '_head' in k or 'sap_fuse' in k:
                new_ckpt_weights['bert.' + k] = v
            else:
                new_ckpt_weights[k] = v
```
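As a minimal illustration of the confusion (not repo code): if an empty string actually reaches this function, it is not `None`, so the loading branch still executes and `torch.load('')` would fail, which suggests the flag is expected to hold a real path during fine-tuning.

```python
# Minimal sketch (not from the repo): an empty string passes the
# `is not None` check, so torch.load('') would be attempted and raise
# a file-not-found error.
model_name_or_path = ''  # what bert_ckpt_file='' would produce
if model_name_or_path is not None:
    print('loading branch taken even though the path is empty')
```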

Should `bert_ckpt_file` point to the checkpoint from the first-stage REVERIE pretraining, or to the original LXMERT checkpoint? Besides, could you please provide the checkpoint from the first-stage pretraining?

Thank you for your kind attention to this matter!

cshizhe commented 2 years ago

Hi, we use two-stage training: you need to first run the pretraining script and then the fine-tuning script. `bert_ckpt_file` is the model checkpoint produced by the pretraining stage.
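Concretely, a hedged sketch of the hand-off between the two stages (the checkpoint path is hypothetical and depends on where your pretraining run writes its outputs):

```python
import torch

# Hypothetical stage-1 output path; the real name depends on your pretraining run.
stage1_ckpt = '../datasets/REVERIE/exprs/pretrain/ckpts/model_step_200000.pt'

# Sanity-check that the stage-1 checkpoint exists and loads before wiring it
# into run_reverie.sh as bert_ckpt_file.
state = torch.load(stage1_ckpt, map_location='cpu')
print(f'{len(state)} tensors, e.g. {next(iter(state))}')
```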

bxrjmfh commented 1 year ago

So it should be the 'best eval seen' checkpoint that we download here, as mentioned in requirement 3 of the README?

cshizhe commented 1 year ago

> So it should be the 'best eval seen' checkpoint that we download here, as mentioned in requirement 3 of the README?

No, the provided checkpoint is our best model after fine-tuning.
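In other words (a hedged summary; all names and paths below are made up for clarity), the two checkpoints play different roles:

```python
# Hedged illustration; names and paths are hypothetical.
stage1_ckpt = 'pretrain_out/ckpts/model_step_200000.pt'  # pass as bert_ckpt_file to start fine-tuning
released_ckpt = 'downloads/duet_reverie_best.pt'         # already fine-tuned; load it for evaluation only
```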