Closed dongxinfeng1 closed 5 months ago
Hi, the current code does not support testing a pre-trained model directly; the model must be fine-tuned first. If you want to test the pre-trained model anyway, you will probably need to change the checkpoint-loading code.
Alternatively, you can try the following command, which may work:
CUDA_VISIBLE_DEVICES='0' python3 -m torch.distributed.launch --master_port 29501 --nproc_per_node=${ngpus} main_nav_obj.py $flag \
  --tokenizer bert \
  --bert_ckpt_file ../datasets/REVERIE/exprs_map/pretrain/cmt-vitbase-mlm.mrc.sap.og-init.lxmert-aug.speaker-new/ckpts/model_step_10000.pt \
  --eval_first
OK, Good luck!
Dear author, sorry to bother you again. When I run Fine-tuning & Evaluation for REVERIE, I get the following error:
  File "main_nav_obj.py", line 292, in <module>
    main()
  File "main_nav_obj.py", line 288, in main
    valid(args, train_env, val_envs, rank=rank)
  File "main_nav_obj.py", line 232, in valid
    agent.load(args.resume_file), args.resume_file))
  File "GridMM/map_nav_src/reverie/agent_base.py", line 260, in load
    recover_state(*param)
  File "GridMM/map_nav_src/reverie/agent_base.py", line 236, in recover_state
    load_keys = set(states[name]['state_dict'].keys())
KeyError: 'vln_bert'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 41011) of binary: /home/z/anaconda3/envs/gridmm/bin/python3
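For anyone hitting the same KeyError: the traceback shows that `recover_state` indexes the loaded file as `states['vln_bert']['state_dict']`, so the error suggests the file passed as the resume file has no top-level `vln_bert` entry. One possible cause, which is an assumption and not confirmed by the repository, is that a pre-trained checkpoint is stored as a flat mapping of parameter names to tensors while a fine-tuned one is nested per model. The sketch below shows a tolerant lookup that handles both layouts. The helper name `extract_state_dict` and the example dictionaries are hypothetical; plain dicts stand in for what `torch.load` would return.

```python
from typing import Any, Dict, Mapping


def extract_state_dict(states: Mapping[str, Any], name: str = "vln_bert") -> Dict[str, Any]:
    """Return the weights for `name` from either assumed checkpoint layout.

    Assumed layouts (inferred from the traceback, not confirmed by the repo):
      * fine-tuned:  {"vln_bert": {"state_dict": {...}, ...}, ...}
      * pre-trained: a flat {"param.name": tensor, ...} mapping
    """
    entry = states.get(name)
    if isinstance(entry, Mapping) and "state_dict" in entry:
        # Nested layout: this is what recover_state expects.
        return dict(entry["state_dict"])
    # Fallback: treat the whole file as a flat state dict.
    return dict(states)


# Hypothetical stand-ins for torch.load(path, map_location="cpu").
finetuned = {"vln_bert": {"epoch": 3, "state_dict": {"encoder.weight": [0.1]}}}
pretrained = {"encoder.weight": [0.2], "decoder.bias": [0.0]}

# Printing the top-level keys is usually enough to see which layout a file has.
print(sorted(finetuned.keys()))
print(sorted(extract_state_dict(finetuned)))
print(sorted(extract_state_dict(pretrained)))
```

Even with such a fallback, the parameter names in a pre-trained file may not match the fine-tuning model exactly, so the keys may need renaming, or the weights may need to be loaded with `load_state_dict(..., strict=False)`, before evaluation gives meaningful results.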