CrystalSixone / VLN-GOAT

Repository for Vision-and-Language Navigation via Causal Learning (Accepted by CVPR 2024)
Apache License 2.0

About reproduce result #6

Open xybHFUT opened 3 days ago

xybHFUT commented 3 days ago

Hi there, I'm an early-stage researcher in the field of vision-language navigation, and I've been very impressed with GOAT. I've tried to reproduce your results on a single A100 GPU, keeping the experimental setup consistent with the README and using the provided features and weights, but I did not achieve the expected outcomes (SR: 74, OSR: 83, SPL: 63 on R2R valid unseen). Could you provide any suggestions or insights that might help improve these numbers? I'm a big fan of GOAT, and thank you for your outstanding contribution to the VLN community.

CrystalSixone commented 2 days ago

Thank you for your support of our work! Your experimental results seem unusual. Could you kindly provide your logs/train.txt, and the number of iterations it took to achieve your current results?

xybHFUT commented 2 days ago

Thanks for your response! I have trained for 860K iters, while the results on REVERIE are quite satisfactory. Thanks for your help!


Namespace(accumulate_grad=True, act_visited_nodes=False, adaptive_pano_fusion=True, aemb=64, angle_feat_size=4, anno_dir='../datasets/R2R/annotations', aug='../datasets/R2R/annotations/prevalent_aug_train_enc.json', aug_img_ft_file_envedit='../datasets/EnvEdit/hamt_features/CLIP-ViT-B-16-views-st-samefilter.hdf5', aug_times=1, backdoor_dict_file='', batch_size=12, bert_ckpt_file='../datasets/R2R/pretrain/goat_r2r_pretrain/ckpts/model_step_best.pt', cat_file='../datasets/R2R/annotations/category_mapping.tsv', cfp_temperature=1.0, ckpt_dir='../datasets/R2R/navigator/goat_r2r/ckpts', connectivity_dir='../datasets/R2R/connectivity', dataset='r2r', detailed_output=False, do_add_method='door', do_back_img=True, do_back_img_type='type_1', do_back_txt=True, do_back_txt_type='type_2', do_front_his=True, do_front_img=True, do_front_txt=True, dropout=0.1, enc_full_graph=True, entropy_loss_weight=0.01, env_edit=False, epsilon=0.1, eval_first=False, expert_policy='spl', expl_max_ratio=0.6, expl_sample=False, feat_dropout=0.5, featdropout=0.3, feature_size=768, features='clip768', feedback='sample', fix_lang_embedding=False, fix_local_branch=False, fix_pano_embedding=False, for_debug=False, front_feat_file='../datasets/R2R/features/r2r_cfp_features.tsv', front_n_clusters=24, frontdoor_dict_file='', fusion='dynamic', gamma=0.9, graph_sprels=True, h_dim=512, ignoreid=-100, image_feat_size=768, img_ft_file='../datasets/R2R/features/CLIP-ViT-B-16-views.hdf5', img_type='hdf5', img_zdict_file='../datasets/R2R/features/image_z_dict_clip_50.tsv', img_zdict_size=50, instr_zdict_file='../datasets/R2R/features/r2r_z_instr_dict.tsv', instr_zdict_size=81, iters=150000, loadOptim=False, local_rank=-1, log_dir='../datasets/R2R/navigator/goat_r2r/logs', log_every=1000, lr=2e-05, lr_sch='polynomial', maxDecode=120, max_action_len=15, max_instr_len=200, ml_weight=0.2, mode='train', name='goat_r2r', node_rank=0, normalize_loss='total', num_l_layers=6, num_pano_layers=2, num_x_layers=3, 
obj_feat_size=768, optim='adamW', output_dir='../datasets/R2R/navigator/goat_r2r', pred_dir='../datasets/R2R/navigator/goat_r2r/preds', proj_hidden=1024, resume_file=None, resume_optimizer=False, root_dir='../datasets', rxr_front_feat_file='../datasets/R2R/features/rxr_cfp_features.tsv', rxr_instr_zdict_roberta_file='../datasets/R2R/features/rxr_z_instr_dict.tsv', save_optimizer=False, scan_data_dir='../datasets/Matterport3D/v1_unzip_scans', scanvp_cands_file='../datasets/R2R/annotations/scanvp_candview_relangles.json', seed=0, speaker='../datasets/R2R/speaker/transpeaker_r2r/state_dict/best_both_bleu.pt', speaker_angle_size=128, speaker_dropout=0.2, speaker_head_num=4, speaker_layer_num=3, speaker_train_vocab='../datasets/R2R/features/r2r_speaker_train_vocab.txt', sub_out='tanh', submit=False, test=False, tokenizer='roberta', train_alg='dagger', update_iter=3000, use_aug_env=False, use_drop=False, use_lr_sch=False, use_transpeaker=True, views=36, weight_decay=0.0, wemb=256, world_size=1, z_back_log_dir='../datasets/R2R/navigator/goat_r2r/logs/backdoor', z_front_log_dir='../datasets/R2R/navigator/goat_r2r/logs/frontdoor', z_instr_update=True)
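
For anyone comparing a logged configuration like the one above against the settings they intended to use, here is a minimal sketch that parses an argparse `Namespace(...)` line into a dict and lists the keys that differ. The helper names `parse_namespace` and `diff_settings` are hypothetical, and the `expected` values are placeholders for illustration only; they are not taken from the README.

```python
import ast


def parse_namespace(text: str) -> dict:
    """Turn an argparse ``Namespace(...)`` repr into a plain dict.

    Assumes every value is a Python literal (str, int, float, bool, None),
    which holds for the settings shown in the log above.
    """
    call = ast.parse(text.strip(), mode="eval").body
    if not (isinstance(call, ast.Call) and getattr(call.func, "id", None) == "Namespace"):
        raise ValueError("expected a Namespace(...) expression")
    # Each keyword argument becomes one dict entry; literal_eval accepts AST nodes.
    return {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}


def diff_settings(logged: dict, expected: dict) -> dict:
    """Return {key: (logged_value, expected_value)} for keys that differ."""
    return {k: (logged.get(k), v) for k, v in expected.items() if logged.get(k) != v}


# A shortened excerpt of the logged line; the newline mimics the wrapped log.
logged = parse_namespace(
    "Namespace(batch_size=12, iters=150000, lr=2e-05, ignoreid=-100,\n"
    " seed=0, resume_file=None, use_lr_sch=False)"
)

# Hypothetical reference values, NOT the README's; replace with the real ones.
expected = {"batch_size": 8, "lr": 2e-05}

print(diff_settings(logged, expected))
```

Parsing with `ast` rather than `eval` keeps the log from being executed as code, and the wrapped line is handled because the text inside the parentheses may span several lines.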

Namespace(accumulate_grad=True, act_visited_nodes=False, adaptive_pano_fusion=True, aemb=64, angle_feat_size=4, anno_dir='../datasets/R2R/annotations', aug='../datasets/R2R/annotations/prevalent_aug_train_enc.json', aug_img_ft_file_envedit='../datasets/EnvEdit/hamt_features/CLIP-ViT-B-16-views-st-samefilter.hdf5', aug_times=1, backdoor_dict_file='', batch_size=12, bert_ckpt_file='../datasets/R2R/pretrain/goat_r2r_pretrain/ckpts/model_step_best.pt', cat_file='../datasets/R2R/annotations/category_mapping.tsv', cfp_temperature=1.0, ckpt_dir='../datasets/R2R/navigator/goat_r2r/ckpts', connectivity_dir='../datasets/R2R/connectivity', dataset='r2r', detailed_output=False, do_add_method='door', do_back_img=True, do_back_img_type='type_1', do_back_txt=True, do_back_txt_type='type_2', do_front_his=True, do_front_img=True, do_front_txt=True, dropout=0.1, enc_full_graph=True, entropy_loss_weight=0.01, env_edit=False, epsilon=0.1, eval_first=False, expert_policy='spl', expl_max_ratio=0.6, expl_sample=False, feat_dropout=0.5, featdropout=0.3, feature_size=768, features='clip768', feedback='sample', fix_lang_embedding=False, fix_local_branch=False, fix_pano_embedding=False, for_debug=False, front_feat_file='../datasets/R2R/features/r2r_cfp_features.tsv', front_n_clusters=24, frontdoor_dict_file='', fusion='dynamic', gamma=0.9, graph_sprels=True, h_dim=512, ignoreid=-100, image_feat_size=768, img_ft_file='../datasets/R2R/features/CLIP-ViT-B-16-views.hdf5', img_type='hdf5', img_zdict_file='../datasets/R2R/features/image_z_dict_clip_50.tsv', img_zdict_size=50, instr_zdict_file='../datasets/R2R/features/r2r_z_instr_dict.tsv', instr_zdict_size=81, iters=150000, loadOptim=False, local_rank=-1, log_dir='../datasets/R2R/navigator/goat_r2r/logs', log_every=1000, lr=2e-05, lr_sch='polynomial', maxDecode=120, max_action_len=15, max_instr_len=200, ml_weight=0.2, mode='train', name='goat_r2r', node_rank=0, normalize_loss='total', num_l_layers=6, num_pano_layers=2, num_x_layers=3, 
obj_feat_size=768, optim='adamW', output_dir='../datasets/R2R/navigator/goat_r2r', pred_dir='../datasets/R2R/navigator/goat_r2r/preds', proj_hidden=1024, resume_file=None, resume_optimizer=False, root_dir='../datasets', rxr_front_feat_file='../datasets/R2R/features/rxr_cfp_features.tsv', rxr_instr_zdict_roberta_file='../datasets/R2R/features/rxr_z_instr_dict.tsv', save_optimizer=False, scan_data_dir='../datasets/Matterport3D/v1_unzip_scans', scanvp_cands_file='../datasets/R2R/annotations/scanvp_candview_relangles.json', seed=0, speaker='../datasets/R2R/speaker/transpeaker_r2r/state_dict/best_both_bleu.pt', speaker_angle_size=128, speaker_dropout=0.2, speaker_head_num=4, speaker_layer_num=3, speaker_train_vocab='../datasets/R2R/features/r2r_speaker_train_vocab.txt', sub_out='tanh', submit=False, test=False, tokenizer='roberta', train_alg='dagger', update_iter=3000, use_aug_env=False, use_drop=False, use_lr_sch=False, use_transpeaker=True, views=36, weight_decay=0.0, wemb=256, world_size=1, z_back_log_dir='../datasets/R2R/navigator/goat_r2r/logs/backdoor', z_front_log_dir='../datasets/R2R/navigator/goat_r2r/logs/frontdoor', z_instr_update=True)

Namespace(accumulate_grad=True, act_visited_nodes=False, adaptive_pano_fusion=True, aemb=64, angle_feat_size=4, anno_dir='../datasets/R2R/annotations', aug='../datasets/R2R/annotations/prevalent_aug_train_enc.json', aug_img_ft_file_envedit='../datasets/EnvEdit/hamt_features/CLIP-ViT-B-16-views-st-samefilter.hdf5', aug_times=1, backdoor_dict_file='', batch_size=12, bert_ckpt_file='../datasets/R2R/pretrain/goat_r2r_pretrain/ckpts/model_step_best.pt', cat_file='../datasets/R2R/annotations/category_mapping.tsv', cfp_temperature=1.0, ckpt_dir='../datasets/R2R/navigator/goat_r2r/ckpts', connectivity_dir='../datasets/R2R/connectivity', dataset='r2r', detailed_output=False, do_add_method='door', do_back_img=True, do_back_img_type='type_1', do_back_txt=True, do_back_txt_type='type_2', do_front_his=True, do_front_img=True, do_front_txt=True, dropout=0.1, enc_full_graph=True, entropy_loss_weight=0.01, env_edit=False, epsilon=0.1, eval_first=False, expert_policy='spl', expl_max_ratio=0.6, expl_sample=False, feat_dropout=0.5, featdropout=0.3, feature_size=768, features='clip768', feedback='sample', fix_lang_embedding=False, fix_local_branch=False, fix_pano_embedding=False, for_debug=False, front_feat_file='../datasets/R2R/features/r2r_cfp_features.tsv', front_n_clusters=24, frontdoor_dict_file='', fusion='dynamic', gamma=0.9, graph_sprels=True, h_dim=512, ignoreid=-100, image_feat_size=768, img_ft_file='../datasets/R2R/features/CLIP-ViT-B-16-views.hdf5', img_type='hdf5', img_zdict_file='../datasets/R2R/features/image_z_dict_clip_50.tsv', img_zdict_size=50, instr_zdict_file='../datasets/R2R/features/r2r_z_instr_dict.tsv', instr_zdict_size=81, iters=150000, loadOptim=False, local_rank=-1, log_dir='../datasets/R2R/navigator/goat_r2r/logs', log_every=1000, lr=2e-05, lr_sch='polynomial', maxDecode=120, max_action_len=15, max_instr_len=200, ml_weight=0.2, mode='train', name='goat_r2r', node_rank=0, normalize_loss='total', num_l_layers=6, num_pano_layers=2, num_x_layers=3, 
obj_feat_size=768, optim='adamW', output_dir='../datasets/R2R/navigator/goat_r2r', pred_dir='../datasets/R2R/navigator/goat_r2r/preds', proj_hidden=1024, resume_file=None, resume_optimizer=False, root_dir='../datasets', rxr_front_feat_file='../datasets/R2R/features/rxr_cfp_features.tsv', rxr_instr_zdict_roberta_file='../datasets/R2R/features/rxr_z_instr_dict.tsv', save_optimizer=False, scan_data_dir='../datasets/Matterport3D/v1_unzip_scans', scanvp_cands_file='../datasets/R2R/annotations/scanvp_candview_relangles.json', seed=0, speaker='../datasets/R2R/speaker/transpeaker_r2r/state_dict/best_both_bleu.pt', speaker_angle_size=128, speaker_dropout=0.2, speaker_head_num=4, speaker_layer_num=3, speaker_train_vocab='../datasets/R2R/features/r2r_speaker_train_vocab.txt', sub_out='tanh', submit=False, test=False, tokenizer='roberta', train_alg='dagger', update_iter=3000, use_aug_env=False, use_drop=False, use_lr_sch=False, use_transpeaker=True, views=36, weight_decay=0.0, wemb=256, world_size=1, z_back_log_dir='../datasets/R2R/navigator/goat_r2r/logs/backdoor', z_front_log_dir='../datasets/R2R/navigator/goat_r2r/logs/frontdoor', z_instr_update=True)

Namespace(accumulate_grad=True, act_visited_nodes=False, adaptive_pano_fusion=True, aemb=64, angle_feat_size=4, anno_dir='../datasets/R2R/annotations', aug='../datasets/R2R/annotations/prevalent_aug_train_enc.json', aug_img_ft_file_envedit='../datasets/EnvEdit/hamt_features/CLIP-ViT-B-16-views-st-samefilter.hdf5', aug_times=1, backdoor_dict_file='', batch_size=12, bert_ckpt_file='../datasets/R2R/pretrain/goat_r2r_pretrain/ckpts/model_step_best.pt', cat_file='../datasets/R2R/annotations/category_mapping.tsv', cfp_temperature=1.0, ckpt_dir='../datasets/R2R/navigator/goat_r2r/ckpts', connectivity_dir='../datasets/R2R/connectivity', dataset='r2r', detailed_output=False, do_add_method='door', do_back_img=True, do_back_img_type='type_1', do_back_txt=True, do_back_txt_type='type_2', do_front_his=True, do_front_img=True, do_front_txt=True, dropout=0.1, enc_full_graph=True, entropy_loss_weight=0.01, env_edit=False, epsilon=0.1, eval_first=False, expert_policy='spl', expl_max_ratio=0.6, expl_sample=False, feat_dropout=0.5, featdropout=0.3, feature_size=768, features='clip768', feedback='sample', fix_lang_embedding=False, fix_local_branch=False, fix_pano_embedding=False, for_debug=False, front_feat_file='../datasets/R2R/features/r2r_cfp_features.tsv', front_n_clusters=24, frontdoor_dict_file='', fusion='dynamic', gamma=0.9, graph_sprels=True, h_dim=512, ignoreid=-100, image_feat_size=768, img_ft_file='../datasets/R2R/features/CLIP-ViT-B-16-views.hdf5', img_type='hdf5', img_zdict_file='../datasets/R2R/features/image_z_dict_clip_50.tsv', img_zdict_size=50, instr_zdict_file='../datasets/R2R/features/r2r_z_instr_dict.tsv', instr_zdict_size=81, iters=150000, loadOptim=False, local_rank=-1, log_dir='../datasets/R2R/navigator/goat_r2r/logs', log_every=1000, lr=2e-05, lr_sch='polynomial', maxDecode=120, max_action_len=15, max_instr_len=200, ml_weight=0.2, mode='train', name='goat_r2r', node_rank=0, normalize_loss='total', num_l_layers=6, num_pano_layers=2, num_x_layers=3, 
obj_feat_size=768, optim='adamW', output_dir='../datasets/R2R/navigator/goat_r2r', pred_dir='../datasets/R2R/navigator/goat_r2r/preds', proj_hidden=1024, resume_file=None, resume_optimizer=False, root_dir='../datasets', rxr_front_feat_file='../datasets/R2R/features/rxr_cfp_features.tsv', rxr_instr_zdict_roberta_file='../datasets/R2R/features/rxr_z_instr_dict.tsv', save_optimizer=False, scan_data_dir='../datasets/Matterport3D/v1_unzip_scans', scanvp_cands_file='../datasets/R2R/annotations/scanvp_candview_relangles.json', seed=0, speaker='../datasets/R2R/speaker/transpeaker_r2r/state_dict/best_both_bleu.pt', speaker_angle_size=128, speaker_dropout=0.2, speaker_head_num=4, speaker_layer_num=3, speaker_train_vocab='../datasets/R2R/features/r2r_speaker_train_vocab.txt', sub_out='tanh', submit=False, test=False, tokenizer='roberta', train_alg='dagger', update_iter=3000, use_aug_env=False, use_drop=False, use_lr_sch=False, use_transpeaker=True, views=36, weight_decay=0.0, wemb=256, world_size=1, z_back_log_dir='../datasets/R2R/navigator/goat_r2r/logs/backdoor', z_front_log_dir='../datasets/R2R/navigator/goat_r2r/logs/frontdoor', z_instr_update=True)

Namespace(accumulate_grad=True, act_visited_nodes=False, adaptive_pano_fusion=True, aemb=64, angle_feat_size=4, anno_dir='../datasets/R2R/annotations', aug='../datasets/R2R/annotations/prevalent_aug_train_enc.json', aug_img_ft_file_envedit='../datasets/EnvEdit/hamt_features/CLIP-ViT-B-16-views-st-samefilter.hdf5', aug_times=1, backdoor_dict_file='', batch_size=12, bert_ckpt_file='../datasets/R2R/pretrain/goat_r2r_pretrain/ckpts/model_step_best.pt', cat_file='../datasets/R2R/annotations/category_mapping.tsv', cfp_temperature=1.0, ckpt_dir='../datasets/R2R/navigator/goat_r2r/ckpts', connectivity_dir='../datasets/R2R/connectivity', dataset='r2r', detailed_output=False, do_add_method='door', do_back_img=True, do_back_img_type='type_1', do_back_txt=True, do_back_txt_type='type_2', do_front_his=True, do_front_img=True, do_front_txt=True, dropout=0.1, enc_full_graph=True, entropy_loss_weight=0.01, env_edit=False, epsilon=0.1, eval_first=False, expert_policy='spl', expl_max_ratio=0.6, expl_sample=False, feat_dropout=0.5, featdropout=0.3, feature_size=768, features='clip768', feedback='sample', fix_lang_embedding=False, fix_local_branch=False, fix_pano_embedding=False, for_debug=False, front_feat_file='../datasets/R2R/features/r2r_cfp_features.tsv', front_n_clusters=24, frontdoor_dict_file='', fusion='dynamic', gamma=0.9, graph_sprels=True, h_dim=512, ignoreid=-100, image_feat_size=768, img_ft_file='../datasets/R2R/features/CLIP-ViT-B-16-views.hdf5', img_type='hdf5', img_zdict_file='../datasets/R2R/features/image_z_dict_clip_50.tsv', img_zdict_size=50, instr_zdict_file='../datasets/R2R/features/r2r_z_instr_dict.tsv', instr_zdict_size=81, iters=150000, loadOptim=False, local_rank=-1, log_dir='../datasets/R2R/navigator/goat_r2r/logs', log_every=1000, lr=2e-05, lr_sch='polynomial', maxDecode=120, max_action_len=15, max_instr_len=200, ml_weight=0.2, mode='train', name='goat_r2r', node_rank=0, normalize_loss='total', num_l_layers=6, num_pano_layers=2, num_x_layers=3, 
obj_feat_size=768, optim='adamW', output_dir='../datasets/R2R/navigator/goat_r2r', pred_dir='../datasets/R2R/navigator/goat_r2r/preds', proj_hidden=1024, resume_file=None, resume_optimizer=False, root_dir='../datasets', rxr_front_feat_file='../datasets/R2R/features/rxr_cfp_features.tsv', rxr_instr_zdict_roberta_file='../datasets/R2R/features/rxr_z_instr_dict.tsv', save_optimizer=False, scan_data_dir='../datasets/Matterport3D/v1_unzip_scans', scanvp_cands_file='../datasets/R2R/annotations/scanvp_candview_relangles.json', seed=0, speaker='../datasets/R2R/speaker/transpeaker_r2r/state_dict/best_both_bleu.pt', speaker_angle_size=128, speaker_dropout=0.2, speaker_head_num=4, speaker_layer_num=3, speaker_train_vocab='../datasets/R2R/features/r2r_speaker_train_vocab.txt', sub_out='tanh', submit=False, test=False, tokenizer='roberta', train_alg='dagger', update_iter=3000, use_aug_env=False, use_drop=False, use_lr_sch=False, use_transpeaker=True, views=36, weight_decay=0.0, wemb=256, world_size=1, z_back_log_dir='../datasets/R2R/navigator/goat_r2r/logs/backdoor', z_front_log_dir='../datasets/R2R/navigator/goat_r2r/logs/frontdoor', z_instr_update=True)

Namespace(accumulate_grad=True, act_visited_nodes=False, adaptive_pano_fusion=True, aemb=64, angle_feat_size=4, anno_dir='../datasets/R2R/annotations', aug='../datasets/R2R/annotations/prevalent_aug_train_enc.json', aug_img_ft_file_envedit='../datasets/EnvEdit/hamt_features/CLIP-ViT-B-16-views-st-samefilter.hdf5', aug_times=1, backdoor_dict_file='', batch_size=12, bert_ckpt_file='../datasets/R2R/pretrain/goat_r2r_pretrain/ckpts/model_step_best.pt', cat_file='../datasets/R2R/annotations/category_mapping.tsv', cfp_temperature=1.0, ckpt_dir='../datasets/R2R/navigator/goat_r2r/ckpts', connectivity_dir='../datasets/R2R/connectivity', dataset='r2r', detailed_output=False, do_add_method='door', do_back_img=True, do_back_img_type='type_1', do_back_txt=True, do_back_txt_type='type_2', do_front_his=True, do_front_img=True, do_front_txt=True, dropout=0.1, enc_full_graph=True, entropy_loss_weight=0.01, env_edit=False, epsilon=0.1, eval_first=False, expert_policy='spl', expl_max_ratio=0.6, expl_sample=False, feat_dropout=0.5, featdropout=0.3, feature_size=768, features='clip768', feedback='sample', fix_lang_embedding=False, fix_local_branch=False, fix_pano_embedding=False, for_debug=False, front_feat_file='../datasets/R2R/features/r2r_cfp_features.tsv', front_n_clusters=24, frontdoor_dict_file='', fusion='dynamic', gamma=0.9, graph_sprels=True, h_dim=512, ignoreid=-100, image_feat_size=768, img_ft_file='../datasets/R2R/features/CLIP-ViT-B-16-views.hdf5', img_type='hdf5', img_zdict_file='../datasets/R2R/features/image_z_dict_clip_50.tsv', img_zdict_size=50, instr_zdict_file='../datasets/R2R/features/r2r_z_instr_dict.tsv', instr_zdict_size=81, iters=150000, loadOptim=False, local_rank=-1, log_dir='../datasets/R2R/navigator/goat_r2r/logs', log_every=1000, lr=2e-05, lr_sch='polynomial', maxDecode=120, max_action_len=15, max_instr_len=200, ml_weight=0.2, mode='train', name='goat_r2r', node_rank=0, normalize_loss='total', num_l_layers=6, num_pano_layers=2, num_x_layers=3, 
obj_feat_size=768, optim='adamW', output_dir='../datasets/R2R/navigator/goat_r2r', pred_dir='../datasets/R2R/navigator/goat_r2r/preds', proj_hidden=1024, resume_file=None, resume_optimizer=False, root_dir='../datasets', rxr_front_feat_file='../datasets/R2R/features/rxr_cfp_features.tsv', rxr_instr_zdict_roberta_file='../datasets/R2R/features/rxr_z_instr_dict.tsv', save_optimizer=False, scan_data_dir='../datasets/Matterport3D/v1_unzip_scans', scanvp_cands_file='../datasets/R2R/annotations/scanvp_candview_relangles.json', seed=0, speaker='../datasets/R2R/speaker/transpeaker_r2r/state_dict/best_both_bleu.pt', speaker_angle_size=128, speaker_dropout=0.2, speaker_head_num=4, speaker_layer_num=3, speaker_train_vocab='../datasets/R2R/features/r2r_speaker_train_vocab.txt', sub_out='tanh', submit=False, test=False, tokenizer='roberta', train_alg='dagger', update_iter=3000, use_aug_env=False, use_drop=False, use_lr_sch=False, use_transpeaker=True, views=36, weight_decay=0.0, wemb=256, world_size=1, z_back_log_dir='../datasets/R2R/navigator/goat_r2r/logs/backdoor', z_front_log_dir='../datasets/R2R/navigator/goat_r2r/logs/frontdoor', z_instr_update=True)

Namespace(accumulate_grad=True, act_visited_nodes=False, adaptive_pano_fusion=True, aemb=64, angle_feat_size=4, anno_dir='../datasets/R2R/annotations', aug='../datasets/R2R/annotations/prevalent_aug_train_enc.json', aug_img_ft_file_envedit='../datasets/EnvEdit/hamt_features/CLIP-ViT-B-16-views-st-samefilter.hdf5', aug_times=1, backdoor_dict_file='', batch_size=12, bert_ckpt_file='../datasets/R2R/pretrain/goat_r2r_pretrain/ckpts/model_step_best.pt', cat_file='../datasets/R2R/annotations/category_mapping.tsv', cfp_temperature=1.0, ckpt_dir='../datasets/R2R/navigator/goat_r2r/ckpts', connectivity_dir='../datasets/R2R/connectivity', dataset='r2r', detailed_output=False, do_add_method='door', do_back_img=True, do_back_img_type='type_1', do_back_txt=True, do_back_txt_type='type_2', do_front_his=True, do_front_img=True, do_front_txt=True, dropout=0.1, enc_full_graph=True, entropy_loss_weight=0.01, env_edit=False, epsilon=0.1, eval_first=False, expert_policy='spl', expl_max_ratio=0.6, expl_sample=False, feat_dropout=0.5, featdropout=0.3, feature_size=768, features='clip768', feedback='sample', fix_lang_embedding=False, fix_local_branch=False, fix_pano_embedding=False, for_debug=False, front_feat_file='../datasets/R2R/features/r2r_cfp_features.tsv', front_n_clusters=24, frontdoor_dict_file='', fusion='dynamic', gamma=0.9, graph_sprels=True, h_dim=512, ignoreid=-100, image_feat_size=768, img_ft_file='../datasets/R2R/features/CLIP-ViT-B-16-views.hdf5', img_type='hdf5', img_zdict_file='../datasets/R2R/features/image_z_dict_clip_50.tsv', img_zdict_size=50, instr_zdict_file='../datasets/R2R/features/r2r_z_instr_dict.tsv', instr_zdict_size=81, iters=150000, loadOptim=False, local_rank=-1, log_dir='../datasets/R2R/navigator/goat_r2r/logs', log_every=1000, lr=2e-05, lr_sch='polynomial', maxDecode=120, max_action_len=15, max_instr_len=200, ml_weight=0.2, mode='train', name='goat_r2r', node_rank=0, normalize_loss='total', num_l_layers=6, num_pano_layers=2, num_x_layers=3, 
obj_feat_size=768, optim='adamW', output_dir='../datasets/R2R/navigator/goat_r2r', pred_dir='../datasets/R2R/navigator/goat_r2r/preds', proj_hidden=1024, resume_file=None, resume_optimizer=False, root_dir='../datasets', rxr_front_feat_file='../datasets/R2R/features/rxr_cfp_features.tsv', rxr_instr_zdict_roberta_file='../datasets/R2R/features/rxr_z_instr_dict.tsv', save_optimizer=False, scan_data_dir='../datasets/Matterport3D/v1_unzip_scans', scanvp_cands_file='../datasets/R2R/annotations/scanvp_candview_relangles.json', seed=0, speaker='../datasets/R2R/speaker/transpeaker_r2r/state_dict/best_both_bleu.pt', speaker_angle_size=128, speaker_dropout=0.2, speaker_head_num=4, speaker_layer_num=3, speaker_train_vocab='../datasets/R2R/features/r2r_speaker_train_vocab.txt', sub_out='tanh', submit=False, test=False, tokenizer='roberta', train_alg='dagger', update_iter=3000, use_aug_env=False, use_drop=False, use_lr_sch=False, use_transpeaker=True, views=36, weight_decay=0.0, wemb=256, world_size=1, z_back_log_dir='../datasets/R2R/navigator/goat_r2r/logs/backdoor', z_front_log_dir='../datasets/R2R/navigator/goat_r2r/logs/frontdoor', z_instr_update=True)

Namespace(accumulate_grad=True, act_visited_nodes=False, adaptive_pano_fusion=True, aemb=64, angle_feat_size=4, anno_dir='../datasets/R2R/annotations', aug='../datasets/R2R/annotations/prevalent_aug_train_enc.json', aug_img_ft_file_envedit='../datasets/EnvEdit/hamt_features/CLIP-ViT-B-16-views-st-samefilter.hdf5', aug_times=1, backdoor_dict_file='', batch_size=12, bert_ckpt_file='../datasets/R2R/pretrain/goat_r2r_pretrain/ckpts/model_step_best.pt', cat_file='../datasets/R2R/annotations/category_mapping.tsv', cfp_temperature=1.0, ckpt_dir='../datasets/R2R/navigator/goat_r2r/ckpts', connectivity_dir='../datasets/R2R/connectivity', dataset='r2r', detailed_output=False, do_add_method='door', do_back_img=True, do_back_img_type='type_1', do_back_txt=True, do_back_txt_type='type_2', do_front_his=True, do_front_img=True, do_front_txt=True, dropout=0.1, enc_full_graph=True, entropy_loss_weight=0.01, env_edit=False, epsilon=0.1, eval_first=False, expert_policy='spl', expl_max_ratio=0.6, expl_sample=False, feat_dropout=0.5, featdropout=0.3, feature_size=768, features='clip768', feedback='sample', fix_lang_embedding=False, fix_local_branch=False, fix_pano_embedding=False, for_debug=False, front_feat_file='../datasets/R2R/features/r2r_cfp_features.tsv', front_n_clusters=24, frontdoor_dict_file='', fusion='dynamic', gamma=0.9, graph_sprels=True, h_dim=512, ignoreid=-100, image_feat_size=768, img_ft_file='../datasets/R2R/features/CLIP-ViT-B-16-views.hdf5', img_type='hdf5', img_zdict_file='../datasets/R2R/features/image_z_dict_clip_50.tsv', img_zdict_size=50, instr_zdict_file='../datasets/R2R/features/r2r_z_instr_dict.tsv', instr_zdict_size=81, iters=150000, loadOptim=False, local_rank=-1, log_dir='../datasets/R2R/navigator/goat_r2r/logs', log_every=1000, lr=2e-05, lr_sch='polynomial', maxDecode=120, max_action_len=15, max_instr_len=200, ml_weight=0.2, mode='train', name='goat_r2r', node_rank=0, normalize_loss='total', num_l_layers=6, num_pano_layers=2, num_x_layers=3, 
obj_feat_size=768, optim='adamW', output_dir='../datasets/R2R/navigator/goat_r2r', pred_dir='../datasets/R2R/navigator/goat_r2r/preds', proj_hidden=1024, resume_file=None, resume_optimizer=False, root_dir='../datasets', rxr_front_feat_file='../datasets/R2R/features/rxr_cfp_features.tsv', rxr_instr_zdict_roberta_file='../datasets/R2R/features/rxr_z_instr_dict.tsv', save_optimizer=False, scan_data_dir='../datasets/Matterport3D/v1_unzip_scans', scanvp_cands_file='../datasets/R2R/annotations/scanvp_candview_relangles.json', seed=0, speaker='../datasets/R2R/speaker/transpeaker_r2r/state_dict/best_both_bleu.pt', speaker_angle_size=128, speaker_dropout=0.2, speaker_head_num=4, speaker_layer_num=3, speaker_train_vocab='../datasets/R2R/features/r2r_speaker_train_vocab.txt', sub_out='tanh', submit=False, test=False, tokenizer='roberta', train_alg='dagger', update_iter=3000, use_aug_env=False, use_drop=False, use_lr_sch=False, use_transpeaker=True, views=36, weight_decay=0.0, wemb=256, world_size=1, z_back_log_dir='../datasets/R2R/navigator/goat_r2r/logs/backdoor', z_front_log_dir='../datasets/R2R/navigator/goat_r2r/logs/frontdoor', z_instr_update=True)

Namespace(accumulate_grad=True, act_visited_nodes=False, adaptive_pano_fusion=True, aemb=64, angle_feat_size=4, anno_dir='../datasets/R2R/annotations', aug='../datasets/R2R/annotations/prevalent_aug_train_enc.json', aug_img_ft_file_envedit='../datasets/EnvEdit/hamt_features/CLIP-ViT-B-16-views-st-samefilter.hdf5', aug_times=1, backdoor_dict_file='', batch_size=12, bert_ckpt_file='../datasets/R2R/pretrain/goat_r2r_pretrain/ckpts/model_step_best.pt', cat_file='../datasets/R2R/annotations/category_mapping.tsv', cfp_temperature=1.0, ckpt_dir='../datasets/R2R/navigator/goat_r2r/ckpts', connectivity_dir='../datasets/R2R/connectivity', dataset='r2r', detailed_output=False, do_add_method='door', do_back_img=True, do_back_img_type='type_1', do_back_txt=True, do_back_txt_type='type_2', do_front_his=True, do_front_img=True, do_front_txt=True, dropout=0.1, enc_full_graph=True, entropy_loss_weight=0.01, env_edit=False, epsilon=0.1, eval_first=False, expert_policy='spl', expl_max_ratio=0.6, expl_sample=False, feat_dropout=0.5, featdropout=0.3, feature_size=768, features='clip768', feedback='sample', fix_lang_embedding=False, fix_local_branch=False, fix_pano_embedding=False, for_debug=False, front_feat_file='../datasets/R2R/features/r2r_cfp_features.tsv', front_n_clusters=24, frontdoor_dict_file='', fusion='dynamic', gamma=0.9, graph_sprels=True, h_dim=512, ignoreid=-100, image_feat_size=768, img_ft_file='../datasets/R2R/features/CLIP-ViT-B-16-views.hdf5', img_type='hdf5', img_zdict_file='../datasets/R2R/features/image_z_dict_clip_50.tsv', img_zdict_size=50, instr_zdict_file='../datasets/R2R/features/r2r_z_instr_dict.tsv', instr_zdict_size=81, iters=150000, loadOptim=False, local_rank=-1, log_dir='../datasets/R2R/navigator/goat_r2r/logs', log_every=1000, lr=2e-05, lr_sch='polynomial', maxDecode=120, max_action_len=15, max_instr_len=200, ml_weight=0.2, mode='train', name='goat_r2r', node_rank=0, normalize_loss='total', num_l_layers=6, num_pano_layers=2, num_x_layers=3, 
obj_feat_size=768, optim='adamW', output_dir='../datasets/R2R/navigator/goat_r2r', pred_dir='../datasets/R2R/navigator/goat_r2r/preds', proj_hidden=1024, resume_file=None, resume_optimizer=False, root_dir='../datasets', rxr_front_feat_file='../datasets/R2R/features/rxr_cfp_features.tsv', rxr_instr_zdict_roberta_file='../datasets/R2R/features/rxr_z_instr_dict.tsv', save_optimizer=False, scan_data_dir='../datasets/Matterport3D/v1_unzip_scans', scanvp_cands_file='../datasets/R2R/annotations/scanvp_candview_relangles.json', seed=0, speaker='../datasets/R2R/speaker/transpeaker_r2r/state_dict/best_both_bleu.pt', speaker_angle_size=128, speaker_dropout=0.2, speaker_head_num=4, speaker_layer_num=3, speaker_train_vocab='../datasets/R2R/features/r2r_speaker_train_vocab.txt', sub_out='tanh', submit=False, test=False, tokenizer='roberta', train_alg='dagger', update_iter=3000, use_aug_env=False, use_drop=False, use_lr_sch=False, use_transpeaker=True, views=36, weight_decay=0.0, wemb=256, world_size=1, z_back_log_dir='../datasets/R2R/navigator/goat_r2r/logs/backdoor', z_front_log_dir='../datasets/R2R/navigator/goat_r2r/logs/frontdoor', z_instr_update=True)

Listener training starts, start iteration: 0

Listener training starts, start iteration: 0

Listener training starts, start iteration: 0

total_actions 1, max_length 1, entropy 344996.1667, IL_loss 12.4085, RL_loss 0.0000, policy_loss 0.0000, critic_loss 0.0000 38m 16s (- 5702m 59s) (1000 0%) iter 1000, val_train_seen , action_steps: 6.35, steps: 7.15, lengths: 13.39, nav_error: 5.86, oracle_error: 2.98, sr: 49.33, oracle_sr: 64.00, spl: 43.42, nDTW: 55.87, SDTW: 40.30, CLS: 57.22, val_seen , action_steps: 6.64, steps: 7.81, lengths: 14.98, nav_error: 7.01, oracle_error: 3.68, sr: 39.57, oracle_sr: 55.83, spl: 33.65, nDTW: 48.53, SDTW: 31.10, CLS: 51.16, val_unseen , action_steps: 6.73, steps: 7.97, lengths: 14.89, nav_error: 7.05, oracle_error: 3.63, sr: 39.59, oracle_sr: 56.70, spl: 32.14, nDTW: 45.59, SDTW: 30.34, CLS: 47.55 BEST RESULT TILL NOW val_unseen | Iter 1000 iter 1000, val_train_seen , action_steps: 6.35, steps: 7.15, lengths: 13.39, nav_error: 5.86, oracle_error: 2.98, sr: 49.33, oracle_sr: 64.00, spl: 43.42, nDTW: 55.87, SDTW: 40.30, CLS: 57.22, val_seen , action_steps: 6.64, steps: 7.81, lengths: 14.98, nav_error: 7.01, oracle_error: 3.68, sr: 39.57, oracle_sr: 55.83, spl: 33.65, nDTW: 48.53, SDTW: 31.10, CLS: 51.16, val_unseen , action_steps: 6.73, steps: 7.97, lengths: 14.89, nav_error: 7.05, oracle_error: 3.63, sr: 39.59, oracle_sr: 56.70, spl: 32.14, nDTW: 45.59, SDTW: 30.34, CLS: 47.55

total_actions 1, max_length 1, entropy 323228.8032, IL_loss 10.6788, RL_loss 0.0000, policy_loss 0.0000, critic_loss 0.0000 73m 48s (- 5461m 25s) (2000 1%) iter 2000, val_train_seen , action_steps: 6.09, steps: 6.83, lengths: 13.09, nav_error: 3.81, oracle_error: 2.14, sr: 56.67, oracle_sr: 73.33, spl: 51.78, nDTW: 65.93, SDTW: 47.69, CLS: 66.68, val_seen , action_steps: 6.50, steps: 7.95, lengths: 16.09, nav_error: 5.90, oracle_error: 3.25, sr: 47.01, oracle_sr: 62.10, spl: 39.34, nDTW: 53.30, SDTW: 36.85, CLS: 54.25, val_unseen , action_steps: 6.71, steps: 8.34, lengths: 16.04, nav_error: 5.63, oracle_error: 2.98, sr: 49.21, oracle_sr: 66.03, spl: 40.41, nDTW: 52.61, SDTW: 38.36, CLS: 53.46 BEST RESULT TILL NOW val_unseen | Iter 2000 iter 2000, val_train_seen , action_steps: 6.09, steps: 6.83, lengths: 13.09, nav_error: 3.81, oracle_error: 2.14, sr: 56.67, oracle_sr: 73.33, spl: 51.78, nDTW: 65.93, SDTW: 47.69, CLS: 66.68, val_seen , action_steps: 6.50, steps: 7.95, lengths: 16.09, nav_error: 5.90, oracle_error: 3.25, sr: 47.01, oracle_sr: 62.10, spl: 39.34, nDTW: 53.30, SDTW: 36.85, CLS: 54.25, val_unseen , action_steps: 6.71, steps: 8.34, lengths: 16.04, nav_error: 5.63, oracle_error: 2.98, sr: 49.21, oracle_sr: 66.03, spl: 40.41, nDTW: 52.61, SDTW: 38.36, CLS: 53.46

total_actions 1, max_length 1, entropy 308666.1617, IL_loss 9.7786, RL_loss 0.0000, policy_loss 0.0000, critic_loss 0.0000 109m 24s (- 5361m 2s) (3000 2%) iter 3000, val_train_seen , action_steps: 6.44, steps: 7.34, lengths: 14.74, nav_error: 3.72, oracle_error: 1.94, sr: 64.67, oracle_sr: 76.67, spl: 55.69, nDTW: 63.54, SDTW: 52.62, CLS: 62.77, val_seen , action_steps: 6.73, steps: 7.84, lengths: 15.92, nav_error: 5.13, oracle_error: 2.71, sr: 54.06, oracle_sr: 68.76, spl: 45.01, nDTW: 56.54, SDTW: 42.69, CLS: 57.51, val_unseen , action_steps: 6.92, steps: 8.40, lengths: 16.15, nav_error: 5.10, oracle_error: 2.56, sr: 53.17, oracle_sr: 71.18, spl: 43.26, nDTW: 54.58, SDTW: 41.42, CLS: 55.47 BEST RESULT TILL NOW val_unseen | Iter 3000 iter 3000, val_train_seen , action_steps: 6.44, steps: 7.34, lengths: 14.74, nav_error: 3.72, oracle_error: 1.94, sr: 64.67, oracle_sr: 76.67, spl: 55.69, nDTW: 63.54, SDTW: 52.62, CLS: 62.77, val_seen , action_steps: 6.73, steps: 7.84, lengths: 15.92, nav_error: 5.13, oracle_error: 2.71, sr: 54.06, oracle_sr: 68.76, spl: 45.01, nDTW: 56.54, SDTW: 42.69, CLS: 57.51, val_unseen , action_steps: 6.92, steps: 8.40, lengths: 16.15, nav_error: 5.10, oracle_error: 2.56, sr: 53.17, oracle_sr: 71.18, spl: 43.26, nDTW: 54.58, SDTW: 41.42, CLS: 55.47

total_actions 1, max_length 1, entropy 296340.9037, IL_loss 9.0615, RL_loss 0.0000, policy_loss 0.0000, critic_loss 0.0000 144m 23s (- 5270m 28s) (4000 2%) iter 4000, val_train_seen , action_steps: 6.04, steps: 6.42, lengths: 12.80, nav_error: 3.75, oracle_error: 2.12, sr: 61.33, oracle_sr: 75.33, spl: 55.64, nDTW: 65.65, SDTW: 51.82, CLS: 64.60, val_seen , action_steps: 6.12, steps: 6.82, lengths: 13.70, nav_error: 4.80, oracle_error: 2.69, sr: 56.81, oracle_sr: 69.93, spl: 48.45, nDTW: 61.15, SDTW: 46.35, CLS: 60.80, val_unseen , action_steps: 6.40, steps: 7.57, lengths: 14.73, nav_error: 4.58, oracle_error: 2.41, sr: 59.60, oracle_sr: 73.14, spl: 48.83, nDTW: 59.29, SDTW: 46.87, CLS: 58.71 BEST RESULT TILL NOW val_unseen | Iter 4000 iter 4000, val_train_seen , action_steps: 6.04, steps: 6.42, lengths: 12.80, nav_error: 3.75, oracle_error: 2.12, sr: 61.33, oracle_sr: 75.33, spl: 55.64, nDTW: 65.65, SDTW: 51.82, CLS: 64.60, val_seen , action_steps: 6.12, steps: 6.82, lengths: 13.70, nav_error: 4.80, oracle_error: 2.69, sr: 56.81, oracle_sr: 69.93, spl: 48.45, nDTW: 61.15, SDTW: 46.35, CLS: 60.80, val_unseen , action_steps: 6.40, steps: 7.57, lengths: 14.73, nav_error: 4.58, oracle_error: 2.41, sr: 59.60, oracle_sr: 73.14, spl: 48.83, nDTW: 59.29, SDTW: 46.87, CLS: 58.71

total_actions 1, max_length 1, entropy 286617.0868, IL_loss 8.5162, RL_loss 0.0000, policy_loss 0.0000, critic_loss 0.0000 179m 39s (- 5210m 19s) (5000 3%) iter 5000, val_train_seen , action_steps: 6.15, steps: 7.25, lengths: 14.74, nav_error: 3.02, oracle_error: 1.80, sr: 70.00, oracle_sr: 78.00, spl: 61.29, nDTW: 67.57, SDTW: 56.77, CLS: 66.48, val_seen , action_steps: 6.14, steps: 7.23, lengths: 14.33, nav_error: 4.21, oracle_error: 2.52, sr: 60.72, oracle_sr: 70.03, spl: 51.50, nDTW: 63.62, SDTW: 49.40, CLS: 63.06, val_unseen , action_steps: 6.20, steps: 7.61, lengths: 14.37, nav_error: 4.12, oracle_error: 2.39, sr: 62.96, oracle_sr: 73.65, spl: 52.15, nDTW: 61.76, SDTW: 49.79, CLS: 60.71 BEST RESULT TILL NOW val_unseen | Iter 5000 iter 5000, val_train_seen , action_steps: 6.15, steps: 7.25, lengths: 14.74, nav_error: 3.02, oracle_error: 1.80, sr: 70.00, oracle_sr: 78.00, spl: 61.29, nDTW: 67.57, SDTW: 56.77, CLS: 66.48, val_seen , action_steps: 6.14, steps: 7.23, lengths: 14.33, nav_error: 4.21, oracle_error: 2.52, sr: 60.72, oracle_sr: 70.03, spl: 51.50, nDTW: 63.62, SDTW: 49.40, CLS: 63.06, val_unseen , action_steps: 6.20, steps: 7.61, lengths: 14.37, nav_error: 4.12, oracle_error: 2.39, sr: 62.96, oracle_sr: 73.65, spl: 52.15, nDTW: 61.76, SDTW: 49.79, CLS: 60.71

total_actions 1, max_length 1, entropy 277202.0289, IL_loss 8.1063, RL_loss 0.0000, policy_loss 0.0000, critic_loss 0.0000 214m 36s (- 5150m 36s) (6000 4%) iter 6000, val_train_seen , action_steps: 5.88, steps: 6.51, lengths: 12.72, nav_error: 2.66, oracle_error: 1.70, sr: 75.33, oracle_sr: 82.00, spl: 66.21, nDTW: 70.78, SDTW: 61.63, CLS: 68.76, val_seen , action_steps: 6.03, steps: 6.76, lengths: 13.50, nav_error: 3.86, oracle_error: 2.27, sr: 63.57, oracle_sr: 73.95, spl: 56.27, nDTW: 66.52, SDTW: 53.48, CLS: 65.85, val_unseen , action_steps: 6.11, steps: 7.14, lengths: 13.53, nav_error: 4.07, oracle_error: 2.36, sr: 62.96, oracle_sr: 73.35, spl: 53.25, nDTW: 62.64, SDTW: 50.64, CLS: 62.12 BEST RESULT TILL NOW val_unseen | Iter 6000 iter 6000, val_train_seen , action_steps: 5.88, steps: 6.51, lengths: 12.72, nav_error: 2.66, oracle_error: 1.70, sr: 75.33, oracle_sr: 82.00, spl: 66.21, nDTW: 70.78, SDTW: 61.63, CLS: 68.76, val_seen , action_steps: 6.03, steps: 6.76, lengths: 13.50, nav_error: 3.86, oracle_error: 2.27, sr: 63.57, oracle_sr: 73.95, spl: 56.27, nDTW: 66.52, SDTW: 53.48, CLS: 65.85, val_unseen , action_steps: 6.11, steps: 7.14, lengths: 13.53, nav_error: 4.07, oracle_error: 2.36, sr: 62.96, oracle_sr: 73.35, spl: 53.25, nDTW: 62.64, SDTW: 50.64, CLS: 62.12

total_actions 1, max_length 1, entropy 271263.6479, IL_loss 7.9205, RL_loss 0.0000, policy_loss 0.0000, critic_loss 0.0000 249m 30s (- 5097m 1s) (7000 4%) iter 7000, val_train_seen , action_steps: 5.63, steps: 5.98, lengths: 11.85, nav_error: 2.93, oracle_error: 2.00, sr: 72.67, oracle_sr: 78.00, spl: 65.78, nDTW: 71.35, SDTW: 61.02, CLS: 69.11, val_seen , action_steps: 5.98, steps: 6.95, lengths: 13.90, nav_error: 3.90, oracle_error: 2.29, sr: 63.47, oracle_sr: 72.87, spl: 56.21, nDTW: 66.48, SDTW: 53.36, CLS: 65.95, val_unseen , action_steps: 6.10, steps: 7.11, lengths: 13.67, nav_error: 3.86, oracle_error: 2.20, sr: 65.09, oracle_sr: 75.69, spl: 54.90, nDTW: 64.17, SDTW: 52.62, CLS: 63.04 BEST RESULT TILL NOW val_unseen | Iter 7000 iter 7000, val_train_seen , action_steps: 5.63, steps: 5.98, lengths: 11.85, nav_error: 2.93, oracle_error: 2.00, sr: 72.67, oracle_sr: 78.00, spl: 65.78, nDTW: 71.35, SDTW: 61.02, CLS: 69.11, val_seen , action_steps: 5.98, steps: 6.95, lengths: 13.90, nav_error: 3.90, oracle_error: 2.29, sr: 63.47, oracle_sr: 72.87, spl: 56.21, nDTW: 66.48, SDTW: 53.36, CLS: 65.95, val_unseen , action_steps: 6.10, steps: 7.11, lengths: 13.67, nav_error: 3.86, oracle_error: 2.20, sr: 65.09, oracle_sr: 75.69, spl: 54.90, nDTW: 64.17, SDTW: 52.62, CLS: 63.04

total_actions 1, max_length 1, entropy 265455.0633, IL_loss 7.5158, RL_loss 0.0000, policy_loss 0.0000, critic_loss 0.0000 284m 16s (- 5045m 47s) (8000 5%) iter 8000, val_train_seen , action_steps: 6.17, steps: 6.95, lengths: 14.46, nav_error: 2.76, oracle_error: 1.57, sr: 72.00, oracle_sr: 83.33, spl: 63.08, nDTW: 68.96, SDTW: 57.56, CLS: 68.07, val_seen , action_steps: 6.37, steps: 7.62, lengths: 15.52, nav_error: 3.48, oracle_error: 1.88, sr: 67.48, oracle_sr: 79.63, spl: 58.09, nDTW: 66.45, SDTW: 55.23, CLS: 65.90, val_unseen , action_steps: 6.46, steps: 7.60, lengths: 14.97, nav_error: 3.90, oracle_error: 2.06, sr: 65.26, oracle_sr: 77.01, spl: 54.16, nDTW: 62.02, SDTW: 51.67, CLS: 61.54 BEST RESULT TILL NOW val_unseen | Iter 7000 iter 7000, val_train_seen , action_steps: 5.63, steps: 5.98, lengths: 11.85, nav_error: 2.93, oracle_error: 2.00, sr: 72.67, oracle_sr: 78.00, spl: 65.78, nDTW: 71.35, SDTW: 61.02, CLS: 69.11, val_seen , action_steps: 5.98, steps: 6.95, lengths: 13.90, nav_error: 3.90, oracle_error: 2.29, sr: 63.47, oracle_sr: 72.87, spl: 56.21, nDTW: 66.48, SDTW: 53.36, CLS: 65.95, val_unseen , action_steps: 6.10, steps: 7.11, lengths: 13.67, nav_error: 3.86, oracle_error: 2.20, sr: 65.09, oracle_sr: 75.69, spl: 54.90, nDTW: 64.17, SDTW: 52.62, CLS: 63.04

total_actions 1, max_length 1, entropy 256152.5757, IL_loss 7.2786, RL_loss 0.0000, policy_loss 0.0000, critic_loss 0.0000 319m 9s (- 5000m 2s) (9000 6%) iter 9000, val_train_seen , action_steps: 6.05, steps: 6.70, lengths: 13.40, nav_error: 2.12, oracle_error: 1.26, sr: 78.00, oracle_sr: 86.00, spl: 69.51, nDTW: 73.64, SDTW: 64.77, CLS: 72.35, val_seen , action_steps: 6.32, steps: 7.57, lengths: 15.26, nav_error: 3.68, oracle_error: 1.98, sr: 67.29, oracle_sr: 77.96, spl: 58.04, nDTW: 66.58, SDTW: 55.60, CLS: 66.05, val_unseen , action_steps: 6.24, steps: 7.54, lengths: 14.37, nav_error: 3.61, oracle_error: 2.02, sr: 67.26, oracle_sr: 77.56, spl: 56.75, nDTW: 65.13, SDTW: 54.56, CLS: 63.93 BEST RESULT TILL NOW val_unseen | Iter 9000 iter 9000, val_train_seen , action_steps: 6.05, steps: 6.70, lengths: 13.40, nav_error: 2.12, oracle_error: 1.26, sr: 78.00, oracle_sr: 86.00, spl: 69.51, nDTW: 73.64, SDTW: 64.77, CLS: 72.35, val_seen , action_steps: 6.32, steps: 7.57, lengths: 15.26, nav_error: 3.68, oracle_error: 1.98, sr: 67.29, oracle_sr: 77.96, spl: 58.04, nDTW: 66.58, SDTW: 55.60, CLS: 66.05, val_unseen , action_steps: 6.24, steps: 7.54, lengths: 14.37, nav_error: 3.61, oracle_error: 2.02, sr: 67.26, oracle_sr: 77.56, spl: 56.75, nDTW: 65.13, SDTW: 54.56, CLS: 63.93

total_actions 1, max_length 1, entropy 252462.9590, IL_loss 7.1380, RL_loss 0.0000, policy_loss 0.0000, critic_loss 0.0000 352m 55s (- 4940m 53s) (10000 6%) iter 10000, val_train_seen , action_steps: 5.71, steps: 6.09, lengths: 12.29, nav_error: 2.05, oracle_error: 1.29, sr: 77.33, oracle_sr: 86.00, spl: 68.23, nDTW: 74.82, SDTW: 64.36, CLS: 72.71, val_seen , action_steps: 5.91, steps: 6.64, lengths: 13.46, nav_error: 3.32, oracle_error: 1.95, sr: 69.74, oracle_sr: 78.55, spl: 62.32, nDTW: 70.29, SDTW: 59.17, CLS: 69.22, val_unseen , action_steps: 6.03, steps: 7.11, lengths: 13.81, nav_error: 3.53, oracle_error: 2.01, sr: 68.37, oracle_sr: 77.27, spl: 58.16, nDTW: 66.29, SDTW: 55.89, CLS: 64.91 BEST RESULT TILL NOW val_unseen | Iter 10000 iter 10000, val_train_seen , action_steps: 5.71, steps: 6.09, lengths: 12.29, nav_error: 2.05, oracle_error: 1.29, sr: 77.33, oracle_sr: 86.00, spl: 68.23, nDTW: 74.82, SDTW: 64.36, CLS: 72.71, val_seen , action_steps: 5.91, steps: 6.64, lengths: 13.46, nav_error: 3.32, oracle_error: 1.95, sr: 69.74, oracle_sr: 78.55, spl: 62.32, nDTW: 70.29, SDTW: 59.17, CLS: 69.22, val_unseen , action_steps: 6.03, steps: 7.11, lengths: 13.81, nav_error: 3.53, oracle_error: 2.01, sr: 68.37, oracle_sr: 77.27, spl: 58.16, nDTW: 66.29, SDTW: 55.89, CLS: 64.91

total_actions 1, max_length 1, entropy 244245.2710, IL_loss 6.8943, RL_loss 0.0000, policy_loss 0.0000, critic_loss 0.0000 387m 0s (- 4890m 20s) (11000 7%) iter 11000, val_train_seen , action_steps: 6.97, steps: 8.75, lengths: 18.32, nav_error: 1.69, oracle_error: 0.69, sr: 78.67, oracle_sr: 92.00, spl: 62.78, nDTW: 67.82, SDTW: 59.16, CLS: 67.85, val_seen , action_steps: 7.15, steps: 9.40, lengths: 19.30, nav_error: 3.33, oracle_error: 1.48, sr: 69.15, oracle_sr: 82.66, spl: 56.54, nDTW: 63.38, SDTW: 54.37, CLS: 63.42, val_unseen , action_steps: 7.32, steps: 10.44, lengths: 20.79, nav_error: 3.66, oracle_error: 1.62, sr: 66.62, oracle_sr: 81.95, spl: 51.93, nDTW: 58.71, SDTW: 50.09, CLS: 58.87 BEST RESULT TILL NOW val_unseen | Iter 10000 iter 10000, val_train_seen , action_steps: 5.71, steps: 6.09, lengths: 12.29, nav_error: 2.05, oracle_error: 1.29, sr: 77.33, oracle_sr: 86.00, spl: 68.23, nDTW: 74.82, SDTW: 64.36, CLS: 72.71, val_seen , action_steps: 5.91, steps: 6.64, lengths: 13.46, nav_error: 3.32, oracle_error: 1.95, sr: 69.74, oracle_sr: 78.55, spl: 62.32, nDTW: 70.29, SDTW: 59.17, CLS: 69.22, val_unseen , action_steps: 6.03, steps: 7.11, lengths: 13.81, nav_error: 3.53, oracle_error: 2.01, sr: 68.37, oracle_sr: 77.27, spl: 58.16, nDTW: 66.29, SDTW: 55.89, CLS: 64.91

total_actions 1, max_length 1, entropy 239170.1487, IL_loss 6.6483, RL_loss 0.0000, policy_loss 0.0000, critic_loss 0.0000 420m 13s (- 4832m 35s) (12000 8%) iter 12000, val_train_seen , action_steps: 5.64, steps: 5.98, lengths: 11.83, nav_error: 1.78, oracle_error: 1.08, sr: 82.00, oracle_sr: 86.67, spl: 75.56, nDTW: 78.89, SDTW: 72.02, CLS: 77.63, val_seen , action_steps: 5.91, steps: 6.57, lengths: 13.04, nav_error: 3.15, oracle_error: 1.76, sr: 70.62, oracle_sr: 79.43, spl: 63.83, nDTW: 71.85, SDTW: 61.06, CLS: 71.33, val_unseen , action_steps: 6.04, steps: 6.80, lengths: 12.94, nav_error: 3.44, oracle_error: 1.97, sr: 68.58, oracle_sr: 77.95, spl: 59.02, nDTW: 67.65, SDTW: 57.00, CLS: 66.34 BEST RESULT TILL NOW val_unseen | Iter 12000 iter 12000, val_train_seen , action_steps: 5.64, steps: 5.98, lengths: 11.83, nav_error: 1.78, oracle_error: 1.08, sr: 82.00, oracle_sr: 86.67, spl: 75.56, nDTW: 78.89, SDTW: 72.02, CLS: 77.63, val_seen , action_steps: 5.91, steps: 6.57, lengths: 13.04, nav_error: 3.15, oracle_error: 1.76, sr: 70.62, oracle_sr: 79.43, spl: 63.83, nDTW: 71.85, SDTW: 61.06, CLS: 71.33, val_unseen , action_steps: 6.04, steps: 6.80, lengths: 12.94, nav_error: 3.44, oracle_error: 1.97, sr: 68.58, oracle_sr: 77.95, spl: 59.02, nDTW: 67.65, SDTW: 57.00, CLS: 66.34

total_actions 1, max_length 1, entropy 234184.7659, IL_loss 6.4893, RL_loss 0.0000, policy_loss 0.0000, critic_loss 0.0000 453m 17s (- 4777m 0s) (13000 8%) iter 13000, val_train_seen , action_steps: 5.96, steps: 6.57, lengths: 12.84, nav_error: 1.77, oracle_error: 1.04, sr: 84.67, oracle_sr: 88.67, spl: 74.67, nDTW: 78.05, SDTW: 70.96, CLS: 76.41, val_seen , action_steps: 6.45, steps: 7.90, lengths: 15.80, nav_error: 3.26, oracle_error: 1.73, sr: 68.95, oracle_sr: 79.53, spl: 59.54, nDTW: 67.77, SDTW: 57.20, CLS: 67.43, val_unseen , action_steps: 6.32, steps: 7.38, lengths: 14.13, nav_error: 3.61, oracle_error: 1.95, sr: 67.65, oracle_sr: 77.91, spl: 56.94, nDTW: 64.90, SDTW: 54.92, CLS: 64.25 BEST RESULT TILL NOW val_unseen | Iter 12000 iter 12000, val_train_seen , action_steps: 5.64, steps: 5.98, lengths: 11.83, nav_error: 1.78, oracle_error: 1.08, sr: 82.00, oracle_sr: 86.67, spl: 75.56, nDTW: 78.89, SDTW: 72.02, CLS: 77.63, val_seen , action_steps: 5.91, steps: 6.57, lengths: 13.04, nav_error: 3.15, oracle_error: 1.76, sr: 70.62, oracle_sr: 79.43, spl: 63.83, nDTW: 71.85, SDTW: 61.06, CLS: 71.33, val_unseen , action_steps: 6.04, steps: 6.80, lengths: 12.94, nav_error: 3.44, oracle_error: 1.97, sr: 68.58, oracle_sr: 77.95, spl: 59.02, nDTW: 67.65, SDTW: 57.00, CLS: 66.34

total_actions 1, max_length 1, entropy 229283.1830, IL_loss 6.3177, RL_loss 0.0000, policy_loss 0.0000, critic_loss 0.0000 486m 5s (- 4722m 0s) (14000 9%) iter 14000, val_train_seen , action_steps: 5.65, steps: 5.87, lengths: 11.33, nav_error: 1.25, oracle_error: 0.80, sr: 85.33, oracle_sr: 91.33, spl: 79.37, nDTW: 83.31, SDTW: 76.32, CLS: 82.16, val_seen , action_steps: 5.77, steps: 6.50, lengths: 12.92, nav_error: 2.97, oracle_error: 1.80, sr: 71.60, oracle_sr: 79.33, spl: 64.82, nDTW: 72.98, SDTW: 62.18, CLS: 71.74, val_unseen , action_steps: 5.80, steps: 6.52, lengths: 12.31, nav_error: 3.53, oracle_error: 2.16, sr: 67.35, oracle_sr: 74.80, spl: 58.55, nDTW: 67.32, SDTW: 56.35, CLS: 66.02 BEST RESULT TILL NOW val_unseen | Iter 12000 iter 12000, val_train_seen , action_steps: 5.64, steps: 5.98, lengths: 11.83, nav_error: 1.78, oracle_error: 1.08, sr: 82.00, oracle_sr: 86.67, spl: 75.56, nDTW: 78.89, SDTW: 72.02, CLS: 77.63, val_seen , action_steps: 5.91, steps: 6.57, lengths: 13.04, nav_error: 3.15, oracle_error: 1.76, sr: 70.62, oracle_sr: 79.43, spl: 63.83, nDTW: 71.85, SDTW: 61.06, CLS: 71.33, val_unseen , action_steps: 6.04, steps: 6.80, lengths: 12.94, nav_error: 3.44, oracle_error: 1.97, sr: 68.58, oracle_sr: 77.95, spl: 59.02, nDTW: 67.65, SDTW: 57.00, CLS: 66.34

total_actions 1, max_length 1, entropy 221678.5946, IL_loss 6.0878, RL_loss 0.0000, policy_loss 0.0000, critic_loss 0.0000 519m 20s (- 4674m 8s) (15000 10%) iter 15000, val_train_seen , action_steps: 5.69, steps: 6.26, lengths: 12.75, nav_error: 1.66, oracle_error: 0.84, sr: 87.33, oracle_sr: 92.00, spl: 81.35, nDTW: 82.83, SDTW: 78.59, CLS: 82.17, val_seen , action_steps: 6.06, steps: 7.42, lengths: 14.92, nav_error: 2.84, oracle_error: 1.61, sr: 72.67, oracle_sr: 81.19, spl: 63.63, nDTW: 71.15, SDTW: 60.96, CLS: 69.83, val_unseen , action_steps: 6.35, steps: 7.76, lengths: 14.90, nav_error: 3.39, oracle_error: 1.84, sr: 68.33, oracle_sr: 79.10, spl: 56.55, nDTW: 64.78, SDTW: 54.65, CLS: 63.87 BEST RESULT TILL NOW val_unseen | Iter 12000 iter 12000, val_train_seen , action_steps: 5.64, steps: 5.98, lengths: 11.83, nav_error: 1.78, oracle_error: 1.08, sr: 82.00, oracle_sr: 86.67, spl: 75.56, nDTW: 78.89, SDTW: 72.02, CLS: 77.63, val_seen , action_steps: 5.91, steps: 6.57, lengths: 13.04, nav_error: 3.15, oracle_error: 1.76, sr: 70.62, oracle_sr: 79.43, spl: 63.83, nDTW: 71.85, SDTW: 61.06, CLS: 71.33, val_unseen , action_steps: 6.04, steps: 6.80, lengths: 12.94, nav_error: 3.44, oracle_error: 1.97, sr: 68.58, oracle_sr: 77.95, spl: 59.02, nDTW: 67.65, SDTW: 57.00, CLS: 66.34

total_actions 1, max_length 1, entropy 219952.1053, IL_loss 6.0129, RL_loss 0.0000, policy_loss 0.0000, critic_loss 0.0000 552m 27s (- 4626m 48s) (16000 10%) iter 16000, val_train_seen , action_steps: 5.67, steps: 6.02, lengths: 11.86, nav_error: 1.30, oracle_error: 0.81, sr: 86.67, oracle_sr: 91.33, spl: 81.07, nDTW: 84.19, SDTW: 78.77, CLS: 83.17, val_seen , action_steps: 6.17, steps: 7.45, lengths: 14.74, nav_error: 3.01, oracle_error: 1.50, sr: 73.95, oracle_sr: 83.35, spl: 65.21, nDTW: 71.74, SDTW: 62.91, CLS: 71.00, val_unseen , action_steps: 6.31, steps: 7.79, lengths: 14.88, nav_error: 3.29, oracle_error: 1.74, sr: 70.11, oracle_sr: 80.25, spl: 58.70, nDTW: 66.55, SDTW: 57.06, CLS: 65.44 BEST RESULT TILL NOW val_unseen | Iter 16000 iter 16000, val_train_seen , action_steps: 5.67, steps: 6.02, lengths: 11.86, nav_error: 1.30, oracle_error: 0.81, sr: 86.67, oracle_sr: 91.33, spl: 81.07, nDTW: 84.19, SDTW: 78.77, CLS: 83.17, val_seen , action_steps: 6.17, steps: 7.45, lengths: 14.74, nav_error: 3.01, oracle_error: 1.50, sr: 73.95, oracle_sr: 83.35,

CrystalSixone commented 2 days ago

I believe the number of training iterations shown in your log was insufficient; the training should ideally last for at least 80K iterations. Based on your latest response, I’m glad to see that you’ve achieved satisfactory results. Wishing you the best of luck! :)
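
To compare runs from the text log rather than a screenshot, here is a minimal sketch that pulls the current val_unseen SR and SPL out of each line in the format pasted above. It assumes every evaluation line contains `iter N,` followed by a `val_unseen` segment, and that the repeated `BEST RESULT TILL NOW` block should be ignored. The function names are hypothetical and are not part of the VLN-GOAT code. Picking the best iteration by SR alone is also an assumption; the repository may use a different criterion.

```python
import re

ITER_RE = re.compile(r"\biter (\d+),")
# (?<![A-Za-z_]) keeps "sr:" from matching inside "oracle_sr:".
METRIC_RE = re.compile(r"(?<![A-Za-z_])(sr|spl): ([0-9.]+)")


def parse_val_unseen(line):
    """Return (iteration, {'sr': ..., 'spl': ...}) for one log line, or None."""
    # Drop the repeated best-so-far block so only the current evaluation is read.
    current = line.split("BEST RESULT TILL NOW")[0]
    iter_match = ITER_RE.search(current)
    if iter_match is None or "val_unseen" not in current:
        return None
    unseen = current.split("val_unseen", 1)[1]
    metrics = {}
    for name, value in METRIC_RE.findall(unseen):
        metrics.setdefault(name, float(value))  # keep the first occurrence
    if "sr" not in metrics or "spl" not in metrics:
        return None
    return int(iter_match.group(1)), metrics


def best_by_sr(lines):
    """Return the parsed entry with the highest val_unseen SR, or None."""
    parsed = [p for p in map(parse_val_unseen, lines) if p is not None]
    return max(parsed, key=lambda p: p[1]["sr"]) if parsed else None
```

Calling `best_by_sr(open(path))` on a log file (where `path` is wherever your `train.txt` lives) gives the last iteration reached and the best one in a single pass, which makes it easy to tell whether a run stopped well short of 80K iterations.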

xybHFUT commented 1 day ago

Apologies, I previously uploaded the train_log directly through email, so it was not displayed completely. Below is a screenshot of the training log on R2R. Sorry for my mistake.

[Image: screenshot of the R2R training log]

CrystalSixone commented 23 hours ago

I’ve reviewed your log and noticed that your SR and SPL on the unseen set at iteration 0 are quite low. In my experiments, I was able to achieve approximately 65 SR and 52 SPL at iteration 0. I’m not sure what might be causing this issue, but I suggest checking whether the pre-trained weights were successfully loaded.
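
One way to check the loading step is to compare the parameter names and shapes the model expects against what the checkpoint file contains. The sketch below is a generic illustration, not the loading code used in this repository: `compare_state_dicts` is a hypothetical helper, and the demo uses stand-in objects with a `.shape` attribute so it runs without PyTorch installed.

```python
from types import SimpleNamespace


def compare_state_dicts(model_state, ckpt_state):
    """Report parameters that would not be loaded from the checkpoint."""
    model_keys, ckpt_keys = set(model_state), set(ckpt_state)
    shape_mismatch = sorted(
        k for k in model_keys & ckpt_keys
        if tuple(model_state[k].shape) != tuple(ckpt_state[k].shape)
    )
    return {
        "missing_in_ckpt": sorted(model_keys - ckpt_keys),
        "unexpected_in_ckpt": sorted(ckpt_keys - model_keys),
        "shape_mismatch": shape_mismatch,
    }


# Stand-in "tensors" for the demo; the key names are made up for illustration.
model_state = {
    "encoder.weight": SimpleNamespace(shape=(768, 768)),
    "encoder.bias": SimpleNamespace(shape=(768,)),
}
ckpt_state = {
    "module.encoder.weight": SimpleNamespace(shape=(768, 768)),
    "encoder.bias": SimpleNamespace(shape=(512,)),
}
report = compare_state_dicts(model_state, ckpt_state)
for name, keys in report.items():
    print(name, keys)
```

With PyTorch, the same function can be given `model.state_dict()` and the dictionary returned by `torch.load(path, map_location="cpu")`. Whether the weights sit at the top level of that file or under a nested key depends on how the checkpoint was saved, so inspect it first. A long `missing_in_ckpt` list, or keys that differ only by a prefix such as `module.`, usually means the weights were silently skipped. `load_state_dict(..., strict=False)` also returns its own `missing_keys` and `unexpected_keys`, which are worth printing.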

xybHFUT commented 5 hours ago

I used the pre-trained weights provided in the HuggingFace repo. That's quite a large performance gap at iteration 0. I will recheck my code. Thanks a lot for your help!