JacobYuan7 / RLIPv2

[ICCV 2023] RLIPv2: Fast Scaling of Relational Language-Image Pre-training
Apache License 2.0

How to run inference with the HICO-DET fully fine-tuned model #11

Closed · safsfsvvea closed this 7 months ago

safsfsvvea commented 7 months ago

Thanks for your help last time. How can I run inference with the fully fine-tuned HICO-DET checkpoint and reproduce the results in the paper? I haven't found a shell script for it yet, like test_vcoco_official.sh.

JacobYuan7 commented 7 months ago

@safsfsvvea Thanks for your interest. This is pretty simple. Try using the fine-tuning script like this one. You can slightly modify the code to make it test at epoch 0, i.e., testing after loading the pre-trained model.

JacobYuan7 commented 7 months ago

@safsfsvvea Btw, remember to specify the "--pretrained" parameter to ensure it loads the right parameters.
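
To make these two suggestions concrete, here is a minimal sketch of such an evaluation-only launch (the paths are placeholders, the remaining backbone/transformer flags must be copied verbatim from the fine-tuning script that produced the checkpoint, and, as the follow-up further down in this thread points out, --eval is also needed so that the evaluation function is actually invoked):

```bash
# Minimal sketch, not the exact repo script: load the fully fine-tuned
# HICO-DET checkpoint via --pretrained and set --epochs 0 so that no further
# training epochs are run. Copy every other model/architecture flag from the
# fine-tuning script used to produce the checkpoint.
python -m torch.distributed.launch --nproc_per_node=1 --use_env main.py \
    --pretrained /path/to/finetuned_hico_checkpoint.pth \
    --output_dir /path/to/eval_output \
    --dataset_file hico \
    --hoi_path /path/to/hico_20160224_det \
    --hoi \
    --epochs 0
```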

safsfsvvea commented 7 months ago

I ran the following command:

    python -m torch.distributed.launch --nproc_per_node=1 --use_env main.py \
        --pretrained /RLIPv2/checkpoints/hico_fine_tune/RLIP_PDA_v2_HICO_SwinL_VGCOCOO365_RQL_LSE_RPL_20e_L1_20e_3_checkpoint0019.pth \
        --output_dir /RLIPv2/result/hico_fine_tune/RLIP_PDA_v2_HICO_SwinL_VGCOCOO365_RQL_LSE_RPL_20e_L1_20e_3 \
        --dataset_file hico \
        --hoi_path /RLIPv2/data/hico_20160224_det \
        --hoi \
        --load_backbone supervised \
        --backbone swin_large \
        --set_cost_bbox 2.5 \
        --set_cost_giou 1 \
        --bbox_loss_coef 2.5 \
        --giou_loss_coef 1 \
        --lr_drop 15 \
        --epochs 0 \
        --lr 1e-4 \
        --lr_backbone 1e-5 \
        --text_encoder_lr 1e-5 \
        --schedule step \
        --num_workers 4 \
        --batch_size 1 \
        --obj_loss_type cross_entropy \
        --verb_loss_type focal \
        --enc_layers 6 \
        --dec_layers 3 \
        --num_queries 128 \
        --use_nms_filter \
        --save_ckp \
        --RLIP_ParSeDA_v2 \
        --with_box_refine \
        --num_feature_levels 4 \
        --num_patterns 0 \
        --pe_temperatureH 20 \
        --pe_temperatureW 20 \
        --dim_feedforward 2048 \
        --dropout 0.0 \
        --drop_path_rate 0.5 \
        --use_no_obj_token \
        --sampling_stategy freq \
        --fusion_type GLIP_attn \
        --gating_mechanism VXAc \
        --verb_query_tgt_type vanilla_MBF \
        --fusion_interval 2 \
        --fusion_last_vis \
        --lang_aux_loss \
        --giou_verb_label \
        --subject_class \

With --epochs 0, the command produced the following output:

    load pretrained model...
    RLIP_ParSeDABDeformableTransformer_v2...
    Use checkpoint to save memory during RLIPv2_VLFuse.
    RLIPv2 EARLY FUSION ON, USING GLIP_attn
    We are using VXAc as a gating mechanism.
    We are using self.verb_query_tgt_type: vanilla_MBF.
    HungarianMatcherHOI matches with the subject class? True
    aux_loss True
    weight_dict {'loss_obj_ce': 1, 'loss_verb_ce': 1, 'loss_sub_bbox': 2.5, 'loss_obj_bbox': 2.5, 'loss_sub_giou': 1.0, 'loss_obj_giou': 1.0, 'loss_entropy_bound': 0.01, 'loss_kl_divergence': 0.01, 'loss_verb_gt_recon': 1, 'loss_ranking_verbs': 1, 'loss_verb_hm': 1, 'loss_semantic_similar': 1, 'loss_verb_threshold': 1, 'loss_sub_matching': 1, 'loss_obj_matching': 1, 'loss_verb_matching': 1, 'loss_masked_recon': 1, 'loss_masked_ce': 1, 'loss_obj_ce_recon': 1, 'loss_sub_bbox_recon': 2.5, 'loss_obj_bbox_recon': 2.5, 'loss_sub_giou_recon': 1.0, 'loss_obj_giou_recon': 1.0, 'loss_obj_ce_0': 1, 'loss_verb_ce_0': 1, 'loss_sub_bbox_0': 2.5, 'loss_obj_bbox_0': 2.5, 'loss_sub_giou_0': 1.0, 'loss_obj_giou_0': 1.0, 'loss_entropy_bound_0': 0.01, 'loss_kl_divergence_0': 0.01, 'loss_verb_gt_recon_0': 1, 'loss_ranking_verbs_0': 1, 'loss_verb_hm_0': 1, 'loss_semantic_similar_0': 1, 'loss_verb_threshold_0': 1, 'loss_sub_matching_0': 1, 'loss_obj_matching_0': 1, 'loss_verb_matching_0': 1, 'loss_masked_recon_0': 1, 'loss_masked_ce_0': 1, 'loss_obj_ce_recon_0': 1, 'loss_sub_bbox_recon_0': 2.5, 'loss_obj_bbox_recon_0': 2.5, 'loss_sub_giou_recon_0': 1.0, 'loss_obj_giou_recon_0': 1.0, 'loss_obj_ce_1': 1, 'loss_verb_ce_1': 1, 'loss_sub_bbox_1': 2.5, 'loss_obj_bbox_1': 2.5, 'loss_sub_giou_1': 1.0, 'loss_obj_giou_1': 1.0, 'loss_entropy_bound_1': 0.01, 'loss_kl_divergence_1': 0.01, 'loss_verb_gt_recon_1': 1, 'loss_ranking_verbs_1': 1, 'loss_verb_hm_1': 1, 'loss_semantic_similar_1': 1, 'loss_verb_threshold_1': 1, 'loss_sub_matching_1': 1, 'loss_obj_matching_1': 1, 'loss_verb_matching_1': 1, 'loss_masked_recon_1': 1, 'loss_masked_ce_1': 1, 'loss_obj_ce_recon_1': 1, 'loss_sub_bbox_recon_1': 2.5, 'loss_obj_bbox_recon_1': 2.5, 'loss_sub_giou_recon_1': 1.0, 'loss_obj_giou_recon_1': 1.0}
    Loss dict:['obj_labels', 'verb_labels', 'sub_obj_boxes', 'obj_cardinality']
    verb_loss_type: focal ; obj_loss_type: cross_entropy
    Use naive_obj_smooth? 0
    Use naive_verb_smooth? 0
    Use pseudo_verb? False
    Use verb_curing? False
    Use triplet_filtering? False
    Post-processing with sigmoid? True
    Post-processing with temperature? False
    Zero-shot eval on hoi dataset? False
    Post-processing with verb_curing? False
    number of params: 383613296
    The parameters are divided into three groups (with a text_encoder).
    Training anno file: /RLIPv2/data/hico_20160224_det/annotations/trainval_hico.json
    /anaconda3/envs/rlip/lib/python3.6/site-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 1, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
      cpuset_checked))
    Training anno file: /RLIPv2/data/hico_20160224_det/annotations/trainval_hico.json
    Setting rare hois for None zero-shot setting.
    Loading /RLIPv2/checkpoints/hico_fine_tune/RLIP_PDA_v2_HICO_SwinL_VGCOCOO365_RQL_LSE_RPL_20e_L1_20e_3_checkpoint0019.pth ...
    Loading Info:
    missing keys:0, #unexpected keys:0
    Do not freeze any parameters.
    Start training
    Training time 0:00:00

I set --output_dir /RLIPv2/result/hico_fine_tune/RLIP_PDA_v2_HICO_SwinL_VGCOCOO365_RQL_LSE_RPL_20e_L1_20e_3, but I didn't see anything in the output_dir. Where can I find the inference output, and how can I get the evaluation results reported in the paper, e.g. with python datasets/vsrl_eval.py?

JacobYuan7 commented 7 months ago

@safsfsvvea Perhaps you should add "--eval" to your command to make sure it runs the function "evaluate_hoi_with_text".
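
With that change, the command posted above can be rerun unmodified except for the extra flag. A sketch (placeholder paths again, all other flags kept exactly as in the earlier command; assuming the evaluation summary is printed to the console, as in similar DETR-style HOI codebases, it is worth capturing the output in a log file):

```bash
# Sketch: identical launch to the one posted earlier, with --eval appended so
# that main.py runs evaluate_hoi_with_text on the HICO-DET test set instead of
# starting training. tee saves the console output (including the printed
# evaluation summary) to a log file under the output directory.
python -m torch.distributed.launch --nproc_per_node=1 --use_env main.py \
    --pretrained /path/to/finetuned_hico_checkpoint.pth \
    --output_dir /path/to/eval_output \
    --dataset_file hico \
    --hoi_path /path/to/hico_20160224_det \
    --hoi \
    --epochs 0 \
    --eval 2>&1 | tee /path/to/eval_output/eval_log.txt
```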

safsfsvvea commented 7 months ago

I have solved the problem, thank you very much for your help!