Artanic30 / HOICLIP

CVPR 2023 Accepted Paper HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models

Timeline for trained checkpoints and training accuracy problem #6

Closed: ZhouGuangP closed this issue 11 months ago

ZhouGuangP commented 1 year ago

Hi, thank you for your nice work. I am really interested in it and would like to know whether there is a timeline for releasing your trained checkpoints.

I also reproduced your code on an NVIDIA GeForce RTX 3090 GPU with the settings below, and the best result I get is {"mAP": 0.3196247755016406, "mAP rare": 0.27823223399414826, "mAP non-rare": 0.3319887814064759, "mean max recall": 0.6587294594066022}. I would like to fully match your reported training performance; could you give me some advice? Did I overlook some settings?

Namespace(alpha=0.5, alternative=1, analysis=False, aux_loss=True, backbone='resnet50', batch_size=6, bbox_loss_coef=2.5, cache_img=False, calip_path='', clip_embed_dim=512, clip_max_norm=0.1, clip_model='ViT-B/32', coco_panoptic_path=None, coco_path=None, dataset_file='hico', dataset_root='GEN', dec_layers=3, del_unseen=False, device='cuda', dice_loss_coef=1, dilation=False, dim_feedforward=2048, dino_config='', dist_url='env://', distributed=False, dropout=0.1, early_stop_mimic=False, enable_amp=False, enable_cp=False, enc_layers=6, eos_coef=0.1, epochs=90, eval=False, eval_each=4, eval_each_ap=False, eval_each_lr_drop=2, eval_location=False, fix_backbone_mode=[], fix_clip='True', fix_clip_label=False, frac=-1.0, frozen_weights=None, fs_num=-1, fs_pipeline='mix', fs_strategy=2, ft_clip_with_small_lr=False, giou_loss_coef=1, gradient_accumulation_steps=1, hidden_dim=256, hoi=False, hoi_loss_coef=2, hoi_path='data/hico_20160224_det', inter_dec_layers=3, json_file='results.json', lr=0.0001, lr_backbone=1e-05, lr_clip=1e-05, lr_drop=60, lr_drop_gamma=0.1, mask_loss_coef=1, masks=False, mimic_loss_coef=20, model_name='HOICLIP', nheads=8, nms_alpha=1, nms_beta=0.5, no_clip_cls_init=False, no_fix_clip_linear=False, no_training=False, num_obj_classes=80, num_patterns=0, num_queries=64, num_verb_classes=117, num_workers=8, obj_loss_coef=1, opt_level='O2', opt_sched='multiStep', output_dir='exps/hico/hoiclip', pe_temperatureH=20, pe_temperatureW=20, position_embedding='sine', pre_norm=False, pretrained='params/detr-r50-pre-2branch-hico.pth', random_refpoints_xy=False, rec_loss_coef=2, remove_difficult=False, resume='', seed=42, set_cost_bbox=2.5, set_cost_class=1, set_cost_giou=1, set_cost_hoi=1, set_cost_obj_class=1, set_cost_verb_class=1, start_epoch=0, subject_category_id=0, thres_nms=0.7, topk_hoi=10, transformer_activation='prelu', two_stage=False, use_ddp=1, use_nms_filter='True', validation_split=-1.0, verb_loss_coef=2, verb_loss_type='focal', verb_pth='./tmp/verb.pth', verb_weight=0.5, weight_decay=0.0001, with_clip_label='True', with_mimic=False, with_obj_clip_label='True', with_random_shuffle=2, with_rec_loss=False, world_size=1, zero_shot_type='default')
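(Editor's note, not from the thread: one common heuristic when reproducing a run with a different total batch size is to scale the learning rate linearly with the batch size. The sketch below is a minimal illustration of that rule only; the reference values `ref_lr` and `ref_batch` are assumptions, not HOICLIP's official recipe.)

```python
# Hedged sketch: linear learning-rate scaling when the reproduced
# total batch size differs from the reference run.
# The reference values below are placeholders, not the paper's settings.

def scale_lr(base_lr: float, ref_total_batch: int, new_total_batch: int) -> float:
    """Linear scaling rule: lr changes proportionally to the total batch size."""
    return base_lr * new_total_batch / ref_total_batch

# Example: an assumed reference run at lr=1e-4 with total batch 16,
# reproduced on a single 3090 with batch_size=6 (as in this issue).
ref_lr, ref_batch = 1e-4, 16   # assumed reference configuration
my_batch = 6                   # single-GPU batch size from the issue
print(scale_lr(ref_lr, ref_batch, my_batch))  # -> 3.75e-05
```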

Artanic30 commented 1 year ago

Sorry for the late reply. The code and checkpoints were updated in https://github.com/Artanic30/HOICLIP/commit/c93750cb4501823ea167c2dd7567bf9dda2c1541. Our training is conducted on two A40 GPUs with batch size 8. In our previous experiments, we noticed a performance drop with smaller batch sizes and older GPUs (such as four 2080 GPUs with the same batch size). I hope this information addresses your issue.
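(Editor's note, not from the thread: if the reference total batch size cannot fit on a single GPU, gradient accumulation is one generic way to approximate it. The sketch below is a standard PyTorch pattern, not HOICLIP's actual training loop; `accum_steps` and the loop structure are assumptions for illustration.)

```python
# Hedged sketch (generic PyTorch, not HOICLIP's code): emulate a larger
# effective batch on a single GPU by accumulating gradients over several steps.
# With a per-step batch of 4 and accum_steps=4, the effective batch size is 16.
import torch

accum_steps = 4  # assumed: pick so per_gpu_batch * accum_steps matches the reference total batch

def train_one_epoch(model, criterion, loader, optimizer, device="cuda"):
    model.train()
    optimizer.zero_grad()
    for step, (samples, targets) in enumerate(loader):
        samples = samples.to(device)           # target handling depends on the task/dataset
        outputs = model(samples)
        loss = criterion(outputs, targets)
        (loss / accum_steps).backward()        # average gradients over the accumulation window
        if (step + 1) % accum_steps == 0:
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.1)  # mirrors clip_max_norm=0.1 in the config above
            optimizer.step()
            optimizer.zero_grad()
```

Note that accumulation only matches the gradient statistics of a larger batch; anything that depends on the per-step batch itself (e.g. batch-norm statistics in the backbone) still sees the small batch, which may explain part of the remaining gap.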