YueLiao / CDN

Code for "Mining the Benefits of Two-stage and One-stage HOI Detection"
Apache License 2.0
89 stars 15 forks source link

dynamic reweighting causes performance degradation in reproducing #4

Closed Charles-Xie closed 2 years ago

Charles-Xie commented 2 years ago

Hi, thanks for sharing the code! Great work!

I have a small question in reproducing your result. I run the CDN-S model (res50, 3+3). It gave a result of about 31.5 or 31.2 (I run 2 times) after the first training stage (train the whole model with re gular loss). But after the second training stage (decoupled training) is finished, the performance downgrades to 31.0 and 30.4 for these 2 runs separately. For full mAP, rare mAP and non-rare mAP, this trick seems to be not helpful.

So I wonder what could goes wrong during my reproduction or what can be the reason. I will paste the commands and log below. Thanks. Nice day :3

Charles-Xie commented 2 years ago

command (exactly the same as the one provided in readme, except the output_dir):


python3 -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \
        main.py \
        --pretrained params/detr-r50-pre-2stage-q64.pth \
        --output_dir logs/base \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --num_queries 64 \
        --dec_layers_hopd 3 \
        --dec_layers_interaction 3 \
        --epochs 90 \
        --lr_drop 60 \
        --use_nms_filter

python3 -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \
        main.py \
        --pretrained logs/base/checkpoint_last.pth \
        --output_dir logs/base \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --num_queries 64 \
        --dec_layers_hopd 3 \
        --dec_layers_interaction 3 \
        --epochs 10 \
        --freeze_mode 1 \
        --obj_reweight \
        --verb_reweight \
        --use_nms_filter

echo "base"

corresponding result (log): 31.5 after 1st training stage and 31.0 after 2nd training stage: log.txt

Charles-Xie commented 2 years ago

for the 2nd run: command (exactly the same as the one provided in readme, except the output_dir):

python3 -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \
        main.py \
        --pretrained params/detr-r50-pre-2stage-q64.pth \
        --output_dir logs/base_4worker \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --num_queries 64 \
        --dec_layers_hopd 3 \
        --dec_layers_interaction 3 \
        --epochs 90 \
        --lr_drop 60 \
        --use_nms_filter \
        --num_workers 4

python3 -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \
        main.py \
        --pretrained logs/base_4worker/checkpoint_last.pth \
        --output_dir logs/base_4worker \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --num_queries 64 \
        --dec_layers_hopd 3 \
        --dec_layers_interaction 3 \
        --epochs 10 \
        --freeze_mode 1 \
        --obj_reweight \
        --verb_reweight \
        --use_nms_filter \
        --num_workers 4

echo "base_4worker"

corresponding result (log): 31.2 after 1st training stage and 30.4 after 2nd training stage: log.txt

YueLiao commented 2 years ago

This module is implemented by @zhangaixi2008, and he will reply you later.

YueLiao commented 2 years ago

command (exactly the same as the one provided in readme, except the output_dir):

python3 -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \
        main.py \
        --pretrained params/detr-r50-pre-2stage-q64.pth \
        --output_dir logs/base \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --num_queries 64 \
        --dec_layers_hopd 3 \
        --dec_layers_interaction 3 \
        --epochs 90 \
        --lr_drop 60 \
        --use_nms_filter

python3 -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \
        main.py \
        --pretrained logs/base/checkpoint_last.pth \
        --output_dir logs/base \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --num_queries 64 \
        --dec_layers_hopd 3 \
        --dec_layers_interaction 3 \
        --epochs 10 \
        --freeze_mode 1 \
        --obj_reweight \
        --verb_reweight \
        --use_nms_filter

echo "base"

corresponding result (log): 31.5 after 1st training stage and 31.0 after 2nd training stage: log.txt

aha 31.5%, a new SOTA with CDN-S.

zhangaixi2008 commented 2 years ago

You can have a try with the following command. python -m torch.distributed.launch --master_port 10026 --nproc_per_node=4 --use_env main.py --pretrained logs/base/checkpoint_last.pth --output_dir logs/base --hoi --dataset_file hico --hoi_path data/hico_20160224_det --num_obj_classes 80 --num_verb_classes 117 --backbone resnet50 --set_cost_bbox 2.5 --set_cost_giou 1 --bbox_loss_coef 2.5 --giou_loss_coef 1 --num_queries 64 --dec_layers_stage1 3 --dec_layers_stage2 3 --epochs 10 --freeze_mode 1 --obj_reweight --verb_reweight --queue_size 9408 --p_obj 0.7 --p_verb 0.7 --lr 5e-6 --lr_backbone 5e-7 --use_nms_filter

boringwar commented 2 years ago

@zhangaixi2008 It does not work for me either. The reweighting retraining leads to performance drop. Details are as follows:

  1. Using given script training on HICO-DET:

CDN S:

CDN B:

I'm using the above script to re-run cdn-s finetuning.

zhangaixi2008 commented 2 years ago

@haak0 please upload your model here, let me have a look.

boringwar commented 2 years ago

@zhangaixi2008 Hi, some of my checkpoints are overwritten. I am re-running the experiments.

boringwar commented 2 years ago

Hi, I re-do the experiments, and here's the log. CDN small:

CDN base:

zhangaixi2008 commented 2 years ago

Hi, I made a mistake in the previous readme for running the fine-tune process. Please use the script I provide above under this issue. As we claimed in the paper, we use a small learning rate to fine-tune the first model. Thus, we set lr as 5e-6 and lr_backbone as 5e-7 for bs=8, or lr as 1e-5 and lr_backbone as 1e-6 for bs=16. Please try again and let us see the results. Sorry for our carelessness, we have already modified the readme.

boringwar commented 2 years ago

@zhangaixi2008 hi, I reproduce the finetune result following your script, and the result is reasonable. CDN-base:

BTW, what's the meaning of the "vis_tag" in hico_eval.py?

zhangaixi2008 commented 2 years ago

For CDN-base, you have already surpassed our reported (official matlab 31.78, python 31.86) in our paper. Good job^^ For 'vis_tag', you can see the evaluation script. In short, we filtered the already matched ground-truth hoi to calculate fp and tp during evaluation.

YueLiao commented 2 years ago

The issue about the "Re-weight module" seems to be solved. If any other issues, feel free to open a new issue.