KaihuaTang / Scene-Graph-Benchmark.pytorch

A new codebase for popular Scene Graph Generation methods (2020). Visualization & Scene Graph Extraction on custom images/datasets are provided. It's also a PyTorch implementation of paper “Unbiased Scene Graph Generation from Biased Training CVPR 2020”
MIT License

Can't use pretrained Faster R-CNN #80

Closed nullkatar closed 3 years ago

nullkatar commented 3 years ago

❓ Questions and Help

Hello everyone, I have run into a very strange issue. I pretrained Faster R-CNN with attributes on Visual Genome using the following command:

    CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10001 --nproc_per_node=1 tools/detector_pretrain_net.py --config-file "configs/e2e_relation_detector_X_101_32_8_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 2 TEST.IMS_PER_BATCH 1 DTYPE "float16" SOLVER.MAX_ITER 50000 SOLVER.STEPS "(30000, 45000)" SOLVER.VAL_PERIOD 20000 SOLVER.CHECKPOINT_PERIOD 20000 MODEL.RELATION_ON False OUTPUT_DIR /home/lkochiev/checkpoints/pretrained_faster_rcnn SOLVER.PRE_VAL False

This produced /home/lkochiev/checkpoints/pretrained_faster_rcnn/model_final.pth, with the following performance:

    Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.111
    Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.251
    Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.085
    Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.047
    Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.095
    Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.132
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.206
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.332
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.343
    Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.234
    Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.332
    Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.340

After that, I tried to train a scene graph generator using this pretrained object detector, with the following command:

    CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10025 --nproc_per_node=1 tools/relation_train_net.py --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" MODEL.ROI_RELATION_HEAD.USE_GT_BOX False MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False SOLVER.IMS_PER_BATCH 2 TEST.IMS_PER_BATCH 1 DTYPE "float16" SOLVER.MAX_ITER 50000 SOLVER.VAL_PERIOD 2000 SOLVER.CHECKPOINT_PERIOD 2000 GLOVE_DIR /home/lkochiev/Documents/SFU/NSM/SGB/Scene-Graph-Benchmark.pytorch/ MODEL.PRETRAINED_DETECTOR_CKPT /home/lkochiev/checkpoints/pretrained_faster_rcnn/model_final.pth OUTPUT_DIR /home/lkochiev/checkpoints/motif-precls-exmp

Strangely, I got a lot of "NO-MATCHING of current module" and "REMATCHING!" messages, but this is not the main problem. The main problem is evaluation. At the first evaluation, before training, I got very low performance:

    Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
    Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
    Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
    Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
    Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
    Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.002
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.006
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.007
    Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.004
    Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.006
    Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.007

This is very different from the numbers reported while training the detector. So I decided to evaluate the detector on its own by running:

    CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10025 --nproc_per_node=1 tools/detector_pretest_net.py --config-file "configs/e2e_relation_detector_X_101_32_8_FPN_1x.yaml" MODEL.PRETRAINED_DETECTOR_CKPT /home/lkochiev/checkpoints/pretrained_faster_rcnn/model_final.pth

But somehow this script reported similar performance:

    Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
    Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
    Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
    Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
    Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
    Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.002
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.006
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.007
    Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.004
    Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.006
    Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.007

It looks as if the model was never trained. I can't understand what I am doing wrong here. Thanks in advance for your guidance!

Einstone-rose commented 3 years ago

Hi, I also did the same task. Perhaps your batch size is too small (batch_size: 2) for training your own detector. When you reduce the batch size, you should reduce the learning rate as well.
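
One common way to make that adjustment is the linear scaling rule, where the learning rate is changed in proportion to the batch size. A minimal sketch, assuming a reference setup of batch size 16 at learning rate 0.02; both numbers and the helper name `scale_lr` are hypothetical, so check the values in your own config file:

```python
def scale_lr(base_lr: float, base_batch_size: int, batch_size: int) -> float:
    """Scale the learning rate linearly with the batch size."""
    if base_batch_size <= 0 or batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * batch_size / base_batch_size


# Hypothetical reference values, not taken from the repository config.
reference_lr = 0.02
reference_batch = 16

new_lr = scale_lr(reference_lr, reference_batch, batch_size=2)
print(f"learning rate for batch size 2: {new_lr:.4f}")
```

The linear rule is only a starting point; very small batches may still need a longer schedule or further tuning.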

xcppy commented 3 years ago

Maybe the checkpoint path you are loading is wrong. There are two code paths for loading a checkpoint:

    output_dir = cfg.OUTPUT_DIR
    checkpointer = DetectronCheckpointer(cfg, model, save_dir=output_dir)
    _ = checkpointer.load(cfg.MODEL.WEIGHT)

In this first case, the checkpoint is loaded from output_dir.

    if checkpointer.has_checkpoint():
        extra_checkpoint_data = checkpointer.load(
            cfg.MODEL.PRETRAINED_DETECTOR_CKPT,
            update_schedule=cfg.SOLVER.UPDATE_SCHEDULE_DURING_LOAD,
        )
        arguments.update(extra_checkpoint_data)
    else:
        # load_mapping is only used when we init current model from detection model.
        checkpointer.load(cfg.MODEL.PRETRAINED_DETECTOR_CKPT, with_optim=False, load_mapping=load_mapping)

In this second case, the checkpoint is loaded from output_dir if a last checkpoint exists there; otherwise it is loaded from cfg.MODEL.PRETRAINED_DETECTOR_CKPT.
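
The resume behaviour described here can be sketched in isolation. This is a simplified illustration, not the repository's code: it assumes the checkpointer records the most recent checkpoint path in a `last_checkpoint` file inside the output directory, and `resolve_checkpoint` is a hypothetical helper.

```python
import os
import tempfile
from typing import Optional


def resolve_checkpoint(save_dir: str, requested: Optional[str]) -> Optional[str]:
    """Return the checkpoint path that would actually be loaded.

    Assumption: a 'last_checkpoint' text file in save_dir stores the path of
    the most recently saved checkpoint, and it takes priority over `requested`.
    """
    marker = os.path.join(save_dir, "last_checkpoint")
    if os.path.exists(marker):
        with open(marker) as f:
            return f.read().strip()
    return requested


with tempfile.TemporaryDirectory() as out_dir:
    # Fresh output directory: the explicitly requested file is used.
    print(resolve_checkpoint(out_dir, "detector/model_final.pth"))

    # After an earlier run left a marker behind, the request is ignored.
    with open(os.path.join(out_dir, "last_checkpoint"), "w") as f:
        f.write(os.path.join(out_dir, "model_0002000.pth"))
    print(resolve_checkpoint(out_dir, "detector/model_final.pth"))
```

If the resolved path is not the detector checkpoint you intended, pointing OUTPUT_DIR at an empty directory is one way to rule this out.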

nullkatar commented 3 years ago

Thanks for your comments, @Einstone-rose and @xcppy. Regarding @Einstone-rose's comment: I had already scaled the learning rate to match the batch size, and my last command only runs testing, so that can't be the cause there. Regarding @xcppy's comment: when I got totally desperate, I hard-coded the path to the checkpoint file directly into the code you quoted (and checked afterwards that the proper file was loaded), and it still gave me the same zeros everywhere.

nullkatar commented 3 years ago

In case anybody is interested: a few days ago I discovered that this if clause was causing the issue, so I commented it out and everything started to work fine.
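
When a loaded checkpoint evaluates like an untrained model, one quick check is to compare the parameter names stored in the checkpoint with those the model expects. A minimal sketch using plain dictionaries in place of real state dicts; `diff_keys` and the example key names are hypothetical, and stripping a `module.` prefix assumes the checkpoint was saved from a model wrapped for distributed training:

```python
from typing import Dict, List, Tuple


def diff_keys(model_state: Dict[str, object],
              ckpt_state: Dict[str, object],
              strip_prefix: str = "module.") -> Tuple[List[str], List[str]]:
    """Return (missing_in_ckpt, unused_in_ckpt) after stripping a common prefix."""
    def normalize(key: str) -> str:
        return key[len(strip_prefix):] if key.startswith(strip_prefix) else key

    model_keys = {normalize(k) for k in model_state}
    ckpt_keys = {normalize(k) for k in ckpt_state}
    missing = sorted(model_keys - ckpt_keys)   # expected by the model, absent in the file
    unused = sorted(ckpt_keys - model_keys)    # present in the file, ignored by the model
    return missing, unused


# Hypothetical key names for illustration only.
model_state = {"backbone.conv1.weight": 0, "roi_heads.box.predictor.weight": 0}
ckpt_state = {"module.backbone.conv1.weight": 0}

missing, unused = diff_keys(model_state, ckpt_state)
print("missing from checkpoint:", missing)
print("unused checkpoint keys:", unused)
```

With PyTorch, the two dictionaries would typically come from `model.state_dict()` and from the loaded checkpoint file; whether the weights sit under a `"model"` entry in that file is an assumption to verify for your own checkpoint.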