nullkatar closed this issue 3 years ago
Hi, I did the same task as well. Perhaps your batch size is too small (batch_size: 2) for training your own detector; when you make the batch size smaller, you should make the learning rate smaller as well.
Maybe the checkpoint path you are loading is wrong. There are two pieces of code for loading a checkpoint:
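To make the batch size remark concrete, here is a minimal sketch of the commonly used linear scaling rule, where the learning rate is scaled in proportion to the batch size. The reference values (`reference_lr`, `reference_batch`) are hypothetical placeholders, not values taken from this repository's config, so substitute the ones from your own YAML file.

```python
def scale_learning_rate(base_lr: float, base_batch_size: int, batch_size: int) -> float:
    """Linear scaling rule: keep the ratio lr / batch_size constant."""
    if base_batch_size <= 0 or batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * batch_size / base_batch_size


# Hypothetical reference setting (an assumption, NOT read from the repo config).
reference_lr = 0.01
reference_batch = 8

# Learning rate to try when training with SOLVER.IMS_PER_BATCH 2.
print(scale_learning_rate(reference_lr, reference_batch, 2))
```

If this rule applies to your setup, the result would be passed as `SOLVER.BASE_LR` on the command line. Whether linear scaling is the best choice at very small batch sizes is a judgment call, so treat it as a starting point.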
output_dir = cfg.OUTPUT_DIR
checkpointer = DetectronCheckpointer(cfg, model, save_dir=output_dir)
_ = checkpointer.load(cfg.MODEL.WEIGHT)
The checkpoint is loaded from output_dir.
if checkpointer.has_checkpoint():
    extra_checkpoint_data = checkpointer.load(cfg.MODEL.PRETRAINED_DETECTOR_CKPT,
                                              update_schedule=cfg.SOLVER.UPDATE_SCHEDULE_DURING_LOAD)
    arguments.update(extra_checkpoint_data)
else:
    # load_mapping is only used when we init current model from detection model.
    checkpointer.load(cfg.MODEL.PRETRAINED_DETECTOR_CKPT, with_optim=False, load_mapping=load_mapping)
The checkpoint is loaded from output_dir if the last checkpoint in output_dir exists, else from cfg.MODEL.PRETRAINED_DETECTOR_CKPT.
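The precedence described above can be checked outside the training code. The sketch below is a standalone approximation, not the repository's implementation: it assumes the checkpointer records the most recent checkpoint path in a text file named `last_checkpoint` inside the output directory (the convention in maskrcnn-benchmark style checkpointers), and `resolve_checkpoint_path` is a hypothetical helper name.

```python
import os
from typing import Optional


def resolve_checkpoint_path(output_dir: str, fallback_path: Optional[str]) -> Optional[str]:
    """Return the checkpoint path that would take precedence.

    Assumption: the last saved checkpoint path is stored in
    <output_dir>/last_checkpoint. If that marker file exists and is
    non-empty, its content wins; otherwise the fallback path is used.
    """
    marker = os.path.join(output_dir, "last_checkpoint")
    if os.path.isfile(marker):
        with open(marker, "r") as f:
            saved = f.read().strip()
        if saved:
            return saved
    return fallback_path


if __name__ == "__main__":
    # Both paths below are placeholders; use your own OUTPUT_DIR and checkpoint.
    print(resolve_checkpoint_path("./output", "/path/to/model_final.pth"))
```

If this prints something other than the file you passed as `MODEL.PRETRAINED_DETECTOR_CKPT`, a stale checkpoint in the output directory may be shadowing it.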
Thanks for your comments, @Einstone-rose and @xcppy. Regarding @Einstone-rose's comment, I should mention that I had already scaled them, and in my last piece of code I only do testing, so it can't apply there. Regarding @xcppy's comment: when I got totally desperate, I manually hard-coded the path to the file into the code you linked to (I checked afterwards that the proper file was loaded), and it still gave me the same zeroes everywhere.
❓ Questions and Help
Hello people, I have run into a very strange issue. I pretrained Faster R-CNN with attributes on Visual Genome using the following command:
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10001 --nproc_per_node=1 tools/detector_pretrain_net.py --config-file "configs/e2e_relation_detector_X_101_32_8_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 2 TEST.IMS_PER_BATCH 1 DTYPE "float16" SOLVER.MAX_ITER 50000 SOLVER.STEPS "(30000, 45000)" SOLVER.VAL_PERIOD 20000 SOLVER.CHECKPOINT_PERIOD 20000 MODEL.RELATION_ON False OUTPUT_DIR /home/lkochiev/checkpoints/pretrained_faster_rcnn SOLVER.PRE_VAL False
After that I obtained /home/lkochiev/checkpoints/pretrained_faster_rcnn/model_final.pth, with the following performance:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.111
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.251
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.085
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.047
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.095
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.132
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.206
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.332
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.343
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.234
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.332
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.340
After that I tried to train a scene graph generator using this pretrained object detector, and I ran the following command:
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10025 --nproc_per_node=1 tools/relation_train_net.py --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" MODEL.ROI_RELATION_HEAD.USE_GT_BOX False MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False SOLVER.IMS_PER_BATCH 2 TEST.IMS_PER_BATCH 1 DTYPE "float16" SOLVER.MAX_ITER 50000 SOLVER.VAL_PERIOD 2000 SOLVER.CHECKPOINT_PERIOD 2000 GLOVE_DIR /home/lkochiev/Documents/SFU/NSM/SGB/Scene-Graph-Benchmark.pytorch/ MODEL.PRETRAINED_DETECTOR_CKPT /home/lkochiev/checkpoints/pretrained_faster_rcnn/model_final.pth OUTPUT_DIR /home/lkochiev/checkpoints/motif-precls-exmp
and, strangely, I got a lot of NO-MATCHING of current module and REMATCHING! messages, but this is not the main problem. The main problem is evaluation. At the first evaluation, before training, I got very low performance:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.002
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.006
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.007
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.004
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.006
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.007
This is very different from the numbers that were reported while training the detector. So after that I decided to evaluate my detector by running:
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10025 --nproc_per_node=1 tools/detector_pretest_net.py --config-file "configs/e2e_relation_detector_X_101_32_8_FPN_1x.yaml" MODEL.PRETRAINED_DETECTOR_CKPT /home/lkochiev/checkpoints/pretrained_faster_rcnn/model_final.pth
However, this script somehow gave me similar performance:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.002
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.006
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.007
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.004
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.006
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.007
It is as if the model was never trained. I can't understand what I am doing wrong here. Thanks in advance for your guidance!
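One way to narrow this down is to compare the parameter names stored in the checkpoint with the names the model expects, since a large mismatch would be consistent with both the NO-MATCHING messages and near-zero AP. The sketch below is a minimal, framework-free illustration: `compare_keys` and the example key names are hypothetical, and the `"module."` prefix handling assumes the checkpoint may have been saved from a DistributedDataParallel wrapper.

```python
from typing import Dict, Iterable, List


def strip_prefix(key: str, prefix: str = "module.") -> str:
    """Drop a leading wrapper prefix (e.g. from DistributedDataParallel)."""
    return key[len(prefix):] if key.startswith(prefix) else key


def compare_keys(model_keys: Iterable[str],
                 checkpoint_keys: Iterable[str],
                 prefix: str = "module.") -> Dict[str, List[str]]:
    """Report which parameter names match between model and checkpoint."""
    model = {strip_prefix(k, prefix) for k in model_keys}
    ckpt = {strip_prefix(k, prefix) for k in checkpoint_keys}
    return {
        "matched": sorted(model & ckpt),
        "missing_in_checkpoint": sorted(model - ckpt),
        "unused_in_checkpoint": sorted(ckpt - model),
    }


# Hypothetical key names, for illustration only.
report = compare_keys(
    ["backbone.body.stem.conv1.weight", "rpn.head.conv.weight"],
    ["module.backbone.body.stem.conv1.weight", "module.roi_heads.box.predictor.cls_score.weight"],
)
for name, keys in report.items():
    print(name, len(keys))

# With PyTorch available, the real key lists could be obtained roughly like this
# (assumption: the weights sit under a "model" entry in the checkpoint file):
#   ckpt = torch.load(path_to_checkpoint, map_location="cpu")
#   checkpoint_keys = ckpt["model"].keys()
#   model_keys = model.state_dict().keys()
```

If most backbone and detector-head keys end up under `missing_in_checkpoint`, the weights being evaluated are probably not the ones saved in model_final.pth.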