Open nullkatar opened 3 years ago
I believe this might be caused by too many proposals being sampled (more than the number specified in the config file).
As a result, during testing, the number of union regions passed to `rect_features = self.rect_conv(rect_inputs)` could be much higher than 80 * 79 (if the maximum number of detections is set to 80), resulting in the OOM error.
This can happen if all (or too many) detections share the same confidence score.
In that case, `keep` could let through more than the intended maximum of 80 proposals during post-processing:
https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch/blob/d0ffa40d92133d7d865e531146de82c8c8a344c0/maskrcnn_benchmark/modeling/roi_heads/box_head/inference.py#L224
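A possible workaround is to hard-cap the kept indices after score filtering. Below is a minimal pure-Python sketch of that idea (the actual code in `inference.py` operates on torch tensors, and the function name `cap_detections` is my own, hypothetical):

```python
def cap_detections(scores, keep, max_detections):
    """Hard-cap the kept indices to at most max_detections entries.

    When many detections share the same confidence score, a purely
    threshold-based filter can keep more boxes than intended. Sorting
    the kept indices by descending score and truncating enforces the cap
    regardless of ties. (Sketch only; real code would use torch.topk.)
    """
    # Python's sort is stable, so ties keep their original order.
    keep_sorted = sorted(keep, key=lambda i: scores[i], reverse=True)
    return keep_sorted[:max_detections]
```

With four detections all scored 0.9 and a cap of 2, this returns only the first two indices instead of all four.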
You could check the dimensions of `rect_inputs` and compare them with the specified maximum number of detections to verify whether this is the case.
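To see why a few extra detections blow up memory: with N detections per image, the relation head builds N * (N - 1) ordered object pairs, so the union-region input grows quadratically. A quick sanity check:

```python
def union_pair_count(num_detections):
    """Number of ordered object pairs fed to the union-region branch.

    With N detections per image there are N * (N - 1) candidate
    (subject, object) pairs, so rect_inputs grows quadratically:
    80 detections give 6320 pairs, but 200 give 39800.
    """
    return num_detections * (num_detections - 1)
```

If `rect_inputs` has a first dimension much larger than 80 * 79 = 6320, the post-processing cap is not being enforced.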
🐛 Bug
The SGDet model (with and without attributes; I tried both) cannot be evaluated, although it can be trained (with `SOLVER.PRE_VAL False`). I also tried switching the two arguments `MODEL.ROI_RELATION_HEAD.USE_GT_BOX` and `MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL` on and off, and found that the model cannot be evaluated unless `USE_GT_BOX` is set to `True`.

To Reproduce
Steps to reproduce the behavior:
1. Train the Faster R-CNN detector with the following command:
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10001 --nproc_per_node=1 tools/detector_pretrain_net.py --config-file "configs/e2e_relation_detector_X_101_32_8_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 1 TEST.IMS_PER_BATCH 1 DTYPE "float16" SOLVER.MAX_ITER 100000 SOLVER.STEPS "(30000, 45000)" SOLVER.VAL_PERIOD 20000 SOLVER.CHECKPOINT_PERIOD 20000 MODEL.RELATION_ON False OUTPUT_DIR ./pretrained_faster_rcnn_with_att SOLVER.PRE_VAL False MODEL.PRETRAINED_DETECTOR_CKPT ./pretrained_faster_rcnn/model_final.pth
Environment
PyTorch version: 1.4.0 Is debug build: False CUDA used to build PyTorch: 10.1 ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.2 LTS (x86_64) GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 Clang version: 10.0.0-4ubuntu1 CMake version: version 3.16.3
Python version: 3.6 (64-bit runtime) Is CUDA available: True CUDA runtime version: 11.2.67 GPU models and configuration: GPU 0: GeForce RTX 2080 Ti Nvidia driver version: 460.39 cuDNN version: Could not collect HIP runtime version: N/A MIOpen runtime version: N/A
Versions of relevant libraries: [pip3] numpy==1.19.2 [pip3] torch==1.4.0 [pip3] torchvision==0.5.0 [conda] blas 1.0 mkl [conda] cudatoolkit 10.1.243 h6bb024c_0 [conda] mkl 2020.2 256 [conda] mkl-service 2.3.0 py36he8ac12f_0 [conda] mkl_fft 1.2.0 py36h23d657b_0 [conda] mkl_random 1.1.1 py36h0573a6f_0 [conda] numpy 1.19.2 py36h54aff64_0 [conda] numpy-base 1.19.2 py36hfa32c7d_0 [conda] pytorch 1.4.0 py3.6_cuda10.1.243_cudnn7.6.3_0 pytorch [conda] torchvision 0.5.0 py36_cu101 pytorch
Additional context
I tried running this model with different CUDA and Python versions, but the result is the same in every case.