KaihuaTang / Scene-Graph-Benchmark.pytorch

A new codebase for popular Scene Graph Generation methods (2020). Visualization & scene graph extraction on custom images/datasets are provided. It is also a PyTorch implementation of the paper "Unbiased Scene Graph Generation from Biased Training" (CVPR 2020).
MIT License

TypeError: expected str, bytes or os.PathLike object, not numpy.ndarray, while running the tools/relation_test_net.py file #138

Closed sonaliashish closed 2 years ago

sonaliashish commented 2 years ago

❓ Questions and Help

Hello @KaihuaTang! Thanks for the awesome work. I am facing the following error while running `tools/relation_test_net.py`:

`TypeError: expected str, bytes or os.PathLike object, not numpy.ndarray`

Could you please help me solve this error?

```bash
CUDA_VISIBLE_DEVICES=0 python tools/relation_test_net.py \
    --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" \
    MODEL.ROI_RELATION_HEAD.USE_GT_BOX True \
    MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL True \
    MODEL.ROI_RELATION_HEAD.PREDICTOR MotifPredictor \
    MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE None \
    MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE sum \
    MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER motifs \
    TEST.IMS_PER_BATCH 1 DTYPE "float16" GLOVE_DIR ../glove6b \
    MODEL.PRETRAINED_DETECTOR_CKPT checkpoints/causal-motifs-sgdet/model_0028000.pth \
    OUTPUT_DIR output \
    TEST.CUSTUM_EVAL True \
    TEST.CUSTUM_PATH /media/sonali/sonu/1python/datasets/visual_genome/images \
    DETECTED_SGG_DIR output/generated_sgg
```

Whole stack trace: 2021-07-09 20:57:39,649 maskrcnn_benchmark INFO: Using 1 GPUs 2021-07-09 20:57:39,649 maskrcnn_benchmark INFO: AMP_VERBOSE: False DATALOADER: ASPECT_RATIO_GROUPING: True NUM_WORKERS: 4 SIZE_DIVISIBILITY: 32 DATASETS: TEST: ('VG_stanford_filtered_with_attribute_test',) TRAIN: ('VG_stanford_filtered_with_attribute_train',) VAL: ('VG_stanford_filtered_with_attribute_val',) DETECTED_SGG_DIR: . DTYPE: float16 GLOVE_DIR: ../glove6b INPUT: BRIGHTNESS: 0.0 CONTRAST: 0.0 HUE: 0.0 MAX_SIZE_TEST: 1000 MAX_SIZE_TRAIN: 1000 MIN_SIZE_TEST: 600 MIN_SIZE_TRAIN: (600,) PIXEL_MEAN: [102.9801, 115.9465, 122.7717] PIXEL_STD: [1.0, 1.0, 1.0] SATURATION: 0.0 TO_BGR255: True VERTICAL_FLIP_PROB_TRAIN: 0.0 MODEL: ATTRIBUTE_ON: False BACKBONE: CONV_BODY: R-101-FPN FREEZE_CONV_BODY_AT: 2 CLS_AGNOSTIC_BBOX_REG: False DEVICE: cuda FBNET: ARCH: default ARCH_DEF: BN_TYPE: bn DET_HEAD_BLOCKS: [] DET_HEAD_LAST_SCALE: 1.0 DET_HEAD_STRIDE: 0 DW_CONV_SKIP_BN: True DW_CONV_SKIP_RELU: True KPTS_HEAD_BLOCKS: [] KPTS_HEAD_LAST_SCALE: 0.0 KPTS_HEAD_STRIDE: 0 MASK_HEAD_BLOCKS: [] MASK_HEAD_LAST_SCALE: 0.0 MASK_HEAD_STRIDE: 0 RPN_BN_TYPE: RPN_HEAD_BLOCKS: 0 SCALE_FACTOR: 1.0 WIDTH_DIVISOR: 1 FLIP_AUG: False FPN: USE_GN: False USE_RELU: False GROUP_NORM: DIM_PER_GP: -1 EPSILON: 1e-05 NUM_GROUPS: 32 KEYPOINT_ON: False MASK_ON: False META_ARCHITECTURE: GeneralizedRCNN PRETRAINED_DETECTOR_CKPT: RELATION_ON: True RESNETS: BACKBONE_OUT_CHANNELS: 256 DEFORMABLE_GROUPS: 1 NUM_GROUPS: 32 RES2_OUT_CHANNELS: 256 RES5_DILATION: 1 STAGE_WITH_DCN: (False, False, False, False) STEM_FUNC: StemWithFixedBatchNorm STEM_OUT_CHANNELS: 64 STRIDE_IN_1X1: False TRANS_FUNC: BottleneckWithFixedBatchNorm WIDTH_PER_GROUP: 8 WITH_MODULATED_DCN: False RETINANET: ANCHOR_SIZES: (32, 64, 128, 256, 512) ANCHOR_STRIDES: (8, 16, 32, 64, 128) ASPECT_RATIOS: (0.5, 1.0, 2.0) BBOX_REG_BETA: 0.11 BBOX_REG_WEIGHT: 4.0 BG_IOU_THRESHOLD: 0.4 FG_IOU_THRESHOLD: 0.5 INFERENCE_TH: 0.05 LOSS_ALPHA: 0.25 LOSS_GAMMA: 2.0 NMS_TH: 0.4 
NUM_CLASSES: 81 NUM_CONVS: 4 OCTAVE: 2.0 PRE_NMS_TOP_N: 1000 PRIOR_PROB: 0.01 SCALES_PER_OCTAVE: 3 STRADDLE_THRESH: 0 USE_C5: True RETINANET_ON: False ROI_ATTRIBUTE_HEAD: ATTRIBUTE_BGFG_RATIO: 3 ATTRIBUTE_BGFG_SAMPLE: True ATTRIBUTE_LOSS_WEIGHT: 1.0 FEATURE_EXTRACTOR: FPN2MLPFeatureExtractor MAX_ATTRIBUTES: 10 NUM_ATTRIBUTES: 201 POS_WEIGHT: 50.0 PREDICTOR: FPNPredictor SHARE_BOX_FEATURE_EXTRACTOR: True USE_BINARY_LOSS: True ROI_BOX_HEAD: CONV_HEAD_DIM: 256 DILATION: 1 FEATURE_EXTRACTOR: FPN2MLPFeatureExtractor MLP_HEAD_DIM: 4096 NUM_CLASSES: 151 NUM_STACKED_CONVS: 4 POOLER_RESOLUTION: 7 POOLER_SAMPLING_RATIO: 2 POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125) PREDICTOR: FPNPredictor USE_GN: False ROI_HEADS: BATCH_SIZE_PER_IMAGE: 256 BBOX_REG_WEIGHTS: (10.0, 10.0, 5.0, 5.0) BG_IOU_THRESHOLD: 0.3 DETECTIONS_PER_IMG: 80 FG_IOU_THRESHOLD: 0.5 NMS: 0.3 NMS_FILTER_DUPLICATES: True POSITIVE_FRACTION: 0.5 POST_NMS_PER_CLS_TOPN: 300 SCORE_THRESH: 0.01 USE_FPN: True ROI_KEYPOINT_HEAD: CONV_LAYERS: (512, 512, 512, 512, 512, 512, 512, 512) FEATURE_EXTRACTOR: KeypointRCNNFeatureExtractor MLP_HEAD_DIM: 1024 NUM_CLASSES: 17 POOLER_RESOLUTION: 14 POOLER_SAMPLING_RATIO: 0 POOLER_SCALES: (0.0625,) PREDICTOR: KeypointRCNNPredictor RESOLUTION: 14 SHARE_BOX_FEATURE_EXTRACTOR: True ROI_MASK_HEAD: CONV_LAYERS: (256, 256, 256, 256) DILATION: 1 FEATURE_EXTRACTOR: ResNet50Conv5ROIFeatureExtractor MLP_HEAD_DIM: 1024 POOLER_RESOLUTION: 14 POOLER_SAMPLING_RATIO: 0 POOLER_SCALES: (0.0625,) POSTPROCESS_MASKS: False POSTPROCESS_MASKS_THRESHOLD: 0.5 PREDICTOR: MaskRCNNC4Predictor RESOLUTION: 14 SHARE_BOX_FEATURE_EXTRACTOR: True USE_GN: False ROI_RELATION_HEAD: ADD_GTBOX_TO_PROPOSAL_IN_TRAIN: True BATCH_SIZE_PER_IMAGE: 1024 CAUSAL: CONTEXT_LAYER: motifs EFFECT_ANALYSIS: True EFFECT_TYPE: None FUSION_TYPE: sum SEPARATE_SPATIAL: False SPATIAL_FOR_VISION: True CONTEXT_DROPOUT_RATE: 0.2 CONTEXT_HIDDEN_DIM: 512 CONTEXT_OBJ_LAYER: 1 CONTEXT_POOLING_DIM: 4096 CONTEXT_REL_LAYER: 1 EMBED_DIM: 200 
FEATURE_EXTRACTOR: RelationFeatureExtractor LABEL_SMOOTHING_LOSS: False NUM_CLASSES: 51 NUM_SAMPLE_PER_GT_REL: 4 POOLING_ALL_LEVELS: True POSITIVE_FRACTION: 0.25 PREDICTOR: MotifPredictor PREDICT_USE_BIAS: True PREDICT_USE_VISION: True REL_PROP: [0.01858, 0.00057, 0.00051, 0.00109, 0.0015, 0.00489, 0.00432, 0.02913, 0.00245, 0.00121, 0.00404, 0.0011, 0.00132, 0.00172, 5e-05, 0.00242, 0.0005, 0.00048, 0.00208, 0.15608, 0.0265, 0.06091, 0.009, 0.00183, 0.00225, 0.0009, 0.00028, 0.00077, 0.04844, 0.08645, 0.31621, 0.00088, 0.00301, 0.00042, 0.00186, 0.001, 0.00027, 0.01012, 0.0001, 0.01286, 0.00647, 0.00084, 0.01077, 0.00132, 0.00069, 0.00376, 0.00214, 0.11424, 0.01205, 0.02958] REQUIRE_BOX_OVERLAP: False TRANSFORMER: DROPOUT_RATE: 0.1 INNER_DIM: 2048 KEY_DIM: 64 NUM_HEAD: 8 OBJ_LAYER: 4 REL_LAYER: 2 VAL_DIM: 64 USE_GT_BOX: True USE_GT_OBJECT_LABEL: True RPN: ANCHOR_SIZES: (32, 64, 128, 256, 512) ANCHOR_STRIDE: (4, 8, 16, 32, 64) ASPECT_RATIOS: (0.23232838, 0.63365731, 1.28478321, 3.15089189) BATCH_SIZE_PER_IMAGE: 256 BG_IOU_THRESHOLD: 0.3 FG_IOU_THRESHOLD: 0.7 FPN_POST_NMS_PER_BATCH: False FPN_POST_NMS_TOP_N_TEST: 1000 FPN_POST_NMS_TOP_N_TRAIN: 1000 MIN_SIZE: 0 NMS_THRESH: 0.7 POSITIVE_FRACTION: 0.5 POST_NMS_TOP_N_TEST: 1000 POST_NMS_TOP_N_TRAIN: 1000 PRE_NMS_TOP_N_TEST: 6000 PRE_NMS_TOP_N_TRAIN: 6000 RPN_HEAD: SingleConvRPNHead RPN_MID_CHANNEL: 256 STRADDLE_THRESH: 0 USE_FPN: True RPN_ONLY: False VGG: VGG16_OUT_CHANNELS: 512 WEIGHT: catalog://ImageNetPretrained/FAIR/20171220/X-101-32x8d OUTPUT_DIR: ./output/relation_baseline PATHS_CATALOG: /media/sonali/sonu/1python/SceneGraph_Benchmark/maskrcnn_benchmark/config/paths_catalog.py PATHS_DATA: /media/sonali/sonu/1python/SceneGraph_Benchmark/maskrcnn_benchmark/config/../data/datasets SOLVER: BASE_LR: 0.01 BIAS_LR_FACTOR: 1 CHECKPOINT_PERIOD: 2000 CLIP_NORM: 5.0 GAMMA: 0.1 GRAD_NORM_CLIP: 5.0 IMS_PER_BATCH: 4 MAX_ITER: 40000 MOMENTUM: 0.9 PRE_VAL: False PRINT_GRAD_FREQ: 4000 SCHEDULE: COOLDOWN: 0 FACTOR: 0.1 
MAX_DECAY_STEP: 3 PATIENCE: 2 THRESHOLD: 0.001 TYPE: WarmupReduceLROnPlateau STEPS: (10000, 16000) TO_VAL: True UPDATE_SCHEDULE_DURING_LOAD: False VAL_PERIOD: 2000 WARMUP_FACTOR: 0.1 WARMUP_ITERS: 500 WARMUP_METHOD: linear WEIGHT_DECAY: 0.0001 WEIGHT_DECAY_BIAS: 0.0 TEST: ALLOW_LOAD_FROM_CACHE: False BBOX_AUG: ENABLED: False H_FLIP: False MAX_SIZE: 4000 SCALES: () SCALE_H_FLIP: False CUSTUM_EVAL: False CUSTUM_PATH: checkpoints/custom_images DETECTIONS_PER_IMG: 100 EXPECTED_RESULTS: [] EXPECTED_RESULTS_SIGMA_TOL: 4 IMS_PER_BATCH: 1 RELATION: IOU_THRESHOLD: 0.5 LATER_NMS_PREDICTION_THRES: 0.5 MULTIPLE_PREDS: False REQUIRE_OVERLAP: False SYNC_GATHER: True SAVE_PROPOSALS: True 2021-07-09 20:57:39,650 maskrcnn_benchmark INFO: Collecting env info (might take some time) 2021-07-09 20:57:46,383 maskrcnn_benchmark INFO: PyTorch version: 1.5.0+cu101 Is debug build: No CUDA used to build PyTorch: 10.1

OS: Ubuntu 20.04.2 LTS GCC version: (Ubuntu 8.4.0-3ubuntu2) 8.4.0 CMake version: version 3.16.3

Python version: 3.7 Is CUDA available: Yes CUDA runtime version: 10.1.243 GPU models and configuration: GPU 0: GeForce RTX 2080 Super with Max-Q Design Nvidia driver version: 460.80 cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5

Versions of relevant libraries: [pip3] msgpack-numpy==0.4.7.1 [pip3] numpy==1.19.5 [pip3] numpydoc==1.1.0 [pip3] torch==1.5.0+cu101 [pip3] torchvision==0.6.0+cu101 [conda] blas 1.0 mkl anaconda [conda] mkl 2019.4 243 anaconda [conda] mkl-service 2.3.0 py37he904b0f_0 anaconda [conda] mkl_fft 1.2.0 py37h23d657b_0 anaconda [conda] mkl_random 1.1.0 py37hd6b4f25_0 anaconda [conda] torch 1.5.0+cu101 pypi_0 pypi [conda] torchvision 0.6.0+cu101 pypi_0 pypi Pillow (8.0.0)

2021-07-09 20:57:48,912 maskrcnn_benchmark.data.build INFO: ---------------------------------------------------------------------------------------------------- 2021-07-09 20:57:48,912 maskrcnn_benchmark.data.build INFO: get dataset statistics... 2021-07-09 20:57:48,913 maskrcnn_benchmark.data.build INFO: Loading data statistics from: ./output/relation_baseline/VG_stanford_filtered_with_attribute_train_statistics.cache 2021-07-09 20:57:48,913 maskrcnn_benchmark.data.build INFO: ---------------------------------------------------------------------------------------------------- loading word vectors from ../glove6b/glove.6B.200d.pt background -> background fail on background loading word vectors from ../glove6b/glove.6B.200d.pt background -> background fail on background 2021-07-09 20:57:51,882 maskrcnn_benchmark.utils.checkpoint INFO: Loading checkpoint from ./output/relation_baseline/model_0002000.pth 2021-07-09 20:57:52,722 maskrcnn_benchmark.utils.model_serialization INFO: NO-MATCHING of current module: roi_heads.relation.predictor.post_cat.bias of shape (4096,) 2021-07-09 20:57:52,723 maskrcnn_benchmark.utils.model_serialization INFO: NO-MATCHING of current module: roi_heads.relation.predictor.post_cat.weight of shape (4096, 1024) 2021-07-09 20:57:52,723 maskrcnn_benchmark.utils.model_serialization INFO: NO-MATCHING of current module: roi_heads.relation.predictor.rel_compress.bias of shape (51,) 2021-07-09 20:57:52,723 maskrcnn_benchmark.utils.model_serialization INFO: NO-MATCHING of current module: roi_heads.relation.predictor.rel_compress.weight of shape (51, 4096) img_dir= datasets/vg/images roidb_file= datasets/vg/VG-SGG-with-attri.h5 dict_file= datasets/vg/VG-SGG-dicts-with-attri.json image_file= datasets/vg/image_data.json img_dir= datasets/vg/images num_im= -1 num_val_im= 5000 2021-07-09 20:57:54,088 maskrcnn_benchmark.inference INFO: Start evaluation on VG_stanford_filtered_with_attribute_test dataset(9 images). 
0%| | 0/9 [00:00<?, ?it/s]images= <maskrcnn_benchmark.structures.image_list.ImageList object at 0x7f7c90d78e90> targets= (BoxList(num_boxes=9, image_width=800, image_height=600, mode=xyxy),) image_ids= (0,) self.cfg.MODEL.RELATION_ON= True /pytorch/torch/csrc/utils/python_arg_parser.cpp:756: UserWarning: This overload of nonzero is deprecated: nonzero(Tensor input, , Tensor out) Consider using one of the following signatures instead: nonzero(Tensor input, , bool as_tuple) 11%|█████ | 1/9 [00:00<00:05, 1.51it/s]images= <maskrcnn_benchmark.structures.image_list.ImageList object at 0x7f7c90d78f90> targets= (BoxList(num_boxes=11, image_width=800, image_height=600, mode=xyxy),) image_ids= (1,) self.cfg.MODEL.RELATION_ON= True 22%|██████████ | 2/9 [00:01<00:04, 1.63it/s]images= <maskrcnn_benchmark.structures.image_list.ImageList object at 0x7f7c904b7410> targets= (BoxList(num_boxes=18, image_width=800, image_height=600, mode=xyxy),) image_ids= (2,) self.cfg.MODEL.RELATION_ON= True 33%|███████████████ | 3/9 [00:01<00:03, 1.70it/s]images= <maskrcnn_benchmark.structures.image_list.ImageList object at 0x7f7c904b7c50> targets= (BoxList(num_boxes=8, image_width=800, image_height=600, mode=xyxy),) image_ids= (3,) self.cfg.MODEL.RELATION_ON= True 44%|████████████████████ | 4/9 [00:02<00:02, 1.79it/s]images= <maskrcnn_benchmark.structures.image_list.ImageList object at 0x7f7c3512dfd0> targets= (BoxList(num_boxes=6, image_width=800, image_height=600, mode=xyxy),) image_ids= (4,) self.cfg.MODEL.RELATION_ON= True 56%|█████████████████████████ | 5/9 [00:02<00:02, 1.87it/s]images= <maskrcnn_benchmark.structures.image_list.ImageList object at 0x7f7c90d78390> targets= (BoxList(num_boxes=6, image_width=800, image_height=600, mode=xyxy),) image_ids= (5,) self.cfg.MODEL.RELATION_ON= True 67%|██████████████████████████████ | 6/9 [00:03<00:01, 1.92it/s]images= <maskrcnn_benchmark.structures.image_list.ImageList object at 0x7f7c90d6ebd0> targets= (BoxList(num_boxes=15, image_width=800, 
image_height=600, mode=xyxy),) image_ids= (6,) self.cfg.MODEL.RELATION_ON= True 78%|███████████████████████████████████ | 7/9 [00:03<00:01, 1.92it/s]images= <maskrcnn_benchmark.structures.image_list.ImageList object at 0x7f7c90d6e550> targets= (BoxList(num_boxes=13, image_width=800, image_height=600, mode=xyxy),) image_ids= (7,) self.cfg.MODEL.RELATION_ON= True 89%|████████████████████████████████████████ | 8/9 [00:04<00:00, 1.94it/s]images= <maskrcnn_benchmark.structures.image_list.ImageList object at 0x7f7c90d6ec90> targets= (BoxList(num_boxes=15, image_width=800, image_height=600, mode=xyxy),) image_ids= (8,) self.cfg.MODEL.RELATION_ON= True 100%|█████████████████████████████████████████████| 9/9 [00:04<00:00, 1.90it/s] 2021-07-09 20:57:58,834 maskrcnn_benchmark.inference INFO: Total run time: 0:00:04.745990 (0.5273322529262967 s / img per device, on 1 devices) 2021-07-09 20:57:58,834 maskrcnn_benchmark.inference INFO: Model inference time: 0:00:04.556423 (0.5062692430284288 s / img per device, on 1 devices) creating index... index created! type(cocolike_predictions)= <class 'numpy.ndarray'> 101 (7,) Loading and preparing results...
```
Traceback (most recent call last):
  File "tools/relation_test_net.py", line 112, in <module>
    main()
  File "tools/relation_test_net.py", line 106, in main
    output_folder=output_folder,
  File "/media/sonali/sonu/1python/SceneGraph_Benchmark/maskrcnn_benchmark/engine/inference.py", line 160, in inference
    **extra_args)
  File "/media/sonali/sonu/1python/SceneGraph_Benchmark/maskrcnn_benchmark/data/datasets/evaluation/__init__.py", line 27, in evaluate
    return vg_evaluation(**args)
  File "/media/sonali/sonu/1python/SceneGraph_Benchmark/maskrcnn_benchmark/data/datasets/evaluation/vg/__init__.py", line 19, in vg_evaluation
    iou_types=iou_types,
  File "/media/sonali/sonu/1python/SceneGraph_Benchmark/maskrcnn_benchmark/data/datasets/evaluation/vg/vg_eval.py", line 106, in do_vg_evaluation
    res = fauxcoco.loadRes(cocolike_predictions)
  File "/home/sonali/anaconda3/lib/python3.7/site-packages/pycocotools/coco.py", line 293, in loadRes
    anns = json.load(open(resFile))
TypeError: expected str, bytes or os.PathLike object, not numpy.ndarray
```

sonaliashish commented 2 years ago

Solved the above issue by passing the input to `loadRes` as a COCO-annotation-style JSON file.
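For anyone hitting the same error: one way to apply this fix is to serialize the predictions to a COCO-style results JSON file and hand `loadRes` the file path instead of the raw `numpy.ndarray`. A minimal sketch (the field values and the `res_file` handling are illustrative, not the repository's actual code):

```python
import json
import tempfile

# COCO detection-result format: one dict per detection
# (image_id, category_id, bbox in xywh, score). Values are made up.
cocolike_predictions = [
    {"image_id": 0, "category_id": 1,
     "bbox": [10.0, 10.0, 40.0, 40.0], "score": 0.9},
]

# Write the results to a JSON file so json.load(open(res_file)) inside
# pycocotools' loadRes receives a path it can actually open.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(cocolike_predictions, f)
    res_file = f.name

# res = fauxcoco.loadRes(res_file)  # now a path, not an ndarray
```

Note that newer `pycocotools` releases also accept a Python list of result dicts directly and route `numpy.ndarray` inputs through `loadNumpyAnnotations`, so upgrading `pycocotools` may be an alternative to writing the file out.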