facebookresearch / Detectron

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Apache License 2.0

Error: segmentation fault when I train a model? Thanks! #928

Open Peterisfar opened 5 years ago

Peterisfar commented 5 years ago

Here is the log:

Found Detectron ops lib: /home/zhulijun/pytorch/build/lib/libcaffe2_detectron_ops_gpu.so
[E init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
INFO train_net.py: 86: Called with args:
INFO train_net.py: 87: Namespace(cfg_file='configs/DensePose_ResNet50_FPN_single_GPU.yaml', multi_gpu_testing=False, opts=['OUTPUT_DIR', '/tmp/detectron-output'], skip_test=False)
INFO train_net.py: 93: Training with config:
INFO train_net.py: 94: {'BBOX_XFORM_CLIP': 4.135166556742356,
 'BODY_UV_RCNN': {'BODY_UV_IMS': True, 'CONV_HEAD_DIM': 512, 'CONV_HEAD_KERNEL': 3, 'CONV_INIT': 'MSRAFill', 'DECONV_DIM': 256, 'DECONV_KERNEL': 4, 'DILATION': 1, 'HEATMAP_SIZE': 56, 'INDEX_WEIGHTS': 2.0, 'NUM_PATCHES': 24, 'NUM_STACKED_CONVS': 8, 'PART_WEIGHTS': 0.3, 'POINT_REGRESSION_WEIGHTS': 0.1, 'ROI_HEAD': 'body_uv_rcnn_heads.add_roi_body_uv_head_v1convX', 'ROI_XFORM_METHOD': 'RoIAlign', 'ROI_XFORM_RESOLUTION': 14, 'ROI_XFORM_SAMPLING_RATIO': 2, 'UP_SCALE': 2, 'USE_DECONV_OUTPUT': True},
 'CLUSTER': {'ON_CLUSTER': False},
 'DATA_LOADER': {'BLOBS_QUEUE_CAPACITY': 8, 'MINIBATCH_QUEUE_SIZE': 64, 'NUM_THREADS': 4},
 'DEDUP_BOXES': 0.0625,
 'DOWNLOAD_CACHE': '/tmp/detectron-download-cache',
 'EPS': 1e-14,
 'EXPECTED_RESULTS': [],
 'EXPECTED_RESULTS_ATOL': 0.005,
 'EXPECTED_RESULTS_EMAIL': '',
 'EXPECTED_RESULTS_RTOL': 0.1,
 'FAST_RCNN': {'CONV_HEAD_DIM': 256, 'MLP_HEAD_DIM': 1024, 'NUM_STACKED_CONVS': 4, 'ROI_BOX_HEAD': 'fast_rcnn_heads.add_roi_2mlp_head', 'ROI_XFORM_METHOD': 'RoIAlign', 'ROI_XFORM_RESOLUTION': 7, 'ROI_XFORM_SAMPLING_RATIO': 2},
 'FPN': {'COARSEST_STRIDE': 32, 'DIM': 256, 'EXTRA_CONV_LEVELS': False, 'FPN_ON': True, 'MULTILEVEL_ROIS': True, 'MULTILEVEL_RPN': True, 'ROI_CANONICAL_LEVEL': 4, 'ROI_CANONICAL_SCALE': 224, 'ROI_MAX_LEVEL': 5, 'ROI_MIN_LEVEL': 2, 'RPN_ANCHOR_START_SIZE': 32, 'RPN_ASPECT_RATIOS': (0.5, 1, 2), 'RPN_MAX_LEVEL': 6, 'RPN_MIN_LEVEL': 2, 'USE_GN': False, 'ZERO_INIT_LATERAL': False},
 'GROUP_NORM': {'DIM_PER_GP': -1, 'EPSILON': 1e-05, 'NUM_GROUPS': 32},
 'KRCNN': {'CONV_HEAD_DIM': 256, 'CONV_HEAD_KERNEL': 3, 'CONV_INIT': 'GaussianFill', 'DECONV_DIM': 256, 'DECONV_KERNEL': 4, 'DILATION': 1, 'HEATMAP_SIZE': -1, 'INFERENCE_MIN_SIZE': 0, 'KEYPOINT_CONFIDENCE': 'bbox', 'LOSS_WEIGHT': 1.0, 'MIN_KEYPOINT_COUNT_FOR_VALID_MINIBATCH': 20, 'NMS_OKS': False, 'NORMALIZE_BY_VISIBLE_KEYPOINTS': True, 'NUM_KEYPOINTS': -1, 'NUM_STACKED_CONVS': 8, 'ROI_KEYPOINTS_HEAD': '', 'ROI_XFORM_METHOD': 'RoIAlign', 'ROI_XFORM_RESOLUTION': 7, 'ROI_XFORM_SAMPLING_RATIO': 0, 'UP_SCALE': -1, 'USE_DECONV': False, 'USE_DECONV_OUTPUT': False},
 'MATLAB': 'matlab',
 'MEMONGER': True,
 'MEMONGER_SHARE_ACTIVATIONS': False,
 'MODEL': {'BBOX_REG_WEIGHTS': (10.0, 10.0, 5.0, 5.0), 'BODY_UV_ON': True, 'CLS_AGNOSTIC_BBOX_REG': False, 'CONV_BODY': 'FPN.add_fpn_ResNet50_conv5_body', 'EXECUTION_TYPE': 'dag', 'FASTER_RCNN': False, 'KEYPOINTS_ON': False, 'MASK_ON': False, 'NUM_CLASSES': 2, 'RPN_ONLY': False, 'TYPE': 'generalized_rcnn'},
 'MRCNN': {'CLS_SPECIFIC_MASK': True, 'CONV_INIT': 'GaussianFill', 'DILATION': 2, 'DIM_REDUCED': 256, 'RESOLUTION': 14, 'ROI_MASK_HEAD': '', 'ROI_XFORM_METHOD': 'RoIAlign', 'ROI_XFORM_RESOLUTION': 7, 'ROI_XFORM_SAMPLING_RATIO': 0, 'THRESH_BINARIZE': 0.5, 'UPSAMPLE_RATIO': 1, 'USE_FC_OUTPUT': False, 'WEIGHT_LOSS_MASK': 1.0},
 'NUM_GPUS': 1,
 'OUTPUT_DIR': '/tmp/detectron-output',
 'PIXEL_MEANS': array([[[102.9801, 115.9465, 122.7717]]]),
 'RESNETS': {'NUM_GROUPS': 1, 'RES5_DILATION': 1, 'SHORTCUT_FUNC': 'basic_bn_shortcut', 'STEM_FUNC': 'basic_bn_stem', 'STRIDE_1X1': True, 'TRANS_FUNC': 'bottleneck_transformation', 'WIDTH_PER_GROUP': 64},
 'RETINANET': {'ANCHOR_SCALE': 4, 'ASPECT_RATIOS': (0.5, 1.0, 2.0), 'BBOX_REG_BETA': 0.11, 'BBOX_REG_WEIGHT': 1.0, 'CLASS_SPECIFIC_BBOX': False, 'INFERENCE_TH': 0.05, 'LOSS_ALPHA': 0.25, 'LOSS_GAMMA': 2.0, 'NEGATIVE_OVERLAP': 0.4, 'NUM_CONVS': 4, 'POSITIVE_OVERLAP': 0.5, 'PRE_NMS_TOP_N': 1000, 'PRIOR_PROB': 0.01, 'RETINANET_ON': False, 'SCALES_PER_OCTAVE': 3, 'SHARE_CLS_BBOX_TOWER': False, 'SOFTMAX': False},
 'RFCN': {'PS_GRID_SIZE': 3},
 'RNG_SEED': 3,
 'ROOT_DIR': '/home/zhulijun/code/python/densepose',
 'RPN': {'ASPECT_RATIOS': (0.5, 1, 2), 'RPN_ON': False, 'SIZES': (64, 128, 256, 512), 'STRIDE': 16},
 'SOLVER': {'BASE_LR': 0.00025, 'GAMMA': 0.1, 'LOG_LR_CHANGE_THRESHOLD': 1.1, 'LRS': [], 'LR_POLICY': 'steps_with_decay', 'MAX_ITER': 720000, 'MOMENTUM': 0.9, 'SCALE_MOMENTUM': True, 'SCALE_MOMENTUM_THRESHOLD': 1.1, 'STEPS': [0, 480000, 640000], 'STEP_SIZE': 30000, 'WARM_UP_FACTOR': 0.1, 'WARM_UP_ITERS': 1000, 'WARM_UP_METHOD': u'linear', 'WEIGHT_DECAY': 0.0001, 'WEIGHT_DECAY_GN': 0.0},
 'TEST': {'BBOX_AUG': {'AREA_TH_HI': 32400, 'AREA_TH_LO': 2500, 'ASPECT_RATIOS': (), 'ASPECT_RATIO_H_FLIP': False, 'COORD_HEUR': 'UNION', 'ENABLED': False, 'H_FLIP': False, 'MAX_SIZE': 4000, 'SCALES': (), 'SCALE_H_FLIP': False, 'SCALE_SIZE_DEP': False, 'SCORE_HEUR': 'UNION'}, 'BBOX_REG': True, 'BBOX_VOTE': {'ENABLED': False, 'SCORING_METHOD': 'ID', 'SCORING_METHOD_BETA': 1.0, 'VOTE_TH': 0.8}, 'COMPETITION_MODE': True, 'DATASETS': ('dense_coco_2014_minival',), 'DETECTIONS_PER_IM': 20, 'FORCE_JSON_DATASET_EVAL': True, 'KPS_AUG': {'AREA_TH': 32400, 'ASPECT_RATIOS': (), 'ASPECT_RATIO_H_FLIP': False, 'ENABLED': False, 'HEUR': 'HM_AVG', 'H_FLIP': False, 'MAX_SIZE': 4000, 'SCALES': (), 'SCALE_H_FLIP': False, 'SCALE_SIZE_DEP': False}, 'MASK_AUG': {'AREA_TH': 32400, 'ASPECT_RATIOS': (), 'ASPECT_RATIO_H_FLIP': False, 'ENABLED': False, 'HEUR': 'SOFT_AVG', 'H_FLIP': False, 'MAX_SIZE': 4000, 'SCALES': (), 'SCALE_H_FLIP': False, 'SCALE_SIZE_DEP': False}, 'MAX_SIZE': 1333, 'NMS': 0.5, 'PRECOMPUTED_PROPOSALS': True, 'PROPOSAL_FILES': ('/tmp/detectron-download-cache/DensePose-RPN-minival_fpn_resnet50.pkl',), 'PROPOSAL_LIMIT': 1000, 'RPN_MIN_SIZE': 0, 'RPN_NMS_THRESH': 0.7, 'RPN_POST_NMS_TOP_N': 2000, 'RPN_PRE_NMS_TOP_N': 12000, 'SCALE': 800, 'SCORE_THRESH': 0.05, 'SOFT_NMS': {'ENABLED': False, 'METHOD': 'linear', 'SIGMA': 0.5}, 'WEIGHTS': ''},
 'TRAIN': {'ASPECT_GROUPING': True, 'AUTO_RESUME': True, 'BATCH_SIZE_PER_IM': 512, 'BBOX_THRESH': 0.5, 'BG_THRESH_HI': 0.5, 'BG_THRESH_LO': 0.0, 'CROWD_FILTER_THRESH': 0.7, 'DATASETS': ('dense_coco_2014_train', 'dense_coco_2014_valminusminival'), 'FG_FRACTION': 0.25, 'FG_THRESH': 0.5, 'FREEZE_CONV_BODY': False, 'GT_MIN_AREA': -1, 'IMS_PER_BATCH': 1, 'MAX_SIZE': 1333, 'PROPOSAL_FILES': ('/tmp/detectron-download-cache/DensePose-RPN-train_fpn_resnet50.pkl', '/tmp/detectron-download-cache/DensePose-RPN-valminusminival_fpn_resnet50.pkl'), 'RPN_BATCH_SIZE_PER_IM': 256, 'RPN_FG_FRACTION': 0.5, 'RPN_MIN_SIZE': 0, 'RPN_NEGATIVE_OVERLAP': 0.3, 'RPN_NMS_THRESH': 0.7, 'RPN_POSITIVE_OVERLAP': 0.7, 'RPN_POST_NMS_TOP_N': 2000, 'RPN_PRE_NMS_TOP_N': 12000, 'RPN_STRADDLE_THRESH': 0, 'SCALES': (640, 672, 704, 736, 768, 800), 'SNAPSHOT_ITERS': 20000, 'USE_FLIPPED': True, 'WEIGHTS': '/tmp/detectron-download-cache/R-50.pkl'},
 'USE_NCCL': False,
 'VIS': False,
 'VIS_TH': 0.9}
INFO train.py: 123: Building model: generalized_rcnn
WARNING cnn.py: 25: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
WARNING model_helper.py: 447: You are creating an op that the ModelHelper does not recognize: PoolPointsInterp.
WARNING model_helper.py: 447: You are creating an op that the ModelHelper does not recognize: PoolPointsInterp.
WARNING model_helper.py: 447: You are creating an op that the ModelHelper does not recognize: PoolPointsInterp.
WARNING memonger.py: 55: NOTE: Executing memonger to optimize gradient memory
[I memonger.cc:236] Remapping 98 using 20 shared blobs.
INFO memonger.py: 97: Memonger memory optimization took 0.0661389827728 secs
[I context_gpu.cu:390] GPU 0: 151 MB
[I context_gpu.cu:394] Total: 151 MB
[I context_gpu.cu:390] GPU 0: 280 MB
[I context_gpu.cu:394] Total: 280 MB
[I context_gpu.cu:390] GPU 0: 411 MB
[I context_gpu.cu:394] Total: 411 MB
INFO train.py: 171: Loading dataset: ('dense_coco_2014_train', 'dense_coco_2014_valminusminival')
loading annotations into memory...
Done (t=17.37s)
creating index...
index created!
INFO json_dataset.py: 299: Loading proposals from: /tmp/detectron-download-cache/DensePose-RPN-train_fpn_resnet50.pkl
INFO json_dataset.py: 307: 1/26437
INFO json_dataset.py: 307: 2501/26437
INFO json_dataset.py: 307: 5001/26437
INFO json_dataset.py: 307: 7501/26437
INFO json_dataset.py: 307: 10001/26437
INFO json_dataset.py: 307: 12501/26437
INFO json_dataset.py: 307: 15001/26437
INFO json_dataset.py: 307: 17501/26437
INFO json_dataset.py: 307: 20001/26437
INFO json_dataset.py: 307: 22501/26437
INFO json_dataset.py: 307: 25001/26437
INFO roidb.py: 41: Appending horizontally-flipped training examples...
INFO roidb.py: 43: Loaded dataset: dense_coco_2014_train
loading annotations into memory...
Done (t=6.19s)
creating index...
index created!
INFO json_dataset.py: 299: Loading proposals from: /tmp/detectron-download-cache/DensePose-RPN-valminusminival_fpn_resnet50.pkl
INFO json_dataset.py: 307: 1/5984
INFO json_dataset.py: 307: 2501/5984
INFO json_dataset.py: 307: 5001/5984
INFO roidb.py: 41: Appending horizontally-flipped training examples...
INFO roidb.py: 43: Loaded dataset: dense_coco_2014_valminusminival
INFO roidb.py: 130: Filtered 0 roidb entries: 64842 -> 64842
INFO roidb.py: 59: Computing bounding-box regression targets...
INFO roidb.py: 61: done
INFO train.py: 175: 64842 roidb entries
INFO net.py: 51: Loading weights from: /tmp/detectron-download-cache/R-50.pkl
INFO net.py: 80: fpn_inner_res5_2_sum_w not found
INFO net.py: 80: fpn_inner_res5_2_sum_b not found
INFO net.py: 80: fpn_inner_res4_5_sum_lateral_w not found
INFO net.py: 80: fpn_inner_res4_5_sum_lateral_b not found
INFO net.py: 80: fpn_inner_res3_3_sum_lateral_w not found
INFO net.py: 80: fpn_inner_res3_3_sum_lateral_b not found
INFO net.py: 80: fpn_inner_res2_2_sum_lateral_w not found
INFO net.py: 80: fpn_inner_res2_2_sum_lateral_b not found
INFO net.py: 80: fpn_res5_2_sum_w not found
INFO net.py: 80: fpn_res5_2_sum_b not found
INFO net.py: 80: fpn_res4_5_sum_w not found
INFO net.py: 80: fpn_res4_5_sum_b not found
INFO net.py: 80: fpn_res3_3_sum_w not found
INFO net.py: 80: fpn_res3_3_sum_b not found
INFO net.py: 80: fpn_res2_2_sum_w not found
INFO net.py: 80: fpn_res2_2_sum_b not found
INFO net.py: 80: fc6_w not found
INFO net.py: 80: fc6_b not found
INFO net.py: 80: fc7_w not found
INFO net.py: 80: fc7_b not found
INFO net.py: 80: cls_score_w not found
INFO net.py: 80: cls_score_b not found
INFO net.py: 80: bbox_pred_w not found
INFO net.py: 80: bbox_pred_b not found
INFO net.py: 80: body_conv_fcn1_w not found
INFO net.py: 80: body_conv_fcn1_b not found
INFO net.py: 80: body_conv_fcn2_w not found
INFO net.py: 80: body_conv_fcn2_b not found
INFO net.py: 80: body_conv_fcn3_w not found
INFO net.py: 80: body_conv_fcn3_b not found
INFO net.py: 80: body_conv_fcn4_w not found
INFO net.py: 80: body_conv_fcn4_b not found
INFO net.py: 80: body_conv_fcn5_w not found
INFO net.py: 80: body_conv_fcn5_b not found
INFO net.py: 80: body_conv_fcn6_w not found
INFO net.py: 80: body_conv_fcn6_b not found
INFO net.py: 80: body_conv_fcn7_w not found
INFO net.py: 80: body_conv_fcn7_b not found
INFO net.py: 80: body_conv_fcn8_w not found
INFO net.py: 80: body_conv_fcn8_b not found
INFO net.py: 80: AnnIndex_lowres_w not found
INFO net.py: 80: AnnIndex_lowres_b not found
INFO net.py: 80: Index_UV_lowres_w not found
INFO net.py: 80: Index_UV_lowres_b not found
INFO net.py: 80: U_lowres_w not found
INFO net.py: 80: U_lowres_b not found
INFO net.py: 80: V_lowres_w not found
INFO net.py: 80: V_lowres_b not found
INFO net.py: 80: AnnIndex_w not found
INFO net.py: 80: AnnIndex_b not found
INFO net.py: 80: Index_UV_w not found
INFO net.py: 80: Index_UV_b not found
INFO net.py: 80: U_estimated_w not found
INFO net.py: 80: U_estimated_b not found
INFO net.py: 80: V_estimated_w not found
INFO net.py: 80: V_estimated_b not found
[I net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 0.000621146 secs
INFO train.py: 159: Outputs saved to: /tmp/detectron-output/train/dense_coco_2014_train:dense_coco_2014_valminusminival/generalized_rcnn
INFO loader.py: 221: Pre-filling mini-batch queue...
INFO loader.py: 226: [0/64]
INFO loader.py: 226: [1/64]
INFO loader.py: 226: [2/64]
INFO loader.py: 226: [12/64]
INFO loader.py: 226: [20/64]
INFO loader.py: 226: [29/64]
INFO loader.py: 226: [36/64]
INFO loader.py: 226: [45/64]
INFO loader.py: 226: [55/64]
INFO loader.py: 226: [62/64]
INFO detector.py: 471: Changing learning rate 0.000000 -> 0.000025 at iter 0
[I net_async_base.h:211] Using specified CPU pool size: 4; device id: -1
[I net_async_base.h:216] Created new CPU pool, size: 4; device id: -1
[I context_gpu.cu:390] GPU 0: 541 MB
[I context_gpu.cu:394] Total: 541 MB
[I context_gpu.cu:390] GPU 0: 690 MB
[I context_gpu.cu:394] Total: 690 MB
[I context_gpu.cu:390] GPU 0: 847 MB
[I context_gpu.cu:394] Total: 847 MB
[I context_gpu.cu:390] GPU 0: 994 MB
[I context_gpu.cu:394] Total: 994 MB
[I context_gpu.cu:390] GPU 0: 1131 MB
[I context_gpu.cu:394] Total: 1131 MB
[I context_gpu.cu:390] GPU 0: 1278 MB
[I context_gpu.cu:394] Total: 1278 MB
[I context_gpu.cu:390] GPU 0: 1412 MB
[I context_gpu.cu:394] Total: 1412 MB
[I context_gpu.cu:390] GPU 0: 1543 MB
[I context_gpu.cu:394] Total: 1543 MB
[I context_gpu.cu:390] GPU 0: 1697 MB
[I context_gpu.cu:394] Total: 1697 MB
[I context_gpu.cu:390] GPU 0: 1844 MB
[I context_gpu.cu:394] Total: 1844 MB
Segmentation fault
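Two values in the log can be cross-checked against the printed config: the roidb entry count (26437 train images plus 5984 valminusminival images, with horizontally flipped copies appended) and the learning rate logged at iter 0. The sketch below is a hypothetical Python 3 helper, not Detectron code; the names `expected_roidb_entries`, `step_index`, and `lr_at_iter` are made up for illustration. It assumes one roidb entry per image (doubled when `TRAIN.USE_FLIPPED` is on) and assumes the `steps_with_decay` policy with `linear` warm-up computes `BASE_LR * GAMMA**k` scaled by a factor that ramps from `WARM_UP_FACTOR` to 1 over `WARM_UP_ITERS`.

```python
from bisect import bisect_right

# Hypothetical sanity checks; not Detectron code.
# Image counts come from the "Loading proposals" progress lines in the log.
NUM_TRAIN_IMAGES = 26437     # dense_coco_2014_train
NUM_VALMINUS_IMAGES = 5984   # dense_coco_2014_valminusminival

# Values copied from the printed config (SOLVER section).
BASE_LR = 0.00025
GAMMA = 0.1
STEPS = [0, 480000, 640000]
WARM_UP_FACTOR = 0.1
WARM_UP_ITERS = 1000


def expected_roidb_entries(image_counts, use_flipped):
    """Assumes one roidb entry per image, doubled when horizontal flipping is on."""
    total = sum(image_counts)
    return 2 * total if use_flipped else total


def step_index(cur_iter):
    """Index of the last entry in STEPS that cur_iter has reached."""
    return bisect_right(STEPS, cur_iter) - 1


def lr_at_iter(cur_iter):
    """Assumed schedule: step decay (BASE_LR * GAMMA**k) times a linear warm-up factor."""
    lr = BASE_LR * GAMMA ** step_index(cur_iter)
    if cur_iter < WARM_UP_ITERS:
        # Factor ramps linearly from WARM_UP_FACTOR at iter 0 to 1.0 at WARM_UP_ITERS.
        alpha = cur_iter / WARM_UP_ITERS
        lr *= WARM_UP_FACTOR * (1 - alpha) + alpha
    return lr


if __name__ == "__main__":
    print(expected_roidb_entries([NUM_TRAIN_IMAGES, NUM_VALMINUS_IMAGES], True))  # 64842
    print(f"{lr_at_iter(0):.6f}")  # 0.000025
```

Under these assumptions, the first value matches `Filtered 0 roidb entries: 64842 -> 64842` and the second matches `Changing learning rate 0.000000 -> 0.000025 at iter 0`. Dataset loading and solver setup therefore look consistent with the config. The crash appears only afterwards, while GPU memory is still being allocated for the first iteration.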