facebookresearch / DensePose

A real-time approach for mapping all human pixels of 2D RGB images to a 3D surface-based model of the body
http://densepose.org

Error when trying to use my own dataset #167

Closed liangwx closed 5 years ago

liangwx commented 5 years ago

I annotated my own dataset, in which each image has more than 196 annotated points (e.g. 200 or more). I changed every integer "196" in "body_uv_rcnn.py" and "body_uv_rcnn_heads.py" to "200" so that the network can hold 200 annotated points per RGB image, but I hit this error:

INFO net.py: 241: U_points : (1, 1, 200, 25) => loss_Upoints : () ------|
INFO net.py: 241: Uv_point_weights : (1, 1, 200, 25) => loss_Upoints : () ------|
INFO net.py: 241: Uv_point_weights : (1, 1, 200, 25) => loss_Upoints : () ------|
INFO net.py: 241: interp_V_reshaped : (1, 1, 200, 25) => loss_Vpoints : () ------- (op: SmoothL1Loss)
INFO net.py: 241: V_points : (1, 1, 200, 25) => loss_Vpoints : () ------|
INFO net.py: 241: Uv_point_weights : (1, 1, 200, 25) => loss_Vpoints : () ------|
INFO net.py: 241: Uv_point_weights : (1, 1, 200, 25) => loss_Vpoints : () ------|
INFO net.py: 245: End of model: generalized_rcnn
json_stats: {"accuracy_cls": 0.812500, "eta": "3 days, 5:15:20", "iter": 0, "loss": 12.790212, "loss_IndexUVPoints": 0.985726, "loss_Upoints": 1.955246, "loss_Vpoints": 2.151226, "loss_bbox": 0.000058, "loss_cls": 0.560533, "loss_rpn_bbox_fpn2": 0.000000, "loss_rpn_bbox_fpn3": 0.000000, "loss_rpn_bbox_fpn4": 0.012704, "loss_rpn_bbox_fpn5": 0.000000, "loss_rpn_bbox_fpn6": 0.001694, "loss_rpn_cls_fpn2": 0.504859, "loss_rpn_cls_fpn3": 0.113613, "loss_rpn_cls_fpn4": 0.064132, "loss_rpn_cls_fpn5": 0.000000, "loss_rpn_cls_fpn6": 0.005170, "loss_seg_AnnIndex": 6.435252, "lr": 0.000200, "mb_qsize": 64, "mem": 3161, "time": 2.139387}
[I context_gpu.cu:317] GPU 0: 3168 MB
[I context_gpu.cu:321] Total: 3168 MB
[I context_gpu.cu:317] GPU 0: 3298 MB
[I context_gpu.cu:321] Total: 3298 MB
[I context_gpu.cu:317] GPU 0: 3426 MB
[I context_gpu.cu:321] Total: 3426 MB
[I context_gpu.cu:317] GPU 0: 3556 MB
[I context_gpu.cu:321] Total: 3556 MB
[I context_gpu.cu:317] GPU 0: 3686 MB
[I context_gpu.cu:321] Total: 3686 MB
json_stats: {"accuracy_cls": 0.875000, "eta": "13:15:54", "iter": 20, "loss": 9.833254, "loss_IndexUVPoints": 0.962407, "loss_Upoints": 1.169112, "loss_Vpoints": 1.246035, "loss_bbox": 0.158750, "loss_cls": 0.349353, "loss_rpn_bbox_fpn2": 0.000000, "loss_rpn_bbox_fpn3": 0.000000, "loss_rpn_bbox_fpn4": 0.000000, "loss_rpn_bbox_fpn5": 0.000000, "loss_rpn_bbox_fpn6": 0.002505, "loss_rpn_cls_fpn2": 0.444864, "loss_rpn_cls_fpn3": 0.116791, "loss_rpn_cls_fpn4": 0.026790, "loss_rpn_cls_fpn5": 0.007708, "loss_rpn_cls_fpn6": 0.005567, "loss_seg_AnnIndex": 5.159944, "lr": 0.000236, "mb_qsize": 64, "mem": 3760, "time": 0.367396}
[F softmax_ops.cu:462] Check failed: error == cudaSuccess an illegal memory access was encountered
[F context_gpu.cu:397] Error at: /home/wangxiaoliang/pytorch/caffe2/core/context_gpu.cu:397: an illegal memory access was encountered
Aborted (core dumped)

I only changed every "196" to "200" and made no other changes. I am sure my own dataset is not at fault, because the dense COCO dataset hits the same error when I naively change every "196" to "200". I have never worked with Detectron or Caffe2 before, so it is hard for me to modify the network without more details about the DensePose architecture. I would appreciate some guidance on the right way to train a DensePose model on my own dataset. I have been stuck on this for many days; any help would be greatly appreciated!

Shell command

My machine has three 12 GB GPUs. I tried to train the model on GPU 0:

    python2 tools/train_net.py \
        --cfg configs/DensePose_ResNet50_FPN_s1x-e2e.yaml \
        OUTPUT_DIR /tmp/detectron-output \
        NUM_GPUS 1

System information

INFO train_net.py: 86: Called with args: INFO train_net.py: 87: Namespace(cfg_file='configs/DensePose_ResNet50_FPN_s1x-e2e.yaml', multi_gpu_testing=False, opts=['OUTPUT_DIR', '/tmp/detectron-output', 'NUM_GPUS', '1'], skip_test=False) INFO train_net.py: 93: Training with config: INFO train_net.py: 94: {'BBOX_XFORM_CLIP': 4.135166556742356, 'BODY_UV_RCNN': {'BODY_UV_IMS': True, 'CONV_HEAD_DIM': 512, 'CONV_HEAD_KERNEL': 3, 'CONV_INIT': 'MSRAFill', 'DECONV_DIM': 256, 'DECONV_KERNEL': 4, 'DILATION': 1, 'HEATMAP_SIZE': 56, 'INDEX_WEIGHTS': 2.0, 'NUM_PATCHES': 24, 'NUM_STACKED_CONVS': 8, 'PART_WEIGHTS': 0.3, 'POINT_REGRESSION_WEIGHTS': 0.1, 'ROI_HEAD': 'body_uv_rcnn_heads.add_roi_body_uv_head_v1convX', 'ROI_XFORM_METHOD': 'RoIAlign', 'ROI_XFORM_RESOLUTION': 14, 'ROI_XFORM_SAMPLING_RATIO': 2, 'UP_SCALE': 2, 'USE_DECONV_OUTPUT': True}, 'CLUSTER': {'ON_CLUSTER': False}, 'DATA_LOADER': {'BLOBS_QUEUE_CAPACITY': 8, 'MINIBATCH_QUEUE_SIZE': 64, 'NUM_THREADS': 4}, 'DEDUP_BOXES': 0.0625, 'DOWNLOAD_CACHE': '/tmp/detectron-download-cache', 'EPS': 1e-14, 'EXPECTED_RESULTS': [], 'EXPECTED_RESULTS_ATOL': 0.005, 'EXPECTED_RESULTS_EMAIL': '', 'EXPECTED_RESULTS_RTOL': 0.1, 'FAST_RCNN': {'CONV_HEAD_DIM': 256, 'MLP_HEAD_DIM': 1024, 'NUM_STACKED_CONVS': 4, 'ROI_BOX_HEAD': 'fast_rcnn_heads.add_roi_2mlp_head', 'ROI_XFORM_METHOD': 'RoIAlign', 'ROI_XFORM_RESOLUTION': 7, 'ROI_XFORM_SAMPLING_RATIO': 2}, 'FPN': {'COARSEST_STRIDE': 32, 'DIM': 256, 'EXTRA_CONV_LEVELS': False, 'FPN_ON': True, 'MULTILEVEL_ROIS': True, 'MULTILEVEL_RPN': True, 'ROI_CANONICAL_LEVEL': 4, 'ROI_CANONICAL_SCALE': 224, 'ROI_MAX_LEVEL': 5, 'ROI_MIN_LEVEL': 2, 'RPN_ANCHOR_START_SIZE': 32, 'RPN_ASPECT_RATIOS': (0.5, 1, 2), 'RPN_MAX_LEVEL': 6, 'RPN_MIN_LEVEL': 2, 'USE_GN': False, 'ZERO_INIT_LATERAL': False}, 'GROUP_NORM': {'DIM_PER_GP': -1, 'EPSILON': 1e-05, 'NUM_GROUPS': 32}, 'KRCNN': {'CONV_HEAD_DIM': 256, 'CONV_HEAD_KERNEL': 3, 'CONV_INIT': 'GaussianFill', 'DECONV_DIM': 256, 'DECONV_KERNEL': 4, 'DILATION': 1, 'HEATMAP_SIZE': 
-1, 'INFERENCE_MIN_SIZE': 0, 'KEYPOINT_CONFIDENCE': 'bbox', 'LOSS_WEIGHT': 1.0, 'MIN_KEYPOINT_COUNT_FOR_VALID_MINIBATCH': 20, 'NMS_OKS': False, 'NORMALIZE_BY_VISIBLE_KEYPOINTS': True, 'NUM_KEYPOINTS': -1, 'NUM_STACKED_CONVS': 8, 'ROI_KEYPOINTS_HEAD': '', 'ROI_XFORM_METHOD': 'RoIAlign', 'ROI_XFORM_RESOLUTION': 7, 'ROI_XFORM_SAMPLING_RATIO': 0, 'UP_SCALE': -1, 'USE_DECONV': False, 'USE_DECONV_OUTPUT': False}, 'MATLAB': 'matlab', 'MEMONGER': True, 'MEMONGER_SHARE_ACTIVATIONS': False, 'MODEL': {'BBOX_REG_WEIGHTS': (10.0, 10.0, 5.0, 5.0), 'BODY_UV_ON': True, 'CLS_AGNOSTIC_BBOX_REG': False, 'CONV_BODY': 'FPN.add_fpn_ResNet50_conv5_body', 'EXECUTION_TYPE': 'dag', 'FASTER_RCNN': True, 'KEYPOINTS_ON': False, 'MASK_ON': False, 'NUM_CLASSES': 2, 'RPN_ONLY': False, 'TYPE': 'generalized_rcnn'}, 'MRCNN': {'CLS_SPECIFIC_MASK': True, 'CONV_INIT': 'GaussianFill', 'DILATION': 2, 'DIM_REDUCED': 256, 'RESOLUTION': 14, 'ROI_MASK_HEAD': '', 'ROI_XFORM_METHOD': 'RoIAlign', 'ROI_XFORM_RESOLUTION': 7, 'ROI_XFORM_SAMPLING_RATIO': 0, 'THRESH_BINARIZE': 0.5, 'UPSAMPLE_RATIO': 1, 'USE_FC_OUTPUT': False, 'WEIGHT_LOSS_MASK': 1.0}, 'NUM_GPUS': 1, 'OUTPUT_DIR': '/tmp/detectron-output', 'PIXEL_MEANS': array([[[102.9801, 115.9465, 122.7717]]]), 'RESNETS': {'NUM_GROUPS': 1, 'RES5_DILATION': 1, 'SHORTCUT_FUNC': 'basic_bn_shortcut', 'STEM_FUNC': 'basic_bn_stem', 'STRIDE_1X1': True, 'TRANS_FUNC': 'bottleneck_transformation', 'WIDTH_PER_GROUP': 64}, 'RETINANET': {'ANCHOR_SCALE': 4, 'ASPECT_RATIOS': (0.5, 1.0, 2.0), 'BBOX_REG_BETA': 0.11, 'BBOX_REG_WEIGHT': 1.0, 'CLASS_SPECIFIC_BBOX': False, 'INFERENCE_TH': 0.05, 'LOSS_ALPHA': 0.25, 'LOSS_GAMMA': 2.0, 'NEGATIVE_OVERLAP': 0.4, 'NUM_CONVS': 4, 'POSITIVE_OVERLAP': 0.5, 'PRE_NMS_TOP_N': 1000, 'PRIOR_PROB': 0.01, 'RETINANET_ON': False, 'SCALES_PER_OCTAVE': 3, 'SHARE_CLS_BBOX_TOWER': False, 'SOFTMAX': False}, 'RFCN': {'PS_GRID_SIZE': 3}, 'RNG_SEED': 3, 'ROOT_DIR': '/media/Data/wangxiaoliang/densepose/densepose', 'RPN': {'ASPECT_RATIOS': (0.5, 1, 2), 'RPN_ON': 
True, 'SIZES': (64, 128, 256, 512), 'STRIDE': 16}, 'SOLVER': {'BASE_LR': 0.002, 'GAMMA': 0.1, 'LOG_LR_CHANGE_THRESHOLD': 1.1, 'LRS': [], 'LR_POLICY': 'steps_with_decay', 'MAX_ITER': 130000, 'MOMENTUM': 0.9, 'SCALE_MOMENTUM': True, 'SCALE_MOMENTUM_THRESHOLD': 1.1, 'STEPS': [0, 100000, 120000], 'STEP_SIZE': 30000, 'WARM_UP_FACTOR': 0.1, 'WARM_UP_ITERS': 1000, 'WARM_UP_METHOD': u'linear', 'WEIGHT_DECAY': 0.0001, 'WEIGHT_DECAY_GN': 0.0}, 'TEST': {'BBOX_AUG': {'AREA_TH_HI': 32400, 'AREA_TH_LO': 2500, 'ASPECT_RATIOS': (), 'ASPECT_RATIO_H_FLIP': False, 'COORD_HEUR': 'UNION', 'ENABLED': False, 'H_FLIP': False, 'MAX_SIZE': 4000, 'SCALES': (), 'SCALE_H_FLIP': False, 'SCALE_SIZE_DEP': False, 'SCORE_HEUR': 'UNION'}, 'BBOX_REG': True, 'BBOX_VOTE': {'ENABLED': False, 'SCORING_METHOD': 'ID', 'SCORING_METHOD_BETA': 1.0, 'VOTE_TH': 0.8}, 'COMPETITION_MODE': True, 'DATASETS': ('dense_coco_2014_minival',), 'DETECTIONS_PER_IM': 20, 'FORCE_JSON_DATASET_EVAL': True, 'KPS_AUG': {'AREA_TH': 32400, 'ASPECT_RATIOS': (), 'ASPECT_RATIO_H_FLIP': False, 'ENABLED': False, 'HEUR': 'HM_AVG', 'H_FLIP': False, 'MAX_SIZE': 4000, 'SCALES': (), 'SCALE_H_FLIP': False, 'SCALE_SIZE_DEP': False}, 'MASK_AUG': {'AREA_TH': 32400, 'ASPECT_RATIOS': (), 'ASPECT_RATIO_H_FLIP': False, 'ENABLED': False, 'HEUR': 'SOFT_AVG', 'H_FLIP': False, 'MAX_SIZE': 4000, 'SCALES': (), 'SCALE_H_FLIP': False, 'SCALE_SIZE_DEP': False}, 'MAX_SIZE': 1333, 'NMS': 0.5, 'PRECOMPUTED_PROPOSALS': False, 'PROPOSAL_FILES': (), 'PROPOSAL_LIMIT': 1000, 'RPN_MIN_SIZE': 0, 'RPN_NMS_THRESH': 0.7, 'RPN_POST_NMS_TOP_N': 1000, 'RPN_PRE_NMS_TOP_N': 1000, 'SCALE': 800, 'SCORE_THRESH': 0.05, 'SOFT_NMS': {'ENABLED': False, 'METHOD': 'linear', 'SIGMA': 0.5}, 'WEIGHTS': ''}, 'TRAIN': {'ASPECT_GROUPING': True, 'AUTO_RESUME': True, 'BATCH_SIZE_PER_IM': 32, 'BBOX_THRESH': 0.5, 'BG_THRESH_HI': 0.5, 'BG_THRESH_LO': 0.0, 'CROWD_FILTER_THRESH': 0.7, 'DATASETS': ('dense_coco_2014_train', 'dense_coco_2014_valminusminival'), 'FG_FRACTION': 0.25, 'FG_THRESH': 0.5, 
'FREEZE_CONV_BODY': False, 'GT_MIN_AREA': -1, 'IMS_PER_BATCH': 1, 'MAX_SIZE': 1333, 'PROPOSAL_FILES': (), 'RPN_BATCH_SIZE_PER_IM': 256, 'RPN_FG_FRACTION': 0.5, 'RPN_MIN_SIZE': 0, 'RPN_NEGATIVE_OVERLAP': 0.3, 'RPN_NMS_THRESH': 0.7, 'RPN_POSITIVE_OVERLAP': 0.7, 'RPN_POST_NMS_TOP_N': 2000, 'RPN_PRE_NMS_TOP_N': 2000, 'RPN_STRADDLE_THRESH': 0, 'SCALES': (640, 672, 704, 736, 768, 800), 'SNAPSHOT_ITERS': 20000, 'USE_FLIPPED': True, 'WEIGHTS': '/tmp/detectron-download-cache/R-50.pkl'}, 'USE_NCCL': False, 'VIS': False, 'VIS_TH': 0.9}

liangwx commented 5 years ago

Sorry to bother you. I had simply forgotten to change the 196 in the file "detectron/ops/pool_points_interp.cu". After changing it there as well, training works!
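
For anyone hitting the same thing: the point count was hard-coded in more than one place, including a CUDA source file, so editing only the Python files leaves the sizes inconsistent. Below is a minimal sketch of a script that lists every standalone integer literal 196 under a source tree before you change it. The function name `find_literal` and the set of file extensions are my own assumptions, not part of DensePose, and a hit is only a candidate: each one still needs to be checked by hand to confirm it really refers to the number of annotated points.

```python
import re
from pathlib import Path

# Assumed set of source file types worth scanning; adjust as needed.
SOURCE_EXTENSIONS = {".py", ".cu", ".cc", ".cpp", ".h"}


def find_literal(root, literal="196", extensions=SOURCE_EXTENSIONS):
    """Return (path, line number, stripped line) for each line under `root`
    that contains `literal` as a standalone integer.

    The lookarounds reject neighbours that are word characters or dots,
    so 1960, 0.196, 196.5 and x196 are not reported.
    """
    pattern = re.compile(r"(?<![\w.])" + re.escape(literal) + r"(?![\w.])")
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in extensions:
            continue
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits


# Example call, run from the repository root (directory name taken from
# the path mentioned above):
#
#     for path, lineno, line in find_literal("detectron"):
#         print("{}:{}: {}".format(path, lineno, line))
```

After the change, the same script can be rerun with the old value to confirm that no occurrence was missed. The compiled ops most likely need to be rebuilt after editing the .cu file for the change to take effect.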

mzk912778163 commented 4 years ago

Hi, what tools or software did you use to label your custom dataset? Would you be able to share them here?