daodaofr / AlignPS

Code for CVPR 2021 paper: Anchor-Free Person Search
Apache License 2.0

When I run ROI-AlignPS, I have a problem #24

Closed liudapeng2333 closed 3 years ago

liudapeng2333 commented 3 years ago

(open-mmlab) lxz@lxz-System-Product-Name:~/KunPeng_Liu/AlignPS$ /bin/bash /home/lxz/KunPeng_Liu/AlignPS/run_train.sh
fatal: not a git repository (or any of the parent directories): .git
2021-09-13 17:36:13,497 - mmdet - INFO - Environment info:

sys.platform: linux
Python: 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 16:07:37) [GCC 9.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda-11.0
NVCC: Build cuda_11.0_bu.TC445_37.28845127_0
GPU 0: NVIDIA GeForce RTX 3090
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.7.0
PyTorch compiling details: PyTorch built with:

TorchVision: 0.8.0
OpenCV: 4.5.3
MMCV: 1.3.13
MMDetection: 2.4.0+
MMDetection Compiler: GCC 7.3
MMDetection CUDA Compiler: 11.0

2021-09-13 17:36:14,238 - mmdet - INFO - Distributed training: False 2021-09-13 17:36:14,971 - mmdet - INFO - Config: dataset_type = 'CuhkDataset' data_root = '/home/lxz/KunPeng_Liu/dataset/PRW/PRW-v16.04.20/' img_norm_cfg = dict( mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False) train_pipeline = [ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True), dict( type='Resize', img_scale=[(667, 400), (1000, 600), (1333, 800), (1500, 900), (1666, 1000), (2000, 1200)], multiscale_mode='value', keep_ratio=True), dict(type='RandomFlip', flip_ratio=0.5), dict( type='Normalize', mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False), dict(type='Pad', size_divisor=32), dict(type='DefaultFormatBundle'), dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_ids']) ] test_pipeline = [ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(1500, 900), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ] data = dict( samples_per_gpu=5, workers_per_gpu=5, train=dict( type='CuhkDataset', ann_file= '/home/lxz/KunPeng_Liu/dataset/PRW/PRW-v16.04.20/train_pid.json', img_prefix='/home/lxz/KunPeng_Liu/dataset/PRW/PRW-v16.04.20/frames/', pipeline=[ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True), dict( type='Resize', img_scale=[(667, 400), (1000, 600), (1333, 800), (1500, 900), (1666, 1000), (2000, 1200)], multiscale_mode='value', keep_ratio=True), dict(type='RandomFlip', flip_ratio=0.5), dict( type='Normalize', mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False), dict(type='Pad', size_divisor=32), dict(type='DefaultFormatBundle'), dict( type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_ids']) ]), val=dict( 
type='CuhkDataset', ann_file= '/home/lxz/KunPeng_Liu/dataset/PRW/PRW-v16.04.20/test_pid.json', img_prefix='/home/lxz/KunPeng_Liu/dataset/PRW/PRW-v16.04.20/frames/', pipeline=[ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(1500, 900), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ]), test=dict( type='CuhkDataset', ann_file= '/home/lxz/KunPeng_Liu/dataset/PRW/PRW-v16.04.20/test_pid.json', img_prefix='/home/lxz/KunPeng_Liu/dataset/PRW/PRW-v16.04.20/frames/', proposal_file= '/home/lxz/KunPeng_Liu/dataset/PRW/PRW-v16.04.20/annotation/test/train_test/TestG50.mat', pipeline=[ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(1500, 900), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ])) evaluation = dict(interval=1, metric='bbox') norm_cfg = dict(type='BN', requires_grad=False) model = dict( type='SingleTwoStageDetector176PRW', pretrained='open-mmlab://detectron2/resnet50_caffe', backbone=dict( type='ResNet', depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, norm_cfg=dict(type='BN', requires_grad=False), norm_eval=True, style='caffe'), rpn_head=dict( type='RPNHead', in_channels=1024, feat_channels=1024, anchor_generator=dict( type='AnchorGenerator', scales=[2, 4, 8, 16, 32], ratios=[0.5, 1.0, 2.0], strides=[16]), bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 
loss_bbox=dict(type='L1Loss', loss_weight=1.0)), roi_head=dict( type='PersonSearchRoIHead2Input1', shared_head=dict( type='ResLayer', depth=50, stage=3, stride=2, dilation=1, style='caffe', norm_cfg=dict(type='BN', requires_grad=False), norm_eval=True), bbox_roi_extractor=dict( type='SingleRoIExtractor', roi_layer=dict(type='RoIAlign', out_size=14, sample_num=0), out_channels=1024, featmap_strides=[16]), bbox_head=dict( type='PersonSearchNormAwareNewoim2InputBNBBoxHeadPRW', with_avg_pool=True, roi_feat_size=7, in_channels=2048, num_classes=1, bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=False, loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), loss_bbox=dict(type='L1Loss', loss_weight=10.0))), neck=dict( type='FPNDcnLconv3Dcn', in_channels=[256, 512, 1024, 2048], out_channels=256, start_level=1, add_extra_convs=True, extra_convs_on_inputs=False, num_outs=5, relu_before_extra_convs=True), bbox_head=dict( type='FCOSReidHeadFocalSubTriQueue3PRW', num_classes=1, in_channels=256, stacked_convs=4, feat_channels=256, strides=[8, 16, 32, 64, 128], loss_cls=dict( type='FocalLoss', use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), loss_bbox=dict(type='GIoULoss', loss_weight=1.0), loss_centerness=dict( type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), norm_on_bbox=True, centerness_on_reg=True, dcn_on_last_conv=True, center_sampling=True, conv_bias=True)) train_cfg = dict( rpn=dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=0, pos_weight=-1, debug=False), rpn_proposal=dict( nms_across_levels=False, nms_pre=12000, nms_post=2000, max_num=2000, nms_thr=0.7, min_bbox_size=0), rcnn=dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.5, 
neg_iou_thr=0.1, min_pos_iou=0.5, match_low_quality=False, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=128, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=True), pos_weight=-1, debug=False), assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.5, neg_iou_thr=0.4, min_pos_iou=0, ignore_iof_thr=-1), allowed_border=-1, pos_weight=-1, debug=False) test_cfg = dict( rpn=dict( nms_across_levels=False, nms_pre=6000, nms_post=300, max_num=1000, nms_thr=0.7, min_bbox_size=0), rcnn=dict( score_thr=0.05, nms=dict(type='nms', iou_thr=0.5), max_per_img=100), nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type='nms', iou_threshold=0.5), max_per_img=100) optimizer = dict(type='SGD', lr=0.0015, momentum=0.9, weight_decay=0.0005) optimizer_config = dict(grad_clip=dict(max_norm=10, norm_type=2)) lr_config = dict( policy='step', warmup='linear', warmup_iters=1141, warmup_ratio=0.005, step=[16, 22]) total_epochs = 24 checkpoint_config = dict(interval=1) log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')]) dist_params = dict(backend='nccl') log_level = 'INFO' load_from = None resume_from = None workflow = [('train', 1)] work_dir = './work_dirs/faster_rcnn_r50_caffe_c4_1x_cuhk_single_two_stage17_6_nae1_prw' gpu_ids = [0]

/home/lxz/.local/lib/python3.7/site-packages/mmcv/utils/misc.py:324: UserWarning: "out_size" is deprecated in RoIAlign.__init__, please use "output_size" instead
f'"{src_arg_name}" is deprecated in '
/home/lxz/.local/lib/python3.7/site-packages/mmcv/utils/misc.py:324: UserWarning: "sample_num" is deprecated in RoIAlign.__init__, please use "sampling_ratio" instead
f'"{src_arg_name}" is deprecated in '
2021-09-13 17:36:15,307 - mmdet - INFO - load model from: open-mmlab://detectron2/resnet50_caffe
2021-09-13 17:36:15,308 - mmdet - INFO - Use load_from_openmmlab loader
2021-09-13 17:36:15,367 - mmdet - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: conv1.bias

2021-09-13 17:36:15,432 - mmdet - INFO - Use load_from_openmmlab loader
2021-09-13 17:36:15,477 - mmdet - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: conv1.weight, conv1.bias, bn1.bias, bn1.weight, bn1.running_mean, bn1.running_var, layer1.0.downsample.0.weight, layer1.0.downsample.1.bias, layer1.0.downsample.1.weight, layer1.0.downsample.1.running_mean, layer1.0.downsample.1.running_var, layer1.0.conv1.weight, layer1.0.bn1.bias, layer1.0.bn1.weight, layer1.0.bn1.running_mean, layer1.0.bn1.running_var, layer1.0.conv2.weight, layer1.0.bn2.bias, layer1.0.bn2.weight, layer1.0.bn2.running_mean, layer1.0.bn2.running_var, layer1.0.conv3.weight, layer1.0.bn3.bias, layer1.0.bn3.weight, layer1.0.bn3.running_mean, layer1.0.bn3.running_var, layer1.1.conv1.weight, layer1.1.bn1.bias, layer1.1.bn1.weight, layer1.1.bn1.running_mean, layer1.1.bn1.running_var, layer1.1.conv2.weight, layer1.1.bn2.bias, layer1.1.bn2.weight, layer1.1.bn2.running_mean, layer1.1.bn2.running_var, layer1.1.conv3.weight, layer1.1.bn3.bias, layer1.1.bn3.weight, layer1.1.bn3.running_mean, layer1.1.bn3.running_var, layer1.2.conv1.weight, layer1.2.bn1.bias, layer1.2.bn1.weight, layer1.2.bn1.running_mean, layer1.2.bn1.running_var, layer1.2.conv2.weight, layer1.2.bn2.bias, layer1.2.bn2.weight, layer1.2.bn2.running_mean, layer1.2.bn2.running_var, layer1.2.conv3.weight, layer1.2.bn3.bias, layer1.2.bn3.weight, layer1.2.bn3.running_mean, layer1.2.bn3.running_var, layer2.0.downsample.0.weight, layer2.0.downsample.1.bias, layer2.0.downsample.1.weight, layer2.0.downsample.1.running_mean, layer2.0.downsample.1.running_var, layer2.0.conv1.weight, layer2.0.bn1.bias, layer2.0.bn1.weight, layer2.0.bn1.running_mean, layer2.0.bn1.running_var, layer2.0.conv2.weight, layer2.0.bn2.bias, layer2.0.bn2.weight, layer2.0.bn2.running_mean, layer2.0.bn2.running_var, layer2.0.conv3.weight, layer2.0.bn3.bias, layer2.0.bn3.weight, layer2.0.bn3.running_mean, layer2.0.bn3.running_var, layer2.1.conv1.weight, layer2.1.bn1.bias, layer2.1.bn1.weight, layer2.1.bn1.running_mean, layer2.1.bn1.running_var, layer2.1.conv2.weight, layer2.1.bn2.bias, 
layer2.1.bn2.weight, layer2.1.bn2.running_mean, layer2.1.bn2.running_var, layer2.1.conv3.weight, layer2.1.bn3.bias, layer2.1.bn3.weight, layer2.1.bn3.running_mean, layer2.1.bn3.running_var, layer2.2.conv1.weight, layer2.2.bn1.bias, layer2.2.bn1.weight, layer2.2.bn1.running_mean, layer2.2.bn1.running_var, layer2.2.conv2.weight, layer2.2.bn2.bias, layer2.2.bn2.weight, layer2.2.bn2.running_mean, layer2.2.bn2.running_var, layer2.2.conv3.weight, layer2.2.bn3.bias, layer2.2.bn3.weight, layer2.2.bn3.running_mean, layer2.2.bn3.running_var, layer2.3.conv1.weight, layer2.3.bn1.bias, layer2.3.bn1.weight, layer2.3.bn1.running_mean, layer2.3.bn1.running_var, layer2.3.conv2.weight, layer2.3.bn2.bias, layer2.3.bn2.weight, layer2.3.bn2.running_mean, layer2.3.bn2.running_var, layer2.3.conv3.weight, layer2.3.bn3.bias, layer2.3.bn3.weight, layer2.3.bn3.running_mean, layer2.3.bn3.running_var, layer3.0.downsample.0.weight, layer3.0.downsample.1.bias, layer3.0.downsample.1.weight, layer3.0.downsample.1.running_mean, layer3.0.downsample.1.running_var, layer3.0.conv1.weight, layer3.0.bn1.bias, layer3.0.bn1.weight, layer3.0.bn1.running_mean, layer3.0.bn1.running_var, layer3.0.conv2.weight, layer3.0.bn2.bias, layer3.0.bn2.weight, layer3.0.bn2.running_mean, layer3.0.bn2.running_var, layer3.0.conv3.weight, layer3.0.bn3.bias, layer3.0.bn3.weight, layer3.0.bn3.running_mean, layer3.0.bn3.running_var, layer3.1.conv1.weight, layer3.1.bn1.bias, layer3.1.bn1.weight, layer3.1.bn1.running_mean, layer3.1.bn1.running_var, layer3.1.conv2.weight, layer3.1.bn2.bias, layer3.1.bn2.weight, layer3.1.bn2.running_mean, layer3.1.bn2.running_var, layer3.1.conv3.weight, layer3.1.bn3.bias, layer3.1.bn3.weight, layer3.1.bn3.running_mean, layer3.1.bn3.running_var, layer3.2.conv1.weight, layer3.2.bn1.bias, layer3.2.bn1.weight, layer3.2.bn1.running_mean, layer3.2.bn1.running_var, layer3.2.conv2.weight, layer3.2.bn2.bias, layer3.2.bn2.weight, layer3.2.bn2.running_mean, layer3.2.bn2.running_var, layer3.2.conv3.weight, 
layer3.2.bn3.bias, layer3.2.bn3.weight, layer3.2.bn3.running_mean, layer3.2.bn3.running_var, layer3.3.conv1.weight, layer3.3.bn1.bias, layer3.3.bn1.weight, layer3.3.bn1.running_mean, layer3.3.bn1.running_var, layer3.3.conv2.weight, layer3.3.bn2.bias, layer3.3.bn2.weight, layer3.3.bn2.running_mean, layer3.3.bn2.running_var, layer3.3.conv3.weight, layer3.3.bn3.bias, layer3.3.bn3.weight, layer3.3.bn3.running_mean, layer3.3.bn3.running_var, layer3.4.conv1.weight, layer3.4.bn1.bias, layer3.4.bn1.weight, layer3.4.bn1.running_mean, layer3.4.bn1.running_var, layer3.4.conv2.weight, layer3.4.bn2.bias, layer3.4.bn2.weight, layer3.4.bn2.running_mean, layer3.4.bn2.running_var, layer3.4.conv3.weight, layer3.4.bn3.bias, layer3.4.bn3.weight, layer3.4.bn3.running_mean, layer3.4.bn3.running_var, layer3.5.conv1.weight, layer3.5.bn1.bias, layer3.5.bn1.weight, layer3.5.bn1.running_mean, layer3.5.bn1.running_var, layer3.5.conv2.weight, layer3.5.bn2.bias, layer3.5.bn2.weight, layer3.5.bn2.running_mean, layer3.5.bn2.running_var, layer3.5.conv3.weight, layer3.5.bn3.bias, layer3.5.bn3.weight, layer3.5.bn3.running_mean, layer3.5.bn3.running_var

loading annotations into memory...
Done (t=0.03s)
creating index...
index created!
fatal: not a git repository (or any of the parent directories): .git
2021-09-13 17:36:16,813 - mmdet - INFO - Start running, host: lxz@lxz-System-Product-Name, work_dir: /home/lxz/KunPeng_Liu/AlignPS/work_dirs/faster_rcnn_r50_caffe_c4_1x_cuhk_single_two_stage17_6_nae1_prw
2021-09-13 17:36:16,813 - mmdet - INFO - Hooks will be executed in the following order:
before_run: (VERY_HIGH ) StepLrUpdaterHook
(NORMAL ) CheckpointHook
(VERY_LOW ) TextLoggerHook


before_train_epoch: (VERY_HIGH ) StepLrUpdaterHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook


before_train_iter: (VERY_HIGH ) StepLrUpdaterHook
(LOW ) IterTimerHook


after_train_iter: (ABOVE_NORMAL) OptimizerHook
(NORMAL ) CheckpointHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook


after_train_epoch: (NORMAL ) CheckpointHook
(VERY_LOW ) TextLoggerHook


before_val_epoch: (LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook


before_val_iter: (LOW ) IterTimerHook


after_val_iter: (LOW ) IterTimerHook


after_val_epoch: (VERY_LOW ) TextLoggerHook


2021-09-13 17:36:16,813 - mmdet - INFO - workflow: [('train', 1)], max: 24 epochs
/home/lxz/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/functional.py:2952: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
/home/lxz/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/functional.py:3063: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/home/lxz/KunPeng_Liu/AlignPS/mmdet/models/dense_heads/fcos_reid_head_focal_sub_triqueue3_prw.py:306: UserWarning: This overload of nonzero is deprecated:
nonzero()
Consider using one of the following signatures instead:
nonzero(*, bool as_tuple) (Triggered internally at /opt/conda/conda-bld/pytorch_1603729047590/work/torch/csrc/utils/python_arg_parser.cpp:882.)
& (flatten_labels < bg_class_ind)).nonzero().reshape(-1)
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [0,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [1,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [2,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [3,0,0] Assertion t >= 0 && t < n_classes failed. /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [4,0,0] Assertion t >= 0 && t < n_classes failed. /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [5,0,0] Assertion t >= 0 && t < n_classes failed. /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [6,0,0] Assertion t >= 0 && t < n_classes failed. /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [7,0,0] Assertion t >= 0 && t < n_classes failed. /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [8,0,0] Assertion t >= 0 && t < n_classes failed. /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [9,0,0] Assertion t >= 0 && t < n_classes failed. /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [10,0,0] Assertion t >= 0 && t < n_classes failed. /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [11,0,0] Assertion t >= 0 && t < n_classes failed. 
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [12,0,0] Assertion t >= 0 && t < n_classes failed. /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [13,0,0] Assertion t >= 0 && t < n_classes failed. /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [14,0,0] Assertion t >= 0 && t < n_classes failed. /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [15,0,0] Assertion t >= 0 && t < n_classes failed. /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [16,0,0] Assertion t >= 0 && t < n_classes failed. /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [17,0,0] Assertion t >= 0 && t < n_classes failed. /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [18,0,0] Assertion t >= 0 && t < n_classes failed. /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [19,0,0] Assertion t >= 0 && t < n_classes failed. /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [20,0,0] Assertion t >= 0 && t < n_classes failed. 
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [21,0,0] Assertion t >= 0 && t < n_classes failed. /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [22,0,0] Assertion t >= 0 && t < n_classes failed. /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [23,0,0] Assertion t >= 0 && t < n_classes failed. /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [24,0,0] Assertion t >= 0 && t < n_classes failed. /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [25,0,0] Assertion t >= 0 && t < n_classes failed. /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [26,0,0] Assertion t >= 0 && t < n_classes failed. /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [27,0,0] Assertion t >= 0 && t < n_classes failed. /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [28,0,0] Assertion t >= 0 && t < n_classes failed. /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [29,0,0] Assertion t >= 0 && t < n_classes failed. 
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [30,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/THCUNN/ClassNLLCriterion.cu:187: cunn_ClassNLLCriterion_updateGradInput_kernel: block: [0,0,0], thread: [31,0,0] Assertion t >= 0 && t < n_classes failed.
Traceback (most recent call last):
  File "tools/train.py", line 177, in <module>
    main()
  File "tools/train.py", line 173, in main
    meta=meta)
  File "/home/lxz/KunPeng_Liu/AlignPS/mmdet/apis/train.py", line 146, in train_detector
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/lxz/.local/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/lxz/.local/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 51, in train
    self.call_hook('after_train_iter')
  File "/home/lxz/.local/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/lxz/.local/lib/python3.7/site-packages/mmcv/runner/hooks/optimizer.py", line 35, in after_train_iter
    runner.outputs['loss'].backward()
  File "/home/lxz/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/lxz/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/home/lxz/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/autograd/function.py", line 89, in apply
    return self._forward_cls.backward(self, *args)  # type: ignore
  File "/home/lxz/KunPeng_Liu/AlignPS/mmdet/models/roi_heads/bbox_heads/oim_nae_new.py", line 29, in backward
    if y >= 0:
RuntimeError: CUDA error: device-side assert triggered

When I run AlignPS, the program runs successfully.

daodaofr commented 3 years ago

I will check.

daodaofr commented 3 years ago

I tried it on my local machine and the code runs well. Could you please check the data path and the environment?

I tested with: mmcv=1.2.0, pytorch=1.7.0, cuda=10.2, torchvision=0.8.1
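For comparing two environments like this, a short script can list the installed versions side by side. This is only a minimal sketch, assuming Python 3.8 or newer (where `importlib.metadata` is in the standard library); the helper name `collect_versions` is hypothetical, and the distribution names are assumptions that depend on how each package was installed (for example `mmcv-full` rather than `mmcv`).

```python
from importlib import metadata  # standard library since Python 3.8


def collect_versions(packages):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names below are assumptions; adjust them to your install.
    wanted = ["torch", "torchvision", "mmcv", "mmcv-full"]
    for name, version in collect_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

The CUDA version that PyTorch was built with is not a package version; it can be read separately from `torch.version.cuda`.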

liudapeng2333 commented 3 years ago

I will try.

liudapeng2333 commented 3 years ago

I didn't change my environment, but I changed the batch size from 5 to 4 and the num_classes of Faster RCNN from 80 to 1, and it ran successfully. Thanks for your reply.
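For anyone hitting the same error: the repeated `Assertion t >= 0 && t < n_classes failed` lines in the log come from PyTorch's NLL loss kernel, which raises them when a target index falls outside `[0, n_classes)`. That may be why the class-count change helped, although the thread does not confirm which of the two changes mattered. Below is a minimal sketch in plain Python of checking labels before they reach the loss; `find_out_of_range_labels` is a hypothetical helper, not part of AlignPS or MMDetection.

```python
def find_out_of_range_labels(labels, num_classes, ignore_index=None):
    """Return (position, label) pairs whose label is outside [0, num_classes)."""
    bad = []
    for position, label in enumerate(labels):
        # Skip a designated "ignore" label, if the loss is configured with one.
        if ignore_index is not None and label == ignore_index:
            continue
        if not 0 <= label < num_classes:
            bad.append((position, label))
    return bad


labels = [0, 1, 5, -1, 3]
print(find_out_of_range_labels(labels, num_classes=4))
# [(2, 5), (3, -1)]
```

Because CUDA kernels run asynchronously, the Python traceback of a device-side assert can point at an unrelated line. Rerunning with the environment variable `CUDA_LAUNCH_BLOCKING=1`, or on CPU, usually gives a more precise location.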