Closed lurenlym closed 3 years ago
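The YOLO series itself supports samples whose gt is empty, and the error does not look like it was caused by your code changes. However, the error message is not detailed enough, and I can't tell the cause yet. Is there any other information? For example, does it run on a single card? Are there any other logs? Or try
export GLOG_vmodule=operator=4
and see whether any other logs appear.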
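The logging suggestion above can also be applied from a small launcher script instead of the shell. This is a minimal sketch, not part of PaddleDetection: the helper names `build_glog_env` and `run_with_glog` are hypothetical. It relies on glog reading its flags from `GLOG_`-prefixed environment variables, so the variable has to be spelled `GLOG_vmodule` and set before the training process starts; a misspelled name is simply ignored.

```python
import os
import subprocess
import sys


def build_glog_env(base_env=None, vmodule="operator=4"):
    """Return a copy of the environment with GLOG_vmodule set."""
    env = dict(os.environ if base_env is None else base_env)
    # glog picks this up at process start; the value maps a module to a verbosity level.
    env["GLOG_vmodule"] = vmodule
    return env


def run_with_glog(train_args, vmodule="operator=4"):
    """Launch the training script in a child process with verbose operator logs."""
    cmd = [sys.executable] + list(train_args)
    # check=False: a crashing child should not raise here, we only want its logs.
    return subprocess.run(cmd, env=build_glog_env(vmodule=vmodule), check=False)
```

For example, `run_with_glog(["tools/train.py", "-c", "configs/ppyolo/ppyolo_grddc.yml", "--eval"])` would start the training command from this thread with the variable set.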
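It fails on a single card too. I have no idea where to start diagnosing. I can't find any other logs, and running with export GLOG_vmoudle=operator=4 produced no additional logs either. My own dataset trains fine with the Faster R-CNN series.
Training faster_rcnn_dcn_x101_vd_64x4d_fpn_1x and EfficientNet works without problems, but ppyolo fails, and so does CBResNet200-vd-FPN-Nonlocal.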
CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/dcn/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms_grddc.yml --eval
2020-09-24 14:22:16,925-INFO: If regularizer of a Parameter has been set by 'fluid.ParamAttr' or 'fluid.WeightNormParamAttr' already. The Regularization[L2Decay, regularization_coeff=0.000100] in Optimizer will not take effect, and it will only be applied to other Parameters!
2020-09-24 14:22:26,767-INFO: places would be ommited when DataLoader is not iterable
W0924 14:22:26.894659 9687 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 10.2, Runtime API Version: 10.0
W0924 14:22:26.897580 9687 device_context.cc:260] device: 0, cuDNN Version: 8.0.
2020-09-24 14:22:29,009-WARNING: /root/.cache/paddle/weights/ResNet200_vd_pretrained.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
/opt/conda/envs/paddle/lib/python3.7/site-packages/paddle/fluid/io.py:1998: UserWarning: This list is not set, Because of Paramerter not found in program. There are: fc_0.w_0 fc_0.b_0
  format(" ".join(unused_para_list)))
2020-09-24 14:22:30,154-INFO: places would be ommited when DataLoader is not iterable
2020-09-24 14:22:33,640-INFO: iter: 0, lr: 0.001000, 'loss_cls_0': '1.689152', 'loss_loc_0': '0.000000', 'loss_cls_1': '0.870103', 'loss_loc_1': '0.000002', 'loss_cls_2': '0.379178', 'loss_loc_2': '0.000001', 'loss_rpn_cls': '0.687545', 'loss_rpn_bbox': '0.073837', 'loss': '3.699818', time: 0.000, eta: 0:00:00
W0924 14:22:34.125052 9844 init.cc:226] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0924 14:22:34.125079 9844 init.cc:228] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0924 14:22:34.125083 9844 init.cc:231] The detail failure signal is:
W0924 14:22:34.125087 9844 init.cc:234] Aborted at 1600957354 (unix time) try "date -d @1600957354" if you are using GNU date
W0924 14:22:34.126740 9844 init.cc:234] PC: @ 0x0 (unknown)
W0924 14:22:34.126876 9844 init.cc:234] SIGSEGV (@0x0) received by PID 9687 (TID 0x7f7e6e364700) from PID 0; stack trace:
W0924 14:22:34.128216 9844 init.cc:234] @ 0x7f7ef25098a0 (unknown)
W0924 14:22:34.129441 9844 init.cc:234] @ 0x0 (unknown)
Config:
architecture: CascadeRCNNClsAware
max_iters: 10000
snapshot_iter: 1000
use_gpu: true
log_smooth_window: 20
save_dir: output
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet200_vd_pretrained.tar
weights: output/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms/model_final
metric: VOC
num_classes: 5

CascadeRCNNClsAware:
  backbone: ResNet
  fpn: FPN
  rpn_head: FPNRPNHead
  roi_extractor: FPNRoIAlign
  bbox_head: CascadeBBoxHead
  bbox_assigner: CascadeBBoxAssigner

ResNet:
  norm_type: bn
  depth: 200
  feature_maps: [2, 3, 4, 5]
  freeze_at: 2
  variant: d
  dcn_v2_stages: [3, 4, 5]
  nonlocal_stages: [4]

FPN:
  min_level: 2
  max_level: 6
  num_chan: 256
  spatial_scale: [0.03125, 0.0625, 0.125, 0.25]

FPNRPNHead:
  anchor_generator:
    anchor_sizes: [32, 64, 128, 256, 512]
    aspect_ratios: [0.5, 1.0, 2.0]
    stride: [16.0, 16.0]
    variance: [1.0, 1.0, 1.0, 1.0]
  anchor_start_size: 32
  min_level: 2
  max_level: 6
  num_chan: 256
  rpn_target_assign:
    rpn_batch_size_per_im: 256
    rpn_fg_fraction: 0.5
    rpn_positive_overlap: 0.7
    rpn_negative_overlap: 0.3
    rpn_straddle_thresh: 0.0
  train_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 2000
    post_nms_top_n: 2000
  test_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 1000
    post_nms_top_n: 1000

FPNRoIAlign:
  canconical_level: 4
  canonical_size: 224
  min_level: 2
  max_level: 5
  box_resolution: 14
  sampling_ratio: 2

CascadeBBoxAssigner:
  batch_size_per_im: 512
  bbox_reg_weights: [10, 20, 30]
  bg_thresh_lo: [0.0, 0.0, 0.0]
  bg_thresh_hi: [0.5, 0.6, 0.7]
  fg_thresh: [0.5, 0.6, 0.7]
  fg_fraction: 0.25
  class_aware: True

CascadeBBoxHead:
  head: CascadeTwoFCHead
  nms: MultiClassSoftNMS

CascadeTwoFCHead:
  mlp_dim: 1024

MultiClassSoftNMS:
  score_threshold: 0.01
  keep_top_k: 300
  softnms_sigma: 0.5

LearningRate:
  base_lr: 0.01
  schedulers:

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2

TrainReader:
  inputs_def:
    fields: ['image', 'im_info', 'im_id', 'gt_bbox', 'gt_class', 'is_crowd']
  dataset:
    !VOCDataSet
    dataset_dir: dataset/voc
    anno_path: trainval1.txt
    use_default_label: false
  sample_transforms:

EvalReader:
  inputs_def:
    fields: ['image', 'im_info', 'im_id', 'im_shape']
  dataset:
    !VOCDataSet
    dataset_dir: dataset/voc
    anno_path: test.txt
    use_default_label: false
  sample_transforms:

TestReader:
  inputs_def:
    fields: ['image', 'im_info', 'im_id', 'im_shape']
  dataset:
    !ImageFolder
    anno_path: dataset/voc/label_list.txt
    use_default_label: false
  sample_transforms:
How should I track this problem down? There is no detailed log, and when I debug, the failure generally occurs at the run call.
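For a segfault that leaves no Python-level log, one generic option is Python's standard `faulthandler` module, which dumps the Python traceback of every thread when the process receives SIGSEGV. The following is a minimal sketch, assuming it is called at the top of the training entry script. The helper name `enable_crash_tracebacks` and the log path are hypothetical, and the dump only shows where the Python code was at the time of the crash, not the native cause.

```python
import faulthandler


def enable_crash_tracebacks(log_path="crash_traceback.log"):
    """Write Python tracebacks of all threads to log_path on SIGSEGV and other fatal signals."""
    # faulthandler keeps using this file descriptor until the process exits,
    # so the file is returned to the caller instead of being closed here.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

The same handler can be switched on without editing any code by starting the interpreter with `python -X faulthandler`, in which case the dump goes to stderr.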
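@lurenlym, try removing the Mixup method from the data augmentation; that seems to work. I ran into the same problem, and it went away after I removed Mixup, but I don't know why.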
Ubuntu 18.04, release 0.4 (latest), GPU: V100, custom dataset with four classes
Command: CUDA_VISIBLE_DEVICES=1,2 python tools/train.py -c configs/ppyolo/ppyolo_grddc.yml --eval
Error:
In addition, the dataset contains samples with empty gt and annotations without the difficult tag, so I changed the following files:
PaddleDetection/ppdet/data/source/voc.py, line 145
PaddleDetection/ppdet/data/transform/operators.py, line 1272
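The thread does not show the actual changes, so the following is only an illustrative sketch of the kind of handling described: reading VOC-style annotations where the difficult tag may be missing and where an image may have no objects at all. It is not the PaddleDetection implementation. The function name `parse_voc_objects` and the `label_to_id` mapping are hypothetical, and each `bndbox` is assumed to carry all four coordinates.

```python
import xml.etree.ElementTree as ET


def parse_voc_objects(xml_text, label_to_id):
    """Parse VOC <object> entries, tolerating a missing <difficult> tag and empty gt."""
    root = ET.fromstring(xml_text)
    gt_bbox, gt_class, difficult = [], [], []
    for obj in root.findall("object"):
        name = (obj.findtext("name") or "").strip()
        if name not in label_to_id:
            continue  # skip labels that are not in the label list
        # A missing or empty <difficult> tag is treated as "not difficult".
        diff_text = (obj.findtext("difficult") or "").strip()
        is_difficult = int(diff_text) if diff_text else 0
        box = obj.find("bndbox")
        if box is None:
            continue
        x1, y1, x2, y2 = (
            float(box.findtext(key)) for key in ("xmin", "ymin", "xmax", "ymax")
        )
        if x2 <= x1 or y2 <= y1:
            continue  # drop degenerate boxes
        gt_bbox.append([x1, y1, x2, y2])
        gt_class.append(label_to_id[name])
        difficult.append(is_difficult)
    # With no valid objects, the three lists are empty rather than absent.
    return {"gt_bbox": gt_bbox, "gt_class": gt_class, "difficult": difficult}
```

Returning empty lists for an image without objects keeps the record shape the same for every sample, so later stages can check the length instead of guarding against missing keys.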