Cannot train with paddlepaddle-gpu 2.0.0rc1 on Windows 10 Pro

chuanfuye commented 3 years ago

Hello. I first created a conda environment with Python 3.7.9, CUDA 10.2 and cuDNN 7.6.5, then ran `python -m pip install paddlepaddle-gpu==2.0.0rc1 -f https://paddlepaddle.org.cn/whl/stable.html` and installed the requirements from PaddleOCR. The installation check gives:

    >>> import paddle
    >>> paddle.utils.run_check()
    Running verify PaddlePaddle program ...
    W0126 18:07:13.143615 31828 device_context.cc:320] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.1, Runtime API Version: 10.2
    W0126 18:07:13.152613 31828 device_context.cc:330] device: 0, cuDNN Version: 7.6.
    PaddlePaddle works well on 1 GPU.
    W0126 18:07:16.054320 31828 build_strategy.cc:171] fusion_group is not enabled for Windows/MacOS now, and only effective when running with CUDA GPU.
    PaddlePaddle works well on 1 GPUs.
    PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.

Running inference on an image works without problems:

    python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_model_dir="./inference/ch_ppocr_mobile_v2.0_det_infer/" --rec_model_dir="./inference/ch_ppocr_mobile_v2.0_rec_infer/" --cls_model_dir="./inference/ch_ppocr_mobile_v2.0_cls_infer/" --use_angle_cls=True --use_space_char=True

But training the text detection model with

    python3 tools/train.py -c configs/det/det_mv3_db.yml \
        -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/

fails with the following error:

    [2021/01/27 09:01:08] root INFO: Architecture :
    [2021/01/27 09:01:08] root INFO: Backbone :
    [2021/01/27 09:01:08] root INFO: model_name : large
    [2021/01/27 09:01:08] root INFO: name : MobileNetV3
    [2021/01/27 09:01:08] root INFO: scale : 0.5
    [2021/01/27 09:01:08] root INFO: Head :
    [2021/01/27 09:01:08] root INFO: k : 50
    [2021/01/27 09:01:08] root INFO: name : DBHead
    [2021/01/27 09:01:08] root INFO: Neck :
    [2021/01/27 09:01:08] root INFO: name : DBFPN
    [2021/01/27 09:01:08] root INFO: out_channels : 256
    [2021/01/27 09:01:08] root INFO: Transform : None
    [2021/01/27 09:01:08] root INFO: algorithm : DB
    [2021/01/27 09:01:08] root INFO: model_type : det
    [2021/01/27 09:01:08] root INFO: Eval :
    [2021/01/27 09:01:08] root INFO: dataset :
    [2021/01/27 09:01:08] root INFO: data_dir : ./train_data/icdar2015/
    [2021/01/27 09:01:08] root INFO: label_file_list : ['./train_data/icdar2015/test_icdar2015_label.txt']
    [2021/01/27 09:01:08] root INFO: name : SimpleDataSet
    [2021/01/27 09:01:08] root INFO: transforms :
    [2021/01/27 09:01:08] root INFO: DecodeImage :
    [2021/01/27 09:01:08] root INFO: channel_first : False
    [2021/01/27 09:01:08] root INFO: img_mode : BGR
    [2021/01/27 09:01:08] root INFO: DetLabelEncode : None
    [2021/01/27 09:01:08] root INFO: DetResizeForTest :
    [2021/01/27 09:01:08] root INFO: image_shape : [736, 1280]
    [2021/01/27 09:01:08] root INFO: NormalizeImage :
    [2021/01/27 09:01:08] root INFO: mean : [0.485, 0.456, 0.406]
    [2021/01/27 09:01:08] root INFO: order : hwc
    [2021/01/27 09:01:08] root INFO: scale : 1./255.
    [2021/01/27 09:01:08] root INFO: std : [0.229, 0.224, 0.225]
    [2021/01/27 09:01:08] root INFO: ToCHWImage : None
    [2021/01/27 09:01:08] root INFO: KeepKeys :
    [2021/01/27 09:01:08] root INFO: keep_keys : ['image', 'shape', 'polys', 'ignore_tags']
    [2021/01/27 09:01:08] root INFO: loader :
    [2021/01/27 09:01:08] root INFO: batch_size_per_card : 1
    [2021/01/27 09:01:08] root INFO: drop_last : False
    [2021/01/27 09:01:08] root INFO: num_workers : 8
    [2021/01/27 09:01:08] root INFO: shuffle : False
    [2021/01/27 09:01:08] root INFO: use_shared_memory : False
    [2021/01/27 09:01:08] root INFO: Global :
    [2021/01/27 09:01:08] root INFO: cal_metric_during_train : False
    [2021/01/27 09:01:08] root INFO: checkpoints : None
    [2021/01/27 09:01:08] root INFO: debug : False
    [2021/01/27 09:01:08] root INFO: distributed : False
    [2021/01/27 09:01:08] root INFO: epoch_num : 1200
    [2021/01/27 09:01:08] root INFO: eval_batch_step : [0, 2000]
    [2021/01/27 09:01:08] root INFO: infer_img : doc/imgs_en/img_10.jpg
    [2021/01/27 09:01:08] root INFO: load_static_weights : True
    [2021/01/27 09:01:08] root INFO: log_smooth_window : 20
    [2021/01/27 09:01:08] root INFO: pretrain_weights : ./pretrain_models/MobileNetV3_large_x0_5_pretrained/
    [2021/01/27 09:01:08] root INFO: pretrained_model : ./pretrain_models/MobileNetV3_large_x0_5_pretrained
    [2021/01/27 09:01:08] root INFO: print_batch_step : 10
    [2021/01/27 09:01:08] root INFO: save_epoch_step : 1200
    [2021/01/27 09:01:08] root INFO: save_inference_dir : None
    [2021/01/27 09:01:08] root INFO: save_model_dir : ./output/db_mv3/
    [2021/01/27 09:01:08] root INFO: save_res_path : ./output/det_db/predicts_db.txt
    [2021/01/27 09:01:08] root INFO: use_gpu : True
    [2021/01/27 09:01:08] root INFO: use_visualdl : False
    [2021/01/27 09:01:08] root INFO: Loss :
    [2021/01/27 09:01:08] root INFO: alpha : 5
    [2021/01/27 09:01:08] root INFO: balance_loss : True
    [2021/01/27 09:01:08] root INFO: beta : 10
    [2021/01/27 09:01:08] root INFO: main_loss_type : DiceLoss
    [2021/01/27 09:01:08] root INFO: name : DBLoss
    [2021/01/27 09:01:08] root INFO: ohem_ratio : 3
    [2021/01/27 09:01:08] root INFO: Metric :
    [2021/01/27 09:01:08] root INFO: main_indicator : hmean
    [2021/01/27 09:01:08] root INFO: name : DetMetric
    [2021/01/27 09:01:08] root INFO: Optimizer :
    [2021/01/27 09:01:08] root INFO: beta1 : 0.9
    [2021/01/27 09:01:08] root INFO: beta2 : 0.999
    [2021/01/27 09:01:08] root INFO: lr :
    [2021/01/27 09:01:08] root INFO: learning_rate : 0.001
    [2021/01/27 09:01:08] root INFO: name : Adam
    [2021/01/27 09:01:08] root INFO: regularizer :
    [2021/01/27 09:01:08] root INFO: factor : 0
    [2021/01/27 09:01:08] root INFO: name : L2
    [2021/01/27 09:01:08] root INFO: PostProcess :
    [2021/01/27 09:01:08] root INFO: box_thresh : 0.6
    [2021/01/27 09:01:08] root INFO: max_candidates : 1000
    [2021/01/27 09:01:08] root INFO: name : DBPostProcess
    [2021/01/27 09:01:08] root INFO: thresh : 0.3
    [2021/01/27 09:01:08] root INFO: unclip_ratio : 1.5
    [2021/01/27 09:01:08] root INFO: Train :
    [2021/01/27 09:01:08] root INFO: dataset :
    [2021/01/27 09:01:08] root INFO: data_dir : ./train_data/icdar2015/
    [2021/01/27 09:01:08] root INFO: label_file_list : ['./train_data/icdar2015/train_icdar2015_label.txt']
    [2021/01/27 09:01:08] root INFO: name : SimpleDataSet
    [2021/01/27 09:01:08] root INFO: ratio_list : [1.0]
    [2021/01/27 09:01:08] root INFO: transforms :
    [2021/01/27 09:01:08] root INFO: DecodeImage :
    [2021/01/27 09:01:08] root INFO: channel_first : False
    [2021/01/27 09:01:08] root INFO: img_mode : BGR
    [2021/01/27 09:01:08] root INFO: DetLabelEncode : None
    [2021/01/27 09:01:08] root INFO: IaaAugment :
    [2021/01/27 09:01:08] root INFO: augmenter_args :
    [2021/01/27 09:01:08] root INFO: args :
    [2021/01/27 09:01:08] root INFO: p : 0.5
    [2021/01/27 09:01:08] root INFO: type : Fliplr
    [2021/01/27 09:01:08] root INFO: args :
    [2021/01/27 09:01:08] root INFO: rotate : [-10, 10]
    [2021/01/27 09:01:08] root INFO: type : Affine
    [2021/01/27 09:01:08] root INFO: args :
    [2021/01/27 09:01:08] root INFO: size : [0.5, 3]
    [2021/01/27 09:01:08] root INFO: type : Resize
    [2021/01/27 09:01:08] root INFO: EastRandomCropData :
    [2021/01/27 09:01:08] root INFO: keep_ratio : True
    [2021/01/27 09:01:08] root INFO: max_tries : 50
    [2021/01/27 09:01:08] root INFO: size : [640, 640]
    [2021/01/27 09:01:08] root INFO: MakeBorderMap :
    [2021/01/27 09:01:08] root INFO: shrink_ratio : 0.4
    [2021/01/27 09:01:08] root INFO: thresh_max : 0.7
    [2021/01/27 09:01:08] root INFO: thresh_min : 0.3
    [2021/01/27 09:01:08] root INFO: MakeShrinkMap :
    [2021/01/27 09:01:08] root INFO: min_text_size : 8
    [2021/01/27 09:01:08] root INFO: shrink_ratio : 0.4
    [2021/01/27 09:01:08] root INFO: NormalizeImage :
    [2021/01/27 09:01:08] root INFO: mean : [0.485, 0.456, 0.406]
    [2021/01/27 09:01:08] root INFO: order : hwc
    [2021/01/27 09:01:08] root INFO: scale : 1./255.
    [2021/01/27 09:01:08] root INFO: std : [0.229, 0.224, 0.225]
    [2021/01/27 09:01:08] root INFO: ToCHWImage : None
    [2021/01/27 09:01:08] root INFO: KeepKeys :
    [2021/01/27 09:01:08] root INFO: keep_keys : ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask']
    [2021/01/27 09:01:08] root INFO: loader :
    [2021/01/27 09:01:08] root INFO: batch_size_per_card : 16
    [2021/01/27 09:01:08] root INFO: drop_last : False
    [2021/01/27 09:01:08] root INFO: num_workers : 8
    [2021/01/27 09:01:08] root INFO: shuffle : True
    [2021/01/27 09:01:08] root INFO: use_shared_memory : False
    [2021/01/27 09:01:08] root INFO: train with paddle 2.0.0-rc1 and device CUDAPlace(0)
    [2021/01/27 09:01:08] root INFO: Initialize indexs of datasets:['./train_data/icdar2015/train_icdar2015_label.txt']
    [2021/01/27 09:01:08] root INFO: Initialize indexs of datasets:['./train_data/icdar2015/test_icdar2015_label.txt']
    W0127 09:01:08.837458 43356 device_context.cc:320] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.1, Runtime API Version: 10.2
    W0127 09:01:08.847458 43356 device_context.cc:330] device: 0, cuDNN Version: 7.6.
    [2021/01/27 09:01:12] root INFO: load pretrained model from ['./pretrain_models/MobileNetV3_large_x0_5_pretrained']
    [2021/01/27 09:01:12] root INFO: train dataloader has 63 iters, valid dataloader has 500 iters
    [2021/01/27 09:01:12] root INFO: During the training process, after the 0th iteration, an evaluation is run every 2000 iterations
    [2021/01/27 09:01:12] root INFO: Initialize indexs of datasets:['./train_data/icdar2015/train_icdar2015_label.txt']
    Segmentation fault

How can I get rid of this error? I look forward to your reply. Thank you very much.

TeslaZhao commented 3 years ago

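Hello, and thanks for the feedback! The log does not show what went wrong. Could you set GLOG_v=3 before launching, so that the full verbose log is printed?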
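`GLOG_v` is an environment variable read by Paddle's C++ core through glog, not a setting of Python's `logging` module, so it has to be present in the environment of the process that runs training. In a Windows command prompt that would be `set GLOG_v=3` followed by the usual `python tools/train.py ...` command. The same idea as a minimal Python sketch; `glog_env` and `run_with_glog` are hypothetical helper names for illustration, not part of PaddleOCR:

```python
import os
import subprocess
import sys


def glog_env(verbosity=3, base=None):
    """Return a copy of the environment with GLOG_v set (glog verbosity)."""
    env = dict(os.environ if base is None else base)
    env["GLOG_v"] = str(verbosity)  # environment values must be strings
    return env


def run_with_glog(args, verbosity=3):
    """Run `python <args...>` with GLOG_v set and return its exit code."""
    cmd = [sys.executable, *args]
    return subprocess.run(cmd, env=glog_env(verbosity)).returncode


# Example call (not executed here), using the command from this thread:
# run_with_glog([
#     "tools/train.py", "-c", "configs/det/det_mv3_db.yml",
#     "-o", "Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/",
# ])
```

The variable only affects the child process started by `run_with_glog`, so the verbose output does not leak into other shells or later runs.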

chuanfuye commented 3 years ago

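> Hello, and thanks for the feedback! The log does not show what went wrong. Could you set GLOG_v=3 before launching, so that the full verbose log is printed?

Hello, where exactly should I set the GLOG_v=3 you mentioned? I changed the level in Python's logging module (from INFO to CRITICAL), but the log output is the same as before and still ends at "Segmentation fault". Are you referring to C++ training, or something else? P.S. The GPU training launch command is `python tools/train.py -c configs/det/det_mv3_db.yml -o Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/`. Testing runs without problems.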

chuanfuye commented 3 years ago

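> Hello, and thanks for the feedback! The log does not show what went wrong. Could you set GLOG_v=3 before launching, so that the full verbose log is printed?

Hello, I found the problem. After setting batch_size_per_card to a smaller value in the config file, training works (this is on a 1660 Ti). Thanks.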
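The fix above amounts to lowering `Train.loader.batch_size_per_card`, which the config dump in the first post shows as 16. A minimal sketch of how one might try smaller values without editing the YAML file, assuming `-o` accepts the nested key `Train.loader.batch_size_per_card` in the same way it accepts `Global.pretrain_weights`; `candidate_batch_sizes` and `train_command` are hypothetical helpers, and the halving schedule is only an illustration, not a value reported in this thread:

```python
import sys


def candidate_batch_sizes(start=16):
    """Yield start, start // 2, ... down to 1 (for 16: 16, 8, 4, 2, 1)."""
    size = start
    while size >= 1:
        yield size
        size //= 2


def train_command(batch_size, config="configs/det/det_mv3_db.yml"):
    """Build the training command with a batch-size override (assumed key)."""
    return [
        sys.executable, "tools/train.py", "-c", config,
        "-o",
        "Global.pretrain_weights=./pretrain_models/MobileNetV3_large_x0_5_pretrained/",
        f"Train.loader.batch_size_per_card={batch_size}",
    ]


# Print the commands to try, largest batch size first.
for size in candidate_batch_sizes(16):
    print(" ".join(train_command(size)[1:]))
```

Trying the largest value that still trains keeps throughput as high as the GPU memory allows.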

PaddlePaddle / PaddleOCR

Cannot train with paddlepaddle-gpu 2.0.0rc1 on Windows 10 Pro #1838