Cannot load the pretrained model file when training the detection model

xuyuhui666 commented 4 years ago

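Loading the model fails with the following error: ./pretrain_models/MobileNetV3_large_x0_5_pretrained/.pdparams not found. The model is already in that directory, and the model path is already set in the config file det_mv3_db.yml, so I don't know which setting is wrong.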

littletomatodonkey commented 4 years ago

Are you using Paddle 1.7 or later? Could you provide more detailed log output?

xuyuhui666 commented 4 years ago

I'm not sure of the version; I downloaded it the day before yesterday.

++++++++
2020-07-31 20:58:22,520-INFO: places would be ommited when DataLoader is not iterable
2020-07-31 20:59:04,252-INFO: Loading parameters from ./pretrain_models/MobileNetV3_large_x0_5_pretrained/...
2020-07-31 21:22:33,662-WARNING: ./pretrain_models/MobileNetV3_large_x0_5_pretrained/.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
2020-07-31 21:22:33,662-WARNING: ./pretrain_models/MobileNetV3_large_x0_5_pretrained/.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
Traceback (most recent call last):
  File "C:\ProgramData\Miniconda3\lib\site-packages\paddle\fluid\io.py", line 1865, in load_program_state
    filename=file_name)
  File "C:\ProgramData\Miniconda3\lib\site-packages\paddle\fluid\io.py", line 793, in load_vars
    executor.run(load_prog)
  File "C:\ProgramData\Miniconda3\lib\site-packages\paddle\fluid\executor.py", line 790, in run
    six.reraise(*sys.exc_info())
  File "C:\ProgramData\Miniconda3\lib\site-packages\six.py", line 693, in reraise
    raise value
  File "C:\ProgramData\Miniconda3\lib\site-packages\paddle\fluid\executor.py", line 785, in run
    use_program_cache=use_program_cache)
  File "C:\ProgramData\Miniconda3\lib\site-packages\paddle\fluid\executor.py", line 838, in _run_impl
    use_program_cache=use_program_cache)
  File "C:\ProgramData\Miniconda3\lib\site-packages\paddle\fluid\executor.py", line 912, in _run_program
    fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet:
++++++++++

The above is the error. Loading fails in the load_params(exe, prog, path, ignore_params=[]) function in save_load.py. I see a check in there: if not (os.path.isdir(path) or os.path.exists(path + '.pdparams')). I don't understand why, given that path is the model folder, the string '.pdparams' gets appended to it.

xuyuhui666 commented 4 years ago

The version is 1.7.2, so it shouldn't be a version problem.

littletomatodonkey commented 4 years ago

Are you using Paddle 1.7 or later? Could you provide more detailed log output?

I'm using paddlepaddle 1.8.3 and paddle1.02. The log looks like this:

`tools/train.py -c configs/rec/rec_mv3_none_bilstm_ctc.yml
2020-07-31 18:16:36,862-INFO: {'Global': {'debug': True, 'algorithm': 'CRNN', 'use_gpu': False, 'epoch_num': 1024, 'log_smooth_window': 20, 'print_batch_step': 128, 'save_model_dir': './output/rec_CRNN', 'save_epoch_step': 256, 'eval_batch_step': 500, 'train_batch_size_per_card': 256, 'test_batch_size_per_card': 256, 'image_shape': [3, 32, 320], 'max_text_length': 20, 'character_type': 'ch', 'loss_type': 'ctc', 'reader_yml': './configs/rec/rec_benchmark_reader.yml', 'pretrain_weights': './pretrain_models/rec_mv3_none_bilstm_ctc/best_accuracy', 'checkpoints': None, 'save_inference_dir': None, 'infer_img': None, 'character_dict_path': './ppocr/utils/new.txt', 'cpu_num': 8}, 'Architecture': {'function': 'ppocr.modeling.architectures.rec_model,RecModel'}, 'Backbone': {'function': 'ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3', 'scale': 0.5, 'model_name': 'large'}, 'Head': {'function': 'ppocr.modeling.heads.rec_ctc_head,CTCPredict', 'encoder_type': 'rnn', 'SeqRNN': {'hidden_size': 96}}, 'Loss': {'function': 'ppocr.modeling.losses.rec_ctc_loss,CTCLoss'}, 'Optimizer': {'function': 'ppocr.optimizer,AdamDecay', 'base_lr': 0.005, 'beta1': 0.9, 'beta2': 0.999, 'decay': {'function': 'cosine_decay', 'step_each_epoch': 24, 'total_epoch': 1024}}, 'TrainReader': {'reader_function': 'ppocr.data.rec.dataset_traversal,SimpleReader', 'num_workers': 8, 'img_set_dir': './train_data', 'label_file_path': './train_data/train_rec_label.txt'}, 'EvalReader': {'reader_function': 'ppocr.data.rec.dataset_traversal,SimpleReader', 'img_set_dir': './train_data', 'label_file_path': './train_data/test_rec_label.txt'}, 'TestReader': {'reader_function': 'ppocr.data.rec.dataset_traversal,SimpleReader'}}
2020-07-31 18:16:37,388-INFO: If regularizer of a Parameter has been set by 'fluid.ParamAttr' or 'fluid.WeightNormParamAttr' already. The Regularization[L2Decay, regularization_coeff=0.000000] in Optimizer will not take effect, and it will only be applied to other Parameters!
2020-07-31 18:16:38,998-INFO: places would be ommited when DataLoader is not iterable
Notice: now supported ops include [Conv, DepthwiseConv, FC(mul), BatchNorm, Pool, Activation(sigmoid, tanh, relu, leaky_relu, prelu)]
2020-07-31 18:16:39,245-INFO: Loading parameters from ./pretrain_models/rec_mv3_none_bilstm_ctc/best_accuracy...
2020-07-31 18:16:39,306-WARNING: variable ctc_fc_w_attr not used
2020-07-31 18:16:39,306-WARNING: variable ctc_fc_b_attr not used
2020-07-31 18:16:39,343-INFO: Finish initing model from ./pretrain_models/rec_mv3_none_bilstm_ctc/best_accuracy
!!! The CPU_NUM is not specified, you should set CPU_NUM in the environment variable list. CPU_NUM indicates that how many CPUPlace are used in the current task. And if this parameter are set as N (equal to the number of physical CPU core) the program may be faster.

export CPU_NUM=8 # for example, set CPU_NUM as number of physical CPU core which is 8.

!!! The default number of CPU_NUM=1.
W0731 18:16:39.384634 11400 build_strategy.cc:170] fusion_group is not enabled for Windows/MacOS now, and only effective when running with CUDA GPU.
multiprocess is not fully compatible with Windows.num_workers will be 1.
does not exist!:40,307-INFO: ./train_data/`

This one looks like it loaded successfully.

littletomatodonkey commented 4 years ago

I'm not sure of the version; I downloaded it the day before yesterday.

Loading fails in the load_params(exe, prog, path, ignore_params=[]) function in save_load.py. I see a check in there: if not (os.path.isdir(path) or os.path.exists(path + '.pdparams')). I don't understand why, given that path is the model folder, the string '.pdparams' gets appended to it.

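Before Paddle 1.7, model parameters were stored as a set of separate files. From 1.7 onward the save/load interface was unified, and parameters are saved as a single pdparams file. So for pretrain you only need to provide either the pretrained-model prefix (new-style pretrained models) or the folder (pretrained models stored as separate files).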
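The two accepted forms can be illustrated with a small sketch that mirrors the check quoted from save_load.py. The helper names `pdparams_candidate` and `describe_pretrain_path` are hypothetical and not part of PaddleOCR; the sketch only shows which file name ends up being tested, and why a trailing slash produces the odd-looking `/.pdparams` in the warning.

```python
import os


def pdparams_candidate(path):
    """Return the single-file checkpoint name that the quoted check looks for."""
    # The check appends the suffix directly, so a trailing slash in `path`
    # yields ".../.pdparams" instead of "....pdparams".
    return path + ".pdparams"


def describe_pretrain_path(path):
    """Classify a pretrain path (hypothetical helper, for illustration only).

    Returns "prefix" if <path>.pdparams exists (new-style single file),
    "directory" if path is a folder (separate parameter files),
    and "missing" otherwise.
    """
    if os.path.exists(pdparams_candidate(path)):
        return "prefix"
    if os.path.isdir(path):
        return "directory"
    return "missing"


if __name__ == "__main__":
    for p in ("./pretrain_models/MobileNetV3_large_x0_5_pretrained/",
              "./pretrain_models/MobileNetV3_large_x0_5_pretrained"):
        print(p, "->", pdparams_candidate(p), "->", describe_pretrain_path(p))
```

If the result is "missing", neither form was found at that location, which is worth checking before looking elsewhere.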

xuyuhui666 commented 4 years ago

The error log I uploaded earlier may have been incomplete. I followed every step from the official site, extracting the pretrained model went fine, and I also reduced the batch size, so it feels like the problem is in loading the model.

+++++++
2020-08-01 14:32:15,644-INFO: {'Global': {'debug': False, 'algorithm': 'DB', 'use_gpu': True, 'epoch_num': 1200, 'log_smooth_window': 20, 'print_batch_step': 2, 'save_model_dir': './output/det_db/', 'save_epoch_step': 200, 'eval_batch_step': [4000, 5000], 'train_batch_size_per_card': 16, 'test_batch_size_per_card': 16, 'image_shape': [3, 640, 640], 'reader_yml': './configs/det/det_db_icdar15_reader.yml', 'pretrain_weights': './pretrain_models/MobileNetV3_large_x0_5_pretrained', 'checkpoints': None, 'save_res_path': './output/det_db/predicts_db.txt', 'save_inference_dir': None}, 'Architecture': {'function': 'ppocr.modeling.architectures.det_model,DetModel'}, 'Backbone': {'function': 'ppocr.modeling.backbones.det_mobilenet_v3,MobileNetV3', 'scale': 0.5, 'model_name': 'large'}, 'Head': {'function': 'ppocr.modeling.heads.det_db_head,DBHead', 'model_name': 'large', 'k': 50, 'inner_channels': 96, 'out_channels': 2}, 'Loss': {'function': 'ppocr.modeling.losses.det_db_loss,DBLoss', 'balance_loss': True, 'main_loss_type': 'DiceLoss', 'alpha': 5, 'beta': 10, 'ohem_ratio': 3}, 'Optimizer': {'function': 'ppocr.optimizer,AdamDecay', 'base_lr': 0.001, 'beta1': 0.9, 'beta2': 0.999}, 'PostProcess': {'function': 'ppocr.postprocess.db_postprocess,DBPostProcess', 'thresh': 0.3, 'box_thresh': 0.7, 'max_candidates': 1000, 'unclip_ratio': 2.0}, 'TrainReader': {'reader_function': 'ppocr.data.det.dataset_traversal,TrainReader', 'process_function': 'ppocr.data.det.db_process,DBProcessTrain', 'num_workers': 8, 'img_set_dir': '../ocr_data/train_full_img/', 'label_file_path': '../ocr_data/train.txt'}, 'EvalReader': {'reader_function': 'ppocr.data.det.dataset_traversal,EvalTestReader', 'process_function': 'ppocr.data.det.db_process,DBProcessTest', 'img_set_dir': '../ocr_data/icdar2015_train_image/', 'label_file_path': '../ocr_data/icdar2015_label.txt', 'test_image_shape': [736, 1280]}, 'TestReader': {'reader_function': 'ppocr.data.det.dataset_traversal,EvalTestReader', 'process_function': 'ppocr.data.det.db_process,DBProcessTest', 'infer_img': None, 'img_set_dir': '../ocr_data/icdar2015_train_image/', 'label_file_path': '../ocr_data/icdar2015_label.txt', 'test_image_shape': [736, 1280], 'do_eval': True}}
3 640 640 3 640 640
2020-08-01 14:32:18,793-INFO: places would be ommited when DataLoader is not iterable
2020-08-01 14:32:21,184-INFO: Loading parameters from ./pretrain_models/MobileNetV3_large_x0_5_pretrained...
2020-08-01 14:32:21,185-WARNING: ./pretrain_models/MobileNetV3_large_x0_5_pretrained.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
2020-08-01 14:32:21,185-WARNING: ./pretrain_models/MobileNetV3_large_x0_5_pretrained.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
2020-08-01 14:32:21,971-INFO: Finish initing model from ./pretrain_models/MobileNetV3_large_x0_5_pretrained
2020-08-01 14:32:21,971-INFO: During the training process, after the 4000th iteration, an evaluation is run every 5000 iterations
2020-08-01 14:32:22,384-WARNING: Your reader has raised an exception!
multiprocess is not fully compatible with Windows.num_workers will be 1.
Traceback (most recent call last):
  File "E:/xuyuhui/PaddleOCR/tools/train.py", line 121, in <module>
    main()
  File "E:/xuyuhui/PaddleOCR/tools/train.py", line 96, in main
    program.train_eval_det_run(config, exe, train_info_dict, eval_info_dict)
  File "E:\xuyuhui\PaddleOCR\tools\program.py", line 250, in train_eval_det_run
    return_numpy=False)
  File "C:\ProgramData\Miniconda3\lib\site-packages\paddle\fluid\executor.py", line 790, in run
    six.reraise(*sys.exc_info())
  File "C:\ProgramData\Miniconda3\lib\site-packages\six.py", line 693, in reraise
    raise value
  File "C:\ProgramData\Miniconda3\lib\site-packages\paddle\fluid\executor.py", line 785, in run
    use_program_cache=use_program_cache)
  File "C:\ProgramData\Miniconda3\lib\site-packages\paddle\fluid\executor.py", line 850, in _run_impl
    return_numpy=return_numpy)
  File "C:\ProgramData\Miniconda3\lib\site-packages\paddle\fluid\executor.py", line 684, in _run_parallel
    tensors = exe.run(fetch_var_names)._move_to_list()
paddle.fluid.core_avx.EnforceNotMet:

C++ Call Stacks (More useful to developers):

Windows not support stack backtrace yet.

Python Call Stacks (More useful to users):

  File "C:\ProgramData\Miniconda3\lib\site-packages\paddle\fluid\framework.py", line 2525, in append_op
    attrs=kwargs.get("attrs", None))
  File "C:\ProgramData\Miniconda3\lib\site-packages\paddle\fluid\reader.py", line 733, in _init_non_iterable
    outputs={'Out': self._feed_list})
  File "C:\ProgramData\Miniconda3\lib\site-packages\paddle\fluid\reader.py", line 646, in __init__
    self._init_non_iterable()
  File "C:\ProgramData\Miniconda3\lib\site-packages\paddle\fluid\reader.py", line 280, in from_generator
    iterable, return_list)
  File "E:\xuyuhui\PaddleOCR\ppocr\modeling\architectures\det_model.py", line 104, in create_feed
    iterable=False)
  File "E:\xuyuhui\PaddleOCR\ppocr\modeling\architectures\det_model.py", line 117, in __call__
    image, labels, loader = self.create_feed(mode)
  File "E:\xuyuhui\PaddleOCR\tools\program.py", line 175, in build
    dataloader, outputs = model(mode=mode)
  File "E:/xuyuhui/PaddleOCR/tools/train.py", line 50, in main
    config, train_program, startup_program, mode='train')
  File "E:/xuyuhui/PaddleOCR/tools/train.py", line 121, in <module>
    main()

Error Message Summary:

Error: Blocking queue is killed because the data reader raises an exception
[Hint: Expected killed != true, but received killed:1 == true:1.] at (D:\1.7.2\paddle\paddle/fluid/operators/reader/blocking_queue.h:141)
[operator < read > error]
W0801 14:32:19.696929 277288 device_context.cc:237] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.1, Runtime API Version: 10.0
W0801 14:32:19.704908 277288 device_context.cc:245] device: 0, cuDNN Version: 7.4.
I0801 14:32:22.009755 277288 parallel_executor.cc:440] The Program will be executed on CUDA using ParallelExecutor, 1 cards are used, so 1 programs are executed in parallel.
I0801 14:32:22.069594 277288 build_strategy.cc:365] SeqOnlyAllReduceOps:0, num_trainers:1
I0801 14:32:22.231163 277288 parallel_executor.cc:307] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0801 14:32:22.269088 277288 parallel_executor.cc:375] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
+++++++++

I don't know where the problem is.

littletomatodonkey commented 4 years ago

Please check whether the image files in the data path actually exist.
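One way to run that check is a short script like the sketch below. It rests on two assumptions that should be verified against your own data: each non-empty line of the label file starts with an image path followed by a tab, and those paths are relative to the image directory. `find_missing_images` is a hypothetical helper, not part of PaddleOCR.

```python
import os


def find_missing_images(label_file, img_dir):
    """List image paths named in label_file that do not exist under img_dir.

    Assumption: every non-empty line is "<image path>\\t<annotation>".
    """
    missing = []
    with open(label_file, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            rel_path = line.split("\t", 1)[0]
            full_path = os.path.join(img_dir, rel_path)
            if not os.path.isfile(full_path):
                missing.append(full_path)
    return missing


if __name__ == "__main__":
    # Paths taken from the TrainReader section of the config logged above.
    label_file = "../ocr_data/train.txt"
    img_dir = "../ocr_data/train_full_img/"
    if os.path.isfile(label_file):
        for path in find_missing_images(label_file, img_dir):
            print("missing:", path)
    else:
        print("label file not found:", label_file)
```

An empty result means every listed image was found, so the reader exception would have to come from something else, such as the annotation content itself.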

xuyuhui666 commented 4 years ago

OK, thanks. After I changed the way I extracted the archive, the model loads.

littletomatodonkey commented 4 years ago

Great~

wsibo commented 4 years ago

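OK, thanks. After I changed the way I extracted the archive, the model loads.

Could you explain in detail how you solved it? When I extract the pretrained model I get many separate files, and there is no .pdparams file either.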

Yaoxingtian commented 3 years ago

Loading the model fails with the following error: ./pretrain_models/MobileNetV3_large_x0_5_pretrained/.pdparams not found,

The path is obviously wrong! It should be written as: ./pretrain_models/MobileNetV3_large_x0_5_pretrained/best_accuracy

PaddlePaddle / PaddleOCR