PaddlePaddle / Paddle

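PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the core framework of PaddlePaddle 『飞桨』: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)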
http://www.paddlepaddle.org/
Apache License 2.0
22.26k stars 5.59k forks

rec_r34_vd_tps_bilstm_attn (from the PaddleOCR repo): MKLDNN prediction failed #27398

Status: Closed (closed by OliverLPH 4 years ago)

OliverLPH commented 4 years ago

System information
- PaddlePaddle version: develop, d28162b97fd2d224968c18c9da735900ef280e7c
- CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz, MKLDNN enabled
- GPU: P4, CUDA 10, cuDNN 7.5; compiled with GPU support, but running CPU inference
- OS platform: Ubuntu 16.04
- Dockerfile: Paddle/tools/manylinux1/Dockerfile.cuda10_cudnn7_gcc48_ubuntu16
- Python version: 3.7
- CMake commands: C++version.txt
- API information:

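from paddle.fluid.core import AnalysisConfig

config = AnalysisConfig(args.model_file, args.params_file)  # combined __model__ / __params__ files
config.disable_gpu()                     # run inference on the CPU only
config.switch_use_feed_fetch_ops(False)  # needed for zero_copy_run()
config.switch_specify_input_names(True)
config.enable_mkldnn()                   # use oneDNN (MKL-DNN) kernels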


Steps to reproduce the behavior: see mkldnn_reproduce_code.txt

python3.7 mkldnn_reproduce_code.txt --model_file=./rec_r34_vd_tps_bilstm_attn/__model__ \
                                    --params_file=./rec_r34_vd_tps_bilstm_attn/__params__

Describe your current behavior

I0918 07:50:21.910486 27984 analysis_predictor.cc:527] ======= optimize end =======
I0918 07:50:21.910612 27984 naive_executor.cc:102] ---  skip [feed], feed -> image
I0918 07:50:21.911798 27984 naive_executor.cc:102] ---  skip [save_infer_model/scale_0.tmp_0], fetch -> fetch
I0918 07:50:21.911818 27984 naive_executor.cc:102] ---  skip [save_infer_model/scale_1.tmp_0], fetch -> fetch
W0918 07:50:21.927631 27984 operator.cc:205] pool2d raises an exception dnnl::error, could not create a descriptor for a pooling forward propagation primitive
Traceback (most recent call last):
  File "test_zerotensor.py", line 44, in <module>
    main()
  File "test_zerotensor.py", line 20, in main
    predictor.zero_copy_run()
RuntimeError: could not create a descriptor for a pooling forward propagation primitive

Code to reproduce the issue: see mkldnn_reproduce_code.txt

Other info / logs: [image]

luotao1 commented 4 years ago

@lidanqing-intel @jczaja Please help look into this OCR issue.

lidanqing-intel commented 4 years ago

@luotao1 Where is the model file located?

lidanqing-intel commented 4 years ago

Model link: https://sys-p0.bj.bcebos.com/inference/python-ocr-infer.tgz

jczaja commented 4 years ago

@luotao1, @lidanqing-intel I reproduced the problem. The root cause is that the oneDNN pool2d kernel does not support "adaptive" pooling, so that support needs to be added to fix this issue.
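Why "adaptive" matters here: a oneDNN pooling primitive is described by a fixed kernel, stride and padding, whereas adaptive pooling fixes the *output* size and derives a separate input window for every output index. The pure-Python sketch below is illustrative only. It assumes the floor/ceil window rule commonly used by adaptive-pooling kernels (not verified here against Paddle's CPU kernel), and the helper names `adaptive_windows`, `adaptive_avg_pool_1d` and `as_fixed_pooling` are hypothetical, not Paddle or oneDNN APIs. It is not the change made in the PRs mentioned below; it only checks whether a given adaptive configuration collapses to ordinary fixed-kernel pooling along one spatial axis.

```python
from typing import List, Optional, Sequence, Tuple


def adaptive_windows(in_size: int, out_size: int) -> List[Tuple[int, int]]:
    """[start, end) input window for each output index.

    Assumed rule: start = floor(i * in / out), end = ceil((i + 1) * in / out).
    Integer arithmetic avoids floating-point rounding surprises.
    """
    windows = []
    for i in range(out_size):
        start = (i * in_size) // out_size            # floor(i * in / out)
        end = -((-(i + 1) * in_size) // out_size)    # ceil((i + 1) * in / out)
        windows.append((start, end))
    return windows


def adaptive_avg_pool_1d(x: Sequence[float], out_size: int) -> List[float]:
    """Average each adaptive window of a 1-D input (one spatial axis)."""
    return [sum(x[s:e]) / (e - s) for s, e in adaptive_windows(len(x), out_size)]


def as_fixed_pooling(in_size: int, out_size: int) -> Optional[Tuple[int, int]]:
    """Return (kernel, stride) if all windows have the same length and are
    evenly spaced, i.e. the adaptive op equals padding-free fixed pooling;
    otherwise return None."""
    windows = adaptive_windows(in_size, out_size)
    lengths = {e - s for s, e in windows}
    starts = [s for s, _ in windows]
    strides = {b - a for a, b in zip(starts, starts[1:])}
    if len(lengths) != 1 or len(strides) > 1:
        return None
    kernel = lengths.pop()
    stride = strides.pop() if strides else kernel  # one window: stride is unused
    return kernel, stride


if __name__ == "__main__":
    print(adaptive_windows(5, 3))                    # [(0, 2), (1, 4), (3, 5)]
    print(adaptive_avg_pool_1d([1, 2, 3, 4, 5], 3))  # [1.5, 3.0, 4.5]
    print(as_fixed_pooling(8, 4))                    # (2, 2)
    print(as_fixed_pooling(5, 3))                    # None
```

With an input length of 8 and an output size of 4, the adaptive windows coincide with a kernel-2, stride-2 pool. With 5 and 3, the windows have lengths 2, 3, 2, so no single kernel/stride pair reproduces them. This illustrates why an adaptive pool2d cannot, in general, be passed unchanged to a fixed-kernel pooling primitive.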

jczaja commented 4 years ago

@luotao1 Fixing this issue requires two PRs to be merged: 1) #27770 (merged) 2) #27747 (awaiting approval)