PaddlePaddle / Paddle

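PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle『飞桨』core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)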
http://www.paddlepaddle.org/
Apache License 2.0

MKLDNN Error: prediction fails for a few segmentation models #30115

Closed: OliverLPH closed this issue 3 years ago

OliverLPH commented 3 years ago

System information
- PaddlePaddle version (e.g. 1.1) or CommitID:
- CPU: MKLDNN (oneDNN 1.8), Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz
- GPU: None
- OS Platform: CentOS Linux 7.6.1810
- Docker image: hub.baidubce.com/paddlepaddle/paddle:latest-dev
- Python version: None
- CMake command:
- C++ version.txt:
- API information:

Note: You can get most of this information by running summary_env.py.

To Reproduce
Segmentation model download link: https://sys-p0.bj.bcebos.com/Paddle-UnitTest-Model/PaddleSeg.tgz
The models that currently fail are fastscnn, hrnet, and pspnet; all of them can be downloaded from the link above.

git clone https://github.com/PaddlePaddle/continuous_integration.git
cd continuous_integration/inference/inference_api_test/cpp_api_test
# Set ${paddle_Inference_dir} to your Paddle Inference library directory
bash build.sh ${paddle_Inference_dir} OFF ON OFF
# Set ${model_name}, ${model_path} and ${params_path} for one of the failing models
accuracy="1e-6"
batch_size=1
./build/test_clas_model --model_name=${model_name} \
    --model_path=${model_path} \
    --params_path=${params_path} \
    --image_shape="3,512,512" \
    --use_mkldnn=true \
    --batch_size=${batch_size} \
    --use_gpu=false \
    --accuracy="${accuracy}" \
    --gtest_output=xml:test_${model_name}_mkldnn_${accuracy}_bz${batch_size}.xml

Describe your current behavior:

    File "/workspace/pdseg/models/model_builder.py", line 167, in build_model
      logits = seg_model(image, class_num)
    File "/workspace/pdseg/models/model_builder.py", line 86, in seg_model
      logits = fast_scnn.fast_scnn(image, class_num)
    File "/workspace/pdseg/models/modeling/fast_scnn.py", line 280, in fast_scnn
      lower_res_feature = global_feature_extractor.net(higher_res_features)
    File "/workspace/pdseg/models/modeling/fast_scnn.py", line 226, in net
      x = psp_module(x, self.block_channels[2] // 4)
    File "/workspace/pdseg/models/modeling/fast_scnn.py", line 148, in psp_module
      name=psp_name + '_adapool')
    File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/layers/nn.py", line 2461, in adaptive_pool2d
      "adaptive": True,
    File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
      return self.main_program.current_block().append_op(*args, **kwargs)
    File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2917, in append_op
      attrs=kwargs.get("attrs", None))
    File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2014, in __init__
      for frame in traceback.extract_stack():

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   paddle::AnalysisPredictor::ZeroCopyRun()
1   paddle::framework::NaiveExecutor::Run()
2   paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
3   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
4   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
5   std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CPUPlace, false, 0ul, paddle::operators::PoolMKLDNNOpKernel<float>, paddle::operators::PoolMKLDNNOpKernel<signed char>, paddle::operators::PoolMKLDNNOpKernel<unsigned char>, paddle::operators::PoolMKLDNNOpKernel<paddle::platform::bfloat16> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
6   paddle::operators::PoolMKLDNNOpKernel<float>::Compute(paddle::framework::ExecutionContext const&) const
7   paddle::platform::PoolingMKLDNNHandler<float>::PoolingMKLDNNHandler(paddle::framework::ExecutionContext const&, paddle::platform::MKLDNNDeviceContext const&, dnnl::engine, paddle::platform::Place, paddle::framework::Tensor const*, paddle::framework::Tensor*, std::string const&)
8   paddle::platform::PoolingMKLDNNHandler<float>::ComputeAdaptivePoolParameters(paddle::framework::ExecutionContext const&, std::vector<long, std::allocator<long> > const&, std::vector<long, std::allocator<long> >&, std::vector<long, std::allocator<long> >&)
9   paddle::platform::EnforceNotMet::EnforceNotMet(paddle::platform::ErrorSummary const&, char const*, int)
10  paddle::platform::GetCurrentTraceBackString()

----------------------
Error Message Summary:
----------------------
UnimplementedError: Input dim must be divisible by corressponding ksize dim.
  [Hint: Expected src_tz[src_tz.size() - 1] % ksize[1] == 0, but received src_tz[src_tz.size() - 1] % ksize[1]:1 != 0:0.] (at /paddle/paddle/fluid/platform/mkldnn_reuse.h:906)
  [operator < pool2d > error]" thrown in the test body.
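The hint compares the last input dimension (the width) with ksize[1]. As we understand Paddle's adaptive_pool2d, for adaptive pooling the ksize attribute carries the requested output size (pool_size), so this check fails whenever the feature-map width is not a multiple of the number of output bins. The Python sketch below is illustrative only: it assumes the common floor/ceil bin rule for adaptive pooling, and the helper names (adaptive_bins, as_fixed_pool, passes_divisibility_check) are hypothetical, not Paddle or oneDNN APIs. It shows why the divisible case maps easily onto a plain pooling primitive: every bin then has the same width and stride, in/out.

```python
def adaptive_bins(in_size, out_size):
    """Bin boundaries for 1-D adaptive pooling (hypothetical helper).

    Assumes the common rule where output bin i covers input indices
    [floor(i * in / out), ceil((i + 1) * in / out)).
    """
    bins = []
    for i in range(out_size):
        start = (i * in_size) // out_size          # floor division
        end = -(-(i + 1) * in_size // out_size)    # ceil division
        bins.append((start, end))
    return bins


def as_fixed_pool(in_size, out_size):
    """Return (kernel, stride) if all bins share one width and one spacing, else None."""
    bins = adaptive_bins(in_size, out_size)
    widths = {end - start for start, end in bins}
    strides = {b[0] - a[0] for a, b in zip(bins, bins[1:])}
    if len(widths) == 1 and len(strides) <= 1:
        kernel = widths.pop()
        # With a single bin there is no spacing; use the kernel as the stride.
        stride = strides.pop() if strides else kernel
        return kernel, stride
    return None


def passes_divisibility_check(in_size, out_size):
    """Mirror of the condition in the error hint: input dim % ksize dim == 0."""
    return in_size % out_size == 0


for in_size, out_size in [(16, 4), (10, 4)]:
    print(in_size, out_size,
          adaptive_bins(in_size, out_size),
          as_fixed_pool(in_size, out_size),
          passes_divisibility_check(in_size, out_size))
```

With a 16-wide input and 4 bins, every bin is 4 wide with stride 4. With a 10-wide input and 4 bins, the bin starts are 0, 2, 5 and 7, so no single stride describes them and the divisibility check rejects the shape.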


paddle-bot-old[bot] commented 3 years ago

Hi! We've received your issue; please be patient while we arrange for technicians to answer your questions as soon as possible. Please make sure you have posted enough information to demonstrate your request: a clear problem description, reproduction code, environment and version details, and error messages. You may also check the API documentation, FAQ, GitHub issues, and AI community for answers. Have a nice day!

jczaja commented 3 years ago

@OliverLPH This PR https://github.com/PaddlePaddle/Paddle/pull/30757 should fix the crashes reported in this issue. Please test it.

OliverLPH commented 3 years ago

@jczaja Thanks~ I will verify this PR

OliverLPH commented 3 years ago

@jczaja @luotao1 Hi, I have verified this PR: model prediction now works without crashing. Please cherry-pick it to the release/2.0 branch. Thanks so much~
