PaddlePaddle / Paddle

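PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of 『飞桨』: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)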
http://www.paddlepaddle.org/
Apache License 2.0

ngraph compile fail #15599

Closed tensor-tang closed 4 years ago

tensor-tang commented 5 years ago

how to reproduce

cmake .. -DCMAKE_BUILD_TYPE=Release -DWITH_GPU=OFF -DWITH_MKL=ON -DWITH_TESTING=ON -DWITH_FLUID_ONLY=ON -DWITH_DOC=OFF -DWITH_CONTRIB=OFF   -DWITH_INFERENCE_API_TEST=ON -DON_INFER=ON -DWITH_MKLDNN=ON -DWITH_NGRAPH=ON
make -j

error

[ 99%] Linking CXX static library libpaddle_fluid.a
../../operators/ngraph/libngraph_engine.a(ngraph_engine.cc.o): In function `paddle::operators::NgraphEngine::BuildNgFunction()':
ngraph_engine.cc:(.text+0x43d4): undefined reference to `ngraph::Function::Function(ngraph::NodeVector const&, ngraph::ParameterVector const&, std::string const&)'
../../operators/ngraph/libngraph_engine.a(ngraph_engine.cc.o): In function `_GLOBAL__sub_I_ngraph_engine.cc':
ngraph_engine.cc:(.text.startup+0x268): undefined reference to `ngraph::runtime::Backend::create(std::string const&)'
../../operators/ngraph/libngraph_bridge.a(ngraph_bridge.cc.o): In function `ngraph::op::Constant::Constant<float>(ngraph::element::Type const&, ngraph::Shape, std::vector<float, std::allocator<float> > const&)':
ngraph_bridge.cc:(.text._ZN6ngraph2op8ConstantC2IfEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE[_ZN6ngraph2op8ConstantC5IfEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE]+0x99): undefined reference to `ngraph::Node::Node(std::string const&, ngraph::NodeVector const&, unsigned long)'
ngraph_bridge.cc:(.text._ZN6ngraph2op8ConstantC2IfEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE[_ZN6ngraph2op8ConstantC5IfEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE]+0x674): undefined reference to `ngraph::node_validation_assertion_string(ngraph::Node const*)'
../../operators/ngraph/libngraph_bridge.a(ngraph_bridge.cc.o): In function `ngraph::op::Constant::Constant<int>(ngraph::element::Type const&, ngraph::Shape, std::vector<int, std::allocator<int> > const&)':
ngraph_bridge.cc:(.text._ZN6ngraph2op8ConstantC2IiEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE[_ZN6ngraph2op8ConstantC5IiEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE]+0x99): undefined reference to `ngraph::Node::Node(std::string const&, ngraph::NodeVector const&, unsigned long)'
ngraph_bridge.cc:(.text._ZN6ngraph2op8ConstantC2IiEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE[_ZN6ngraph2op8ConstantC5IiEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE]+0x674): undefined reference to `ngraph::node_validation_assertion_string(ngraph::Node const*)'
collect2: error: ld returned 1 exit status
paddle/fluid/inference/analysis/CMakeFiles/test_analyzer.dir/build.make:513: recipe for target 'paddle/fluid/inference/analysis/test_analyzer' failed
make[2]: *** [paddle/fluid/inference/analysis/test_analyzer] Error 1

[ 98%] Linking CXX executable test_analyzer
../../operators/ngraph/libngraph_engine.a(ngraph_engine.cc.o): In function `paddle::operators::NgraphEngine::BuildNgFunction()':
ngraph_engine.cc:(.text+0x43d4): undefined reference to `ngraph::Function::Function(ngraph::NodeVector const&, ngraph::ParameterVector const&, std::string const&)'
../../operators/ngraph/libngraph_engine.a(ngraph_engine.cc.o): In function `_GLOBAL__sub_I_ngraph_engine.cc':
ngraph_engine.cc:(.text.startup+0x268): undefined reference to `ngraph::runtime::Backend::create(std::string const&)'
../../operators/ngraph/libngraph_bridge.a(ngraph_bridge.cc.o): In function `ngraph::op::Constant::Constant<float>(ngraph::element::Type const&, ngraph::Shape, std::vector<float, std::allocator<float> > const&)':
ngraph_bridge.cc:(.text._ZN6ngraph2op8ConstantC2IfEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE[_ZN6ngraph2op8ConstantC5IfEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE]+0x99): undefined reference to `ngraph::Node::Node(std::string const&, ngraph::NodeVector const&, unsigned long)'
ngraph_bridge.cc:(.text._ZN6ngraph2op8ConstantC2IfEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE[_ZN6ngraph2op8ConstantC5IfEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE]+0x674): undefined reference to `ngraph::node_validation_assertion_string(ngraph::Node const*)'
../../operators/ngraph/libngraph_bridge.a(ngraph_bridge.cc.o): In function `ngraph::op::Constant::Constant<int>(ngraph::element::Type const&, ngraph::Shape, std::vector<int, std::allocator<int> > const&)':
ngraph_bridge.cc:(.text._ZN6ngraph2op8ConstantC2IiEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE[_ZN6ngraph2op8ConstantC5IiEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE]+0x99): undefined reference to `ngraph::Node::Node(std::string const&, ngraph::NodeVector const&, unsigned long)'
ngraph_bridge.cc:(.text._ZN6ngraph2op8ConstantC2IiEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE[_ZN6ngraph2op8ConstantC5IiEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE]+0x674): undefined reference to `ngraph::node_validation_assertion_string(ngraph::Node const*)'
collect2: error: ld returned 1 exit status
paddle/fluid/inference/analysis/CMakeFiles/test_analyzer.dir/build.make:513: recipe for target 'paddle/fluid/inference/analysis/test_analyzer' failed
make[2]: *** [paddle/fluid/inference/analysis/test_analyzer] Error 1
CMakeFiles/Makefile2:58274: recipe for target 'paddle/fluid/inference/analysis/CMakeFiles/test_analyzer.dir/all' failed
make[1]: *** [paddle/fluid/inference/analysis/CMakeFiles/test_analyzer.dir/all] Error 2
Makefile:160: recipe for target 'all' failed
make: *** [all] Error 2

@baojun-nervana @mozga-intel could you please help fix this?

luotao1 commented 5 years ago

How about using the image paddlepaddle/paddle_manylinux_devel:cuda7.5_cudnn5 or paddlepaddle/paddle:latest-dev? I can compile normally with them.

tensor-tang commented 5 years ago

paddlepaddle/paddle_manylinux_devel:cuda7.5_cudnn5

I tried this, but it still failed.

mozga-intel commented 5 years ago

@luotao1 @tensor-tang Thank you. Let me check it.

luotao1 commented 5 years ago

With paddlepaddle/paddle_manylinux_devel:cuda7.5_cudnn5, when I run ctest -R test_*_ngraph_op -V, the unit test hangs.

You can follow the cmake options in http://ci.paddlepaddle.org/viewLog.html?buildId=54897&tab=buildLog&buildTypeId=Manylinux1_CpuAvxCp27cp27mu&logTab=tree&filter=all&_focus=201#_state=71 and add -DWITH_NGRAPH=ON.
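When a test run hangs as described here, it can help to put a wall-clock limit around the command so the hang is reported instead of blocking the shell. The following is a minimal sketch, assuming coreutils `timeout` is available (it exits with status 124 when the limit expires). The helper name `run_with_limit` and the 600-second value in the comment are illustrative assumptions, not something from this thread. ctest also has its own `--timeout <seconds>` option for per-test limits.

```shell
#!/usr/bin/env bash
# Run a command under a wall-clock limit and report whether it hung.
# Relies on coreutils `timeout`, which exits with status 124 on expiry.
run_with_limit() {
  local seconds="$1"; shift
  timeout "$seconds" "$@"
  local rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "timed out after ${seconds}s: $*"
  else
    echo "exit status ${rc}: $*"
  fi
  return "$rc"
}

# Self-contained demonstration: a 5-second sleep against a 1-second limit.
run_with_limit 1 sleep 5 || true

# In a Paddle build directory the same wrapper could guard the nGraph tests
# (the 600-second limit is an arbitrary assumption):
#   run_with_limit 600 ctest -R ngraph_op -V
```

The wrapper returns the original exit status, so it can still be used in scripts that check for failure.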

mozga-intel commented 5 years ago

@luotao1, @tensor-tang Could you please send me more details about this problem, i.e. the operating system, whether it is a Docker image, whether it is a parallel build, and so on? I tried to follow @tensor-tang's steps, but the error message I get is not the same, and my result is a bit different from yours. Overall, I did not get those errors during the build on my native machine.

luotao1 commented 5 years ago

but the error message is not the same

@mozga-intel Could you paste your error log?

tensor-tang commented 5 years ago

I tried paddlepaddle/paddle:latest-dev, and it passed.

baojun-nervana commented 5 years ago

Error from @mozga-intel:

/usr/bin/ld: cannot open linker script file /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/libm.so: Too many open files
collect2: error: ld returned 1 exit status
[ 99%] Generating paddle_fluid.dir/reset_tensor_array.objlist
paddle/fluid/inference/CMakeFiles/paddle_fluid_shared.dir/build.make:1090: recipe for target 'paddle/fluid/inference/libpaddle_fluid.so' failed
make[2]: [paddle/fluid/inference/libpaddle_fluid.so] Error 1
CMakeFiles/Makefile2:54733: recipe for target 'paddle/fluid/inference/CMakeFiles/paddle_fluid_shared.dir/all' failed
make[1]: [paddle/fluid/inference/CMakeFiles/paddle_fluid_shared.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
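"Too many open files" means the process hit its limit on open file descriptors. With GNU make, `-j` without a number places no limit on the number of parallel jobs, so many compilers and linkers can run at once. The sketch below prints the current soft limit and picks an explicit job count. The helper `bounded_jobs` and its default cap of 16 are hypothetical choices for illustration; whether this removes the error in this particular build is not confirmed in the thread.

```shell
#!/usr/bin/env bash
# Print the current soft limit on open file descriptors for this shell.
ulimit -Sn

# Choose an explicit job count: the number of CPUs given as the first
# argument, capped at a hypothetical maximum (second argument, default 16).
bounded_jobs() {
  local cores="$1" cap="${2:-16}"
  if [ "$cores" -gt "$cap" ]; then
    echo "$cap"
  else
    echo "$cores"
  fi
}

# Print the make command that would be used instead of an unbounded `make -j`.
jobs_n="$(bounded_jobs "$(getconf _NPROCESSORS_ONLN)")"
echo "make -j${jobs_n}"
```

If the limit itself is low, it can also be raised for the current shell with `ulimit -n <number>`, up to the hard limit reported by `ulimit -Hn`.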

luotao1 commented 5 years ago

Too many open files

Maybe you can pull the latest paddlepaddle/paddle:latest-dev image.

For paddlepaddle/paddle_manylinux_devel:cuda7.5_cudnn5, please refer to http://ci.paddlepaddle.org/viewLog.html?buildId=55444&tab=buildLog&buildTypeId=Manylinux1_CpuAvxCp27cp27mu&logTab=tree&filter=all&_focus=144#_state=68

[23:13:34][Step 1/8]         -DPYTHON_EXECUTABLE:FILEPATH=/opt/python/cp27-cp27mu/bin/python
[23:13:34][Step 1/8]             -DPYTHON_INCLUDE_DIR:PATH=/opt/python/cp27-cp27mu/include/python2.7
[23:13:34][Step 1/8]             -DPYTHON_LIBRARIES:FILEPATH=/opt/_internal/cpython-2.7.11-ucs4/lib/libpython2.7.so

tensor-tang commented 5 years ago

Some info about the environment that failed:

-- Found Paddle host system: ubuntu, version: 16.04.3
-- Found Paddle host system's CPU: 38 cores
-- CXX compiler: /usr/bin/g++-4.8, version: GNU 4.8.5
-- C compiler: /usr/bin/gcc-4.8, version: GNU 4.8.5

This is a build in my Docker image tensortang/ubuntu:16.04-paddle-6148.

baojun-nervana commented 5 years ago

@luotao1 I cannot reproduce the ngraph test hanging issue with paddlepaddle/paddle_manylinux_devel:cuda7.5_cudnn5 either. Do you see this issue consistently?

tensor-tang commented 5 years ago

@baojun-nervana I tried again. It seems to fail with gcc 4.8.5 and pass with gcc 5.4.

../ngraph/libngraph_engine.a(ngraph_engine.cc.o): In function `paddle::operators::NgraphEngine::BuildNgFunction()':
ngraph_engine.cc:(.text+0x42d4): undefined reference to `ngraph::Function::Function(ngraph::NodeVector const&, ngraph::ParameterVector const&, std::string const&)'
../ngraph/libngraph_engine.a(ngraph_engine.cc.o): In function `_GLOBAL__sub_I_ngraph_engine.cc':
ngraph_engine.cc:(.text.startup+0x268): undefined reference to `ngraph::runtime::Backend::create(std::string const&)'
../ngraph/libngraph_bridge.a(ngraph_bridge.cc.o): In function `ngraph::op::Constant::Constant<double>(ngraph::element::Type const&, ngraph::Shape, std::vector<double, std::allocator<double> > const&)':
ngraph_bridge.cc:(.text._ZN6ngraph2op8ConstantC2IdEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE[_ZN6ngraph2op8ConstantC5IdEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE]+0x98): undefined reference to `ngraph::Node::Node(std::string const&, ngraph::NodeVector const&, unsigned long)'
ngraph_bridge.cc:(.text._ZN6ngraph2op8ConstantC2IdEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE[_ZN6ngraph2op8ConstantC5IdEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE]+0x621): undefined reference to `ngraph::node_validation_assertion_string(ngraph::Node const*)'
../ngraph/libngraph_bridge.a(ngraph_bridge.cc.o): In function `ngraph::op::Constant::Constant<float>(ngraph::element::Type const&, ngraph::Shape, std::vector<float, std::allocator<float> > const&)':
ngraph_bridge.cc:(.text._ZN6ngraph2op8ConstantC2IfEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE[_ZN6ngraph2op8ConstantC5IfEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE]+0x98): undefined reference to `ngraph::Node::Node(std::string const&, ngraph::NodeVector const&, unsigned long)'
ngraph_bridge.cc:(.text._ZN6ngraph2op8ConstantC2IfEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE[_ZN6ngraph2op8ConstantC5IfEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE]+0x621): undefined reference to `ngraph::node_validation_assertion_string(ngraph::Node const*)'
../ngraph/libngraph_bridge.a(ngraph_bridge.cc.o): In function `ngraph::op::Constant::Constant<unsigned long>(ngraph::element::Type const&, ngraph::Shape, std::vector<unsigned long, std::allocator<unsigned long> > const&)':
ngraph_bridge.cc:(.text._ZN6ngraph2op8ConstantC2ImEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE[_ZN6ngraph2op8ConstantC5ImEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE]+0x98): undefined reference to `ngraph::Node::Node(std::string const&, ngraph::NodeVector const&, unsigned long)'
ngraph_bridge.cc:(.text._ZN6ngraph2op8ConstantC2ImEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE[_ZN6ngraph2op8ConstantC5ImEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE]+0x621): undefined reference to `ngraph::node_validation_assertion_string(ngraph::Node const*)'
../ngraph/libngraph_bridge.a(ngraph_bridge.cc.o): In function `ngraph::op::Constant::Constant<int>(ngraph::element::Type const&, ngraph::Shape, std::vector<int, std::allocator<int> > const&)':
ngraph_bridge.cc:(.text._ZN6ngraph2op8ConstantC2IiEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE[_ZN6ngraph2op8ConstantC5IiEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE]+0x98): undefined reference to `ngraph::Node::Node(std::string const&, ngraph::NodeVector const&, unsigned long)'
ngraph_bridge.cc:(.text._ZN6ngraph2op8ConstantC2IiEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE[_ZN6ngraph2op8ConstantC5IiEERKNS_7element4TypeENS_5ShapeERKSt6vectorIT_SaIS9_EE]+0x621): undefined reference to `ngraph::node_validation_assertion_string(ngraph::Node const*)'
collect2: error: ld returned 1 exit status
paddle/fluid/operators/benchmark/CMakeFiles/op_tester.dir/build.make:537: recipe for target 'paddle/fluid/operators/benchmark/op_tester' failed
make[2]: *** [paddle/fluid/operators/benchmark/op_tester] Error 1
CMakeFiles/Makefile2:54832: recipe for target 'paddle/fluid/operators/benchmark/CMakeFiles/op_tester.dir/all' failed
make[1]: *** [paddle/fluid/operators/benchmark/CMakeFiles/op_tester.dir/all] Error 2
Makefile:160: recipe for target 'all' failed
make: *** [all] Error 2

baojun-nervana commented 5 years ago

@tensor-tang Thanks for the info. I will follow up on compiling with gcc 4.8.5.
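The thread does not identify the root cause of the gcc 4.8.5 failure. One common cause of `undefined reference` errors on functions that take `std::string const&` is mixing objects built against the two libstdc++ string ABIs: GCC 5 and later default to a newer ABI whose mangled names contain `__cxx11`, while gcc 4.8 only produces the older names. Whether that applies here is an assumption. A minimal sketch for checking a library's exported symbols follows; the helper name `count_cxx11_symbols` and the library path in the comment are illustrative.

```shell
#!/usr/bin/env bash
# Count symbol names read from stdin that mention the __cxx11 namespace,
# which libstdc++ uses for its newer std::string ABI (GCC 5 and later).
count_cxx11_symbols() {
  grep -c '__cxx11' || true
}

# Self-contained demonstration with two hand-written mangled names for a
# hypothetical function f(const std::string&): old ABI first, new ABI second.
printf '%s\n' \
  '_Z1fRKSs' \
  '_Z1fRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE' \
  | count_cxx11_symbols

# Against a real library (the path is an assumption; adjust to your build tree):
#   nm -D --defined-only path/to/libngraph.so | count_cxx11_symbols
```

A nonzero count for a library that is then linked into objects compiled with gcc 4.8 would be consistent with this kind of mismatch, but it is not proof on its own.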

tensor-tang commented 5 years ago

Any update on this issue?

baojun-nervana commented 5 years ago

I will follow up on this.

baojun-nervana commented 5 years ago

@tensor-tang I tried paddlepaddle/paddle_manylinux_devel:cuda8.0_cudnn7 and paddlepaddle/paddle_manylinux_devel:cuda7.5_cudnn5 (those two are CentOS, gcc 4.8.2), as well as paddlepaddle/paddle:latest-dev (Ubuntu, gcc 5.4.0). I can compile on those two dockers. Which Docker image did you use with gcc 4.8.5?

tensor-tang commented 5 years ago

Please try this https://github.com/PaddlePaddle/Paddle/issues/15599#issuecomment-459244914

env CC=/usr/bin/gcc-4.8 CXX=/usr/bin/g++-4.8 cmake .. -DCMAKE_BUILD_TYPE=Release -DWITH_GPU=OFF -DWITH_MKL=ON -DWITH_TESTING=ON -DWITH_CONTRIB=OFF -DCMAKE_INSTALL_PREFIX=./tmpInstall -DWITH_LIBXSMM=OFF -DWITH_INFERENCE_API_TEST=ON -DON_INFER=ON -DWITH_MKLDNN=OFF

baojun-nervana commented 5 years ago

Thanks, this helps. I am able to reproduce it with the latest-dev Docker image. I will figure it out.
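Since the failure depends on the compiler, it is worth confirming which compiler a build directory was actually configured with after setting CC and CXX as in the command above. CMake records the choice in CMakeCache.txt under the `CMAKE_CXX_COMPILER` entry. The sketch below extracts that value; the helper name `cached_cxx_compiler` is illustrative, and the sample line only mirrors the CXX path from the command above.

```shell
#!/usr/bin/env bash
# Extract the C++ compiler path from CMakeCache.txt content read on stdin.
# The trailing colon in the pattern skips related entries such as
# CMAKE_CXX_COMPILER_AR.
cached_cxx_compiler() {
  sed -n 's/^CMAKE_CXX_COMPILER:[A-Z]*=//p'
}

# Demonstration with a sample cache line.
printf '%s\n' 'CMAKE_CXX_COMPILER:FILEPATH=/usr/bin/g++-4.8' | cached_cxx_compiler

# In a real build directory:
#   cached_cxx_compiler < CMakeCache.txt
```

CMake keeps the compiler from the first configure run, so changing CC or CXX usually requires a fresh build directory for the new value to take effect.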

baojun-nervana commented 5 years ago

I was able to reproduce the ngraph test hanging issue; it was due to a glibc issue on CentOS 6. PR #16990 should resolve it.

paddle-bot-old[bot] commented 4 years ago

Since you haven't replied for more than a year, we have closed this issue/pr. If the problem is not solved or there is a follow-up one, please reopen it at any time and we will continue to follow up.