PaddlePaddle / Paddle

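PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the core framework of PaddlePaddle『飞桨』: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)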
http://www.paddlepaddle.org/
Apache License 2.0

help paddle-fluid-v1.7.1 textcnn running error #64535

Open lifeIsBeautifulgo123 opened 4 months ago

lifeIsBeautifulgo123 commented 4 months ago
    program._compile(scope, self.place)
  File "/opt/_internal/cpython-2.7.11-ucs4/lib/python2.7/site-packages/paddle/fluid/compiler.py", line 424, in _compile
    places=self._places)
  File "/opt/_internal/cpython-2.7.11-ucs4/lib/python2.7/site-packages/paddle/fluid/compiler.py", line 377, in _compile_data_parallel
    self._exec_strategy, self._build_strategy, self._graph)
EnforceNotMet:


C++ Call Stacks (More useful to developers):


0 std::string paddle::platform::GetTraceBackString<char const>(char const&&, char const, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const, int)
2 paddle::platform::NCCLGroupGuard::~NCCLGroupGuard()
3 paddle::platform::NCCLContextMap::NCCLContextMap(std::vector<paddle::platform::Place, std::allocator > const&, ncclUniqueId, unsigned long, unsigned long)
4 paddle::platform::NCCLCommunicator::InitFlatCtxs(std::vector<paddle::platform::Place, std::allocator > const&, std::vector<ncclUniqueId, std::allocator<ncclUniqueId> > const&, unsigned long, unsigned long)
5 paddle::framework::ParallelExecutorPrivate::InitNCCLCtxs(paddle::framework::Scope, paddle::framework::details::BuildStrategy const&)
6 paddle::framework::ParallelExecutorPrivate::InitOrGetNCCLCommunicator(paddle::framework::Scope, paddle::framework::details::BuildStrategy)
7 paddle::framework::ParallelExecutor::ParallelExecutor(std::vector<paddle::platform::Place, std::allocator > const&, std::vector<std::string, std::allocator > const&, std::string const&, paddle::framework::Scope, std::vector<paddle::framework::Scope, std::allocator<paddle::framework::Scope> > const&, paddle::framework::details::ExecutionStrategy const&, paddle::framework::details::BuildStrategy const&, paddle::framework::ir::Graph)


Error Message Summary:


Error: An error occurred here. There is no accurate error hint for this error yet. We are continuously in the process of increasing hint for this kind of error check. It would be helpful if you could inform us of how this conversion went by opening a github issue. And we will resolve it with high priority.

zhangbo9674 commented 4 months ago

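Hello. This error stack shows that initializing the NCCL context failed while using ParallelExecutor in PaddlePaddle. NCCL is the NVIDIA Collective Communications Library, which handles multi-GPU and distributed communication. Check whether your environment supports NCCL: this error can occur if you run PaddlePaddle code that uses NCCL on a machine that has no GPU or does not have NCCL installed. Make sure that NCCL is installed on your machine and that your PaddlePaddle build is a GPU-enabled version.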
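
The checks above could be scripted roughly as follows. This is only a minimal sketch, not an official diagnostic: the helper name `check_nccl_environment` and the report keys are made up for illustration, and it assumes a 1.x install where `paddle.fluid.is_compiled_with_cuda()` is available. A missing result from `find_library` is a hint rather than proof, since NCCL may live in a directory the lookup does not search.

```python
import ctypes.util
import importlib
import os


def check_nccl_environment():
    """Collect basic facts relevant to an NCCL initialization failure.

    Hypothetical helper for illustration; the keys are not a Paddle API.
    """
    report = {}

    # Path or name of a shared NCCL library if the system lookup finds one,
    # otherwise None.
    report["nccl_library"] = ctypes.util.find_library("nccl")

    # GPUs this process is allowed to see (None means the variable is unset).
    report["cuda_visible_devices"] = os.environ.get("CUDA_VISIBLE_DEVICES")

    # Whether the installed paddle build was compiled with CUDA support.
    try:
        fluid = importlib.import_module("paddle.fluid")
        report["paddle_with_cuda"] = bool(fluid.is_compiled_with_cuda())
    except Exception as exc:  # paddle missing, too new, or a broken install
        report["paddle_with_cuda"] = None
        report["paddle_import_error"] = repr(exc)

    return report


if __name__ == "__main__":
    for key, value in sorted(check_nccl_environment().items()):
        print("{}: {}".format(key, value))
```

If `paddle_with_cuda` comes back `False`, or `nccl_library` is `None` on a GPU machine, that matches the situation described above and is worth fixing before retrying the textcnn script.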