Open lifeIsBeautifulgo123 opened 4 months ago
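Hello. This stack trace indicates that something went wrong while initializing the NCCL (NVIDIA Collective Communications Library) context when using ParallelExecutor in PaddlePaddle. Check whether your environment supports NCCL: this error can occur if you run NCCL-dependent PaddlePaddle code on a machine that has no GPU or no NCCL installation. Confirm that NCCL is installed on your machine and that your PaddlePaddle build is a GPU-enabled version.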
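The checks suggested above can be sketched as a small script. This is a rough diagnostic under stated assumptions, not an official PaddlePaddle tool: the function name `check_nccl_environment` is made up for illustration, the script targets Python 3 (the trace above comes from Python 2.7, where `shutil.which` is unavailable), and `ctypes.util.find_library` only searches the standard linker paths, so it can miss an NCCL copy installed elsewhere or bundled with the framework.

```python
import ctypes.util
import shutil


def check_nccl_environment():
    """Collect basic facts about GPU/NCCL availability (hypothetical helper)."""
    report = {
        # nvidia-smi on PATH suggests that an NVIDIA driver is installed.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
        # Looks for libnccl in the usual linker search paths only.
        "nccl_library_found": ctypes.util.find_library("nccl") is not None,
        # Stays None when PaddlePaddle (or paddle.fluid) cannot be imported.
        "paddle_compiled_with_cuda": None,
    }
    try:
        import paddle.fluid as fluid  # assumes a release that still ships fluid
        report["paddle_compiled_with_cuda"] = bool(fluid.is_compiled_with_cuda())
    except Exception:
        pass
    return report


if __name__ == "__main__":
    for key, value in check_nccl_environment().items():
        print(key, "=", value)
```

If all three checks pass and the error persists, setting the environment variable `NCCL_DEBUG=INFO` before launching the job makes NCCL log more detail about the underlying system error.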
File "/opt/_internal/cpython-2.7.11-ucs4/lib/python2.7/site-packages/paddle/fluid/compiler.py", line 424, in _compile
places=self._places)
File "/opt/_internal/cpython-2.7.11-ucs4/lib/python2.7/site-packages/paddle/fluid/compiler.py", line 377, in _compile_data_parallel
self._exec_strategy, self._build_strategy, self._graph)
EnforceNotMet:
C++ Call Stacks (More useful to developers):
0 std::string paddle::platform::GetTraceBackString<char const>(char const&&, char const, int) > const&, ncclUniqueId, unsigned long, unsigned long) > const&, std::vector<ncclUniqueId , std::allocator<ncclUniqueId> > const&, unsigned long, unsigned long) > const&, std::vector<std::string, std::allocator > const&, std::string const&, paddle::framework::Scope, std::vector<paddle::framework::Scope, std::allocator<paddle::framework::Scope> > const&, paddle::framework::details::ExecutionStrategy const&, paddle::framework::details::BuildStrategy const&, paddle::framework::ir::Graph)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const, int)
2 paddle::platform::NCCLGroupGuard::~NCCLGroupGuard()
3 paddle::platform::NCCLContextMap::NCCLContextMap(std::vector<paddle::platform::Place, std::allocator
4 paddle::platform::NCCLCommunicator::InitFlatCtxs(std::vector<paddle::platform::Place, std::allocator
5 paddle::framework::ParallelExecutorPrivate::InitNCCLCtxs(paddle::framework::Scope, paddle::framework::details::BuildStrategy const&)
6 paddle::framework::ParallelExecutorPrivate::InitOrGetNCCLCommunicator(paddle::framework::Scope, paddle::framework::details::BuildStrategy)
7 paddle::framework::ParallelExecutor::ParallelExecutor(std::vector<paddle::platform::Place, std::allocator
Error Message Summary:
Error: An error occurred here. There is no accurate error hint for this error yet. We are continuously in the process of increasing hint for this kind of error check. It would be helpful if you could inform us of how this conversion went by opening a github issue. And we will resolve it with high priority.
[unhandled system error] at (/paddle/paddle/fluid/platform/nccl_helper.h:70)