PaddlePaddle / PaddleDetection

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
Apache License 2.0
12.66k stars, 2.87k forks

Bus error (core dumped) #5878

Open jiaerwang0328 opened 2 years ago

jiaerwang0328 commented 2 years ago

I hit the following error while training PP-YOLOE. Environment: CUDA 10.2, Paddle 2.2.

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   paddle::imperative::Tracer::TraceOp(std::string const&, paddle::imperative::NameVarBaseMap const&, paddle::imperative::NameVarBaseMap const&, paddle::framework::AttributeMap, paddle::platform::Place const&, bool, std::map<std::string, std::string, std::less<std::string >, std::allocator<std::pair<std::string const, std::string > > > const&)
1   paddle::imperative::PreparedOp::Prepare(paddle::imperative::NameVarBaseMap const&, paddle::imperative::NameVarBaseMap const&, paddle::framework::OperatorWithKernel const&, paddle::platform::Place const&, paddle::framework::AttributeMap const&, paddle::framework::AttributeMap const&)
2   paddle::imperative::PreparedOp paddle::imperative::PrepareImpl<paddle::imperative::VarBase>(paddle::imperative::details::NameVarMapTrait<paddle::imperative::VarBase>::Type const&, paddle::imperative::details::NameVarMapTrait<paddle::imperative::VarBase>::Type const&, paddle::framework::OperatorWithKernel const&, paddle::platform::Place const&, paddle::framework::AttributeMap const&, paddle::framework::AttributeMap const&)
3   paddle::platform::DeviceContextPool::Get(paddle::platform::Place const&)
4   std::__future_base::_Deferred_state<std::_Bind_simple<paddle::platform::EmplaceDeviceContext<paddle::platform::CUDADeviceContext, paddle::platform::CUDAPlace>(std::map<paddle::platform::Place, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::less<paddle::platform::Place>, std::allocator<std::pair<paddle::platform::Place const, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > > > >*, paddle::platform::Place)::{lambda()#1} ()>, std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >::_M_complete_async()
5   std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*)
6   std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::__future_base::_Result_base::_Deleter>, std::_Bind_simple<paddle::platform::EmplaceDeviceContext<paddle::platform::CUDADeviceContext, paddle::platform::CUDAPlace>(std::map<paddle::platform::Place, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::less<paddle::platform::Place>, std::allocator<std::pair<paddle::platform::Place const, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > > > >*, paddle::platform::Place)::{lambda()#1} ()>, std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > >::_M_invoke(std::_Any_data const&)
7   paddle::platform::CUDADeviceContext::CUDADeviceContext(paddle::platform::CUDAPlace)
8   paddle::platform::CUDAContext::CUDAContext(paddle::platform::CUDAPlace const&, paddle::platform::stream::Priority const&, paddle::platform::stream::StreamFlag const&)
9   paddle::platform::CUDAContext::InitCuSolverContext()
10  cusolverDnCreate

----------------------
Error Message Summary:
----------------------
FatalError: `Access to an undefined portion of a memory object` is detected by the operating system.
  [TimeInfo: *** Aborted at 1651560573 (unix time) try "date -d @1651560573" if you are using GNU date ***]
  [SignalInfo: *** SIGBUS (@0x7ff8d19f2000) received by PID 19918 (TID 0x7ffa49643700) from PID 18446744072931450880 ***]

Bus error (core dumped)

qingqing01 commented 2 years ago

It is hard to tell what the problem is from the error message alone. If you test a simple Paddle program, does it run on the GPU normally?
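A minimal sketch of such a test, assuming a Paddle 2.x install: `paddle.utils.run_check()` is Paddle's built-in installation check, and the wrapper function `gpu_sanity_check` and its status strings are hypothetical names used only for illustration.

```python
def gpu_sanity_check():
    """Return a short status string describing whether Paddle can use the GPU."""
    try:
        import paddle
    except ImportError:
        return "paddle-not-installed"

    if not paddle.is_compiled_with_cuda():
        # A CPU-only wheel cannot exercise the GPU code path at all.
        return "cpu-only-build"

    # run_check() builds a tiny network and runs it on the available device(s).
    # A hard crash such as SIGBUS kills the interpreter at this point and
    # cannot be caught with try/except, which is itself a useful signal.
    paddle.utils.run_check()
    return "ok"


if __name__ == "__main__":
    print(gpu_sanity_check())
```

If this script dies with the same `Bus error (core dumped)`, the problem is likely in the Paddle/CUDA installation rather than in PaddleDetection or the PP-YOLOE config.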

jiaerwang0328 commented 2 years ago

A simple program fails with the same error.

qingqing01 commented 2 years ago

If even a simple program fails, the problem is most likely with the Paddle installation. Make sure the driver, CUDA, and cuDNN versions are correct.
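One way to gather the versions on the Paddle side is sketched below. It assumes `paddle.version.cuda()` and `paddle.version.cudnn()` are available in the installed release (they report what the wheel was built against); the function name `paddle_version_report` and the dictionary keys are hypothetical.

```python
import platform


def paddle_version_report():
    """Collect the versions Paddle was built against, if Paddle is installed."""
    report = {
        "python": platform.python_version(),
        "paddle": None,
        "built_with_cuda": None,
        "built_with_cudnn": None,
    }
    try:
        import paddle
    except ImportError:
        # Leave the Paddle fields as None when the package is missing.
        return report

    report["paddle"] = paddle.__version__
    # These describe the wheel, not the CUDA toolkit installed on the machine.
    report["built_with_cuda"] = paddle.version.cuda()
    report["built_with_cudnn"] = paddle.version.cudnn()
    return report


if __name__ == "__main__":
    for key, value in paddle_version_report().items():
        print(f"{key}: {value}")
```

The driver version is not covered here; `nvidia-smi` prints it, and it can then be compared against the CUDA version the wheel expects.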

jiaerwang0328 commented 2 years ago

CUDA 10.1, cuDNN 7.5.0, driver 418. I installed paddlepaddle_gpu-2.2.0.post101-cp37-cp37m-linux_x86_64.whl. That should be fine, right? The first training run succeeded, but the second run failed with this error.
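As a quick consistency check, the wheel filename can be compared against the CUDA version reported for the machine. The sketch below assumes the `.postNNN` tag encodes the CUDA version the wheel targets (so `post101` would mean CUDA 10.1); the helper names `cuda_from_wheel` and `wheel_matches_cuda` are hypothetical.

```python
import re


def cuda_from_wheel(filename):
    """Return the CUDA version implied by a '.postNNN' tag, or None if absent."""
    match = re.search(r"\.post(\d{2,})-", filename)
    if match is None:
        return None
    digits = match.group(1)
    # Assumption: the last digit is the minor version, the rest is the major.
    return f"{digits[:-1]}.{digits[-1]}"


def wheel_matches_cuda(filename, reported_cuda):
    """True when the wheel's tag agrees with the reported CUDA version."""
    return cuda_from_wheel(filename) == reported_cuda


if __name__ == "__main__":
    wheel = "paddlepaddle_gpu-2.2.0.post101-cp37-cp37m-linux_x86_64.whl"
    print(cuda_from_wheel(wheel))             # 10.1
    print(wheel_matches_cuda(wheel, "10.1"))  # True
    print(wheel_matches_cuda(wheel, "10.2"))  # False
```

Under that assumption, the wheel named in this thread is consistent with CUDA 10.1 but not with the CUDA 10.2 mentioned in the first post.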

qingqing01 commented 2 years ago

@jiaerwang0328 Your first post said CUDA 10.2. Please confirm which CUDA version you are using.

jiaerwang0328 commented 2 years ago

It is CUDA 10.1.