PaddlePaddle / models

Officially maintained, supported by PaddlePaddle, including CV, NLP, Speech, Rec, TS, big models and so on.
Apache License 2.0
6.92k stars 2.91k forks

Check failed: ptr Fail to allocate GPU memory 1024 bytes #884

Closed: ecit241 closed this issue 6 years ago

ecit241 commented 6 years ago

[INFO 2018-04-27 12:22:45,641 layers.py:2829] output for __pool_3__: c = 128, h = 11, w = 3, size = 4224
F0427 12:22:45.654973 29609 Allocator.h:89] Check failed: ptr Fail to allocate GPU memory 1024 bytes
Check failure stack trace:
@ 0x7fc6072639cd google::LogMessage::Fail()
@ 0x7fc60726747c google::LogMessage::SendToLog()
@ 0x7fc6072634f3 google::LogMessage::Flush()
@ 0x7fc60726898e google::LogMessageFatal::~LogMessageFatal()
@ 0x7fc60719912e paddle::GpuAllocator::alloc()
@ 0x7fc60718075f paddle::PoolAllocator::alloc()
@ 0x7fc60718016f paddle::GpuMemoryHandle::GpuMemoryHandle()
@ 0x7fc60719320e paddle::GpuVectorT<>::GpuVectorT()
@ 0x7fc6071933c8 paddle::VectorT<>::create()
@ 0x7fc607193479 paddle::VectorT<>::createParallelVector()
@ 0x7fc60701a526 paddle::Parameter::enableType()
@ 0x7fc607015ccc paddle::parameterInitNN()
@ 0x7fc60701853e paddle::NeuralNetwork::init()
@ 0x7fc60704132f paddle::GradientMachine::create()
@ 0x7fc607240575 GradientMachine::createFromPaddleModelPtr()
@ 0x7fc60724075f GradientMachine::createByConfigProtoStr()
@ 0x7fc606ecfbb7 _wrap_GradientMachine_createByConfigProtoStr
@ 0x4c30ce PyEval_EvalFrameEx
@ 0x4b9ab6 PyEval_EvalCodeEx
@ 0x4c1e6f PyEval_EvalFrameEx
@ 0x4b9ab6 PyEval_EvalCodeEx
@ 0x4c16e7 PyEval_EvalFrameEx
@ 0x4b9ab6 PyEval_EvalCodeEx
@ 0x4d55f3 (unknown)
@ 0x4eebee (unknown)
@ 0x4ee7f6 (unknown)
@ 0x4aa9ab (unknown)
@ 0x4c15bf PyEval_EvalFrameEx
@ 0x4b9ab6 PyEval_EvalCodeEx
@ 0x4d55f3 (unknown)
@ 0x4a577e PyObject_Call
@ 0x4bed3d PyEval_EvalFrameEx

Aborted (core dumped)

My system is Ubuntu 16.04 with two 4 GB NVIDIA cards. Why does it report a GPU error? I am running the code from the scene_text_recognition example.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K2200        Off  | 00000000:01:00.0  On |                  N/A |
| 42%   40C    P8     1W /  39W |    132MiB /  4040MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro K2200        Off  | 00000000:02:00.0 Off |                  N/A |
| 42%   34C    P8     1W /  39W |      1MiB /  4043MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1057      G   /usr/lib/xorg/Xorg                           131MiB |
+-----------------------------------------------------------------------------+
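
The failed allocation is only 1024 bytes, while nvidia-smi reports close to 4 GB free on each card, so a genuinely full GPU looks unlikely. The same free-memory check can be scripted. The following is a minimal Python sketch, assuming `nvidia-smi` is on the PATH; the helper names `parse_gpu_memory` and `query_gpu_memory` are hypothetical, and the sample string simply repeats the figures from the table above so the parser can run without a GPU.

```python
import subprocess

# Ask nvidia-smi for machine-readable rows: index, used MiB, total MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' rows (values in MiB) into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        gpus.append({"index": index, "used_mib": used, "free_mib": total - used})
    return gpus


def query_gpu_memory():
    """Run nvidia-smi and parse its output; needs an NVIDIA driver installed."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


# Sample rows matching the table above (not live output).
sample = "0, 132, 4040\n1, 1, 4043\n"
for gpu in parse_gpu_memory(sample):
    print(gpu)
```

On a machine with the driver installed, `query_gpu_memory()` would return the live values instead of the sample.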

ecit241 commented 6 years ago

CUDA 9.1 is installed, and the examples under samples run correctly.

kuke commented 6 years ago
  1. Have you set the environment variable CUDA_VISIBLE_DEVICES?
  2. CUDA 9.0 may be too new a version; please try CUDA <= 8.0.
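
For the first point, the variable can also be set from inside a Python script rather than on the command line. This is a minimal sketch, not part of the scene_text_recognition example: `select_gpus` is a hypothetical helper, and it assumes the assignment happens before the framework initializes CUDA, since the variable is read at that moment.

```python
import os


def select_gpus(device_ids):
    """Expose only the given GPU indices to CUDA libraries loaded afterwards."""
    value = ",".join(str(i) for i in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Call this before importing or initializing the deep learning framework.
select_gpus([0, 1])
print(os.environ["CUDA_VISIBLE_DEVICES"])
```
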
ecit241 commented 6 years ago

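I did prefix the command with env CUDA_VISIABLE_DEVICES=0,1 and I downgraded the CUDA version, but the problem persists. Do I need to build a version with GPU and CUDA support?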

kuke commented 6 years ago

@ecit241 Yes, that is the key point. Note the README for this example: you must install the GPU version of paddlepaddle for it to work.
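
A quick way to see which build is installed is to look at the package metadata. The sketch below is a heuristic only: it assumes the PyPI naming in which the CPU wheel is `paddlepaddle` and the GPU wheel is `paddlepaddle-gpu`, it needs Python 3.8 or later for `importlib.metadata`, and a source build may register under either name. `installed_paddle_builds` is a hypothetical helper.

```python
from importlib import metadata


def installed_paddle_builds():
    """Return {distribution name: version} for the Paddle wheels that are installed."""
    found = {}
    for name in ("paddlepaddle", "paddlepaddle-gpu"):
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # this distribution is not installed
    return found


builds = installed_paddle_builds()
if "paddlepaddle-gpu" in builds:
    print("GPU build found:", builds["paddlepaddle-gpu"])
elif "paddlepaddle" in builds:
    print("Only the CPU build is installed:", builds["paddlepaddle"])
else:
    print("No Paddle distribution found in this environment")
```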

ecit241 commented 6 years ago

The only option was to build paddlepaddle from source. Building the GPU version requires these dependencies: CUDA 8.0, cuDNN 7, Intel's MKL-DNN library, and Baidu's warp-ctc. I have two graphics cards and ran: nohup env CUDA_VISIABLE_DEVICES=0,1 python train.py --train_file_list_path 'data/train_data/gt.txt' --test_file_list_path 'data/test_data/Challenge2_Test_Task3_GT.txt' --label_dict_path 'label_dict.txt'--parallel=True &

The run output is as follows:

Pass 9, batch 450, Samples 450, Cost 2.594310, Eval {'__ctc_error_evaluator_0__.insertion_error': 0.0, '__ctc_error_evaluator_0__.error': 0.8999999761581421, '__ctc_error_evaluator_0__.substitution_error': 0.10000000149011612, '__ctc_error_evaluator_0__.sequence_error': 1.0, '__ctc_error_evaluator_0__.deletion_error': 0.800000011920929}
Pass 9, batch 500, Samples 500, Cost 0.658113, Eval {'__ctc_error_evaluator_0__.insertion_error': 0.0, '__ctc_error_evaluator_0__.error': 1.0, '__ctc_error_evaluator_0__.substitution_error': 0.125, '__ctc_error_evaluator_0__.sequence_error': 1.0, '__ctc_error_evaluator_0__.deletion_error': 0.875}
Pass 9, batch 550, Samples 550, Cost 1.506957, Eval {'__ctc_error_evaluator_0__.insertion_error': 0.0, '__ctc_error_evaluator_0__.error': 0.0, '__ctc_error_evaluator_0__.substitution_error': 0.0, '__ctc_error_evaluator_0__.sequence_error': 0.0, '__ctc_error_evaluator_0__.deletion_error': 0.0}
Pass 9, batch 600, Samples 600, Cost 1.302853, Eval {'__ctc_error_evaluator_0__.insertion_error': 0.0, '__ctc_error_evaluator_0__.error': 1.0, '__ctc_error_evaluator_0__.substitution_error': 0.4000000059604645, '__ctc_error_evaluator_0__.sequence_error': 1.0, '__ctc_error_evaluator_0__.deletion_error': 0.6000000238418579}
Pass 9, batch 650, Samples 650, Cost 2.896224, Eval {'__ctc_error_evaluator_0__.insertion_error': 0.0, '__ctc_error_evaluator_0__.error': 1.0, '__ctc_error_evaluator_0__.substitution_error': 0.5, '__ctc_error_evaluator_0__.sequence_error': 1.0, '__ctc_error_evaluator_0__.deletion_error': 0.5}
Pass 9, batch 700, Samples 700, Cost 1.515006, Eval {'__ctc_error_evaluator_0__.insertion_error': 0.0, '__ctc_error_evaluator_0__.error': 0.0, '__ctc_error_evaluator_0__.substitution_error': 0.0, '__ctc_error_evaluator_0__.sequence_error': 0.0, '__ctc_error_evaluator_0__.deletion_error': 0.0}
Pass 9, batch 750, Samples 750, Cost 0.803040, Eval {'__ctc_error_evaluator_0__.insertion_error': 0.0, '__ctc_error_evaluator_0__.error': 1.0, '__ctc_error_evaluator_0__.substitution_error': 0.0, '__ctc_error_evaluator_0__.sequence_error': 1.0, '__ctc_error_evaluator_0__.deletion_error': 1.0}
Pass 9, batch 800, Samples 800, Cost 3.827827, Eval {'__ctc_error_evaluator_0__.insertion_error': 0.0, '__ctc_error_evaluator_0__.error': 1.0, '__ctc_error_evaluator_0__.substitution_error': 0.20000000298023224, '__ctc_error_evaluator_0__.sequence_error': 1.0, '__ctc_error_evaluator_0__.deletion_error': 0.800000011920929}
Test 9, Cost 10.017384, Eval {'__ctc_error_evaluator_0__.insertion_error': 0.003957382403314114, '__ctc_error_evaluator_0__.error': 0.9725003838539124, '__ctc_error_evaluator_0__.substitution_error': 0.18465375900268555, '__ctc_error_evaluator_0__.sequence_error': 1.0, '__ctc_error_evaluator_0__.deletion_error': 0.7838889956474304}
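
Two details of the command as posted are worth noting. The variable is spelled CUDA_VISIABLE_DEVICES, whereas the name CUDA reads is CUDA_VISIBLE_DEVICES, so the setting as typed has no effect and both cards remain visible by default. There is also no space between 'label_dict.txt' and --parallel=True, which a shell joins into a single word. Building the command as an argument list avoids both problems. The following is a sketch under those assumptions, with `build_launch` as a hypothetical helper; the paths and flags are copied from the command above.

```python
import os
import shlex


def build_launch(gpu_ids):
    """Return (argv, env) for the training run, one list element per argument."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    argv = [
        "python", "train.py",
        "--train_file_list_path", "data/train_data/gt.txt",
        "--test_file_list_path", "data/test_data/Challenge2_Test_Task3_GT.txt",
        "--label_dict_path", "label_dict.txt",
        "--parallel=True",
    ]
    return argv, env


argv, env = build_launch([0, 1])
print(shlex.join(argv))
# subprocess.Popen(argv, env=env) would start the run; it is left out here
# so the sketch does not depend on train.py being present.

# How a POSIX shell tokenizes the fragment without the space:
print(shlex.split("'label_dict.txt'--parallel=True"))
```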

kuke commented 6 years ago

@ecit241 It looks like it is running normally now?
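
To follow how the cost develops across batches in output like the log above, the values can be pulled out with a regular expression. This is a minimal sketch that assumes the "Pass N, batch N, Samples N, Cost X" layout shown in the log; `batch_costs` and `mean_cost` are hypothetical helpers, and the sample string is shortened from the first two log entries.

```python
import re

# Matches the per-batch header that precedes each Eval dictionary.
BATCH_LINE = re.compile(r"Pass (\d+), batch (\d+), Samples \d+, Cost ([\d.]+)")


def batch_costs(log_text):
    """Return a list of (pass_id, batch_id, cost) tuples found in the log text."""
    return [(int(p), int(b), float(c)) for p, b, c in BATCH_LINE.findall(log_text)]


def mean_cost(log_text):
    """Average the per-batch cost, or return None if no batch lines are found."""
    costs = [cost for _, _, cost in batch_costs(log_text)]
    return sum(costs) / len(costs) if costs else None


sample = (
    "Pass 9, batch 450, Samples 450, Cost 2.594310, Eval {...} "
    "Pass 9, batch 500, Samples 500, Cost 0.658113, Eval {...}"
)
print(batch_costs(sample))
print(mean_cost(sample))
```

Because the pattern is applied to the whole text with `findall`, it works whether the entries are on separate lines or run together.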

shanyi15 commented 6 years ago

Hello, this issue has not been updated in the past month, so we will close it today for the sake of other users' experience. If you still need to follow up on this question after it is closed, please feel free to reopen it, and we will get back to you within 24 hours. We apologize for any inconvenience caused by the closure, and thank you for your support of PaddlePaddle!