Open endy-see opened 5 years ago
For now, you can refer to this: https://github.com/PaddlePaddle/models/issues/1360
We are also continuing to look for a better fix.
But I did not install paddle by compiling it from source; I installed it directly with pip, so I assume that workaround does not apply to me?
This happens in v1.4 and v1.5; v1.3 does not seem to have this issue.
cat /proc/cpuinfo
processor : 27
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz
stepping : 4
microcode : 0x2000043
cpu MHz : 2000.000
cache size : 19712 KB
physical id : 1
siblings : 14
core id : 14
cpu cores : 14
apicid : 60
initial apicid : 60
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc
bogomips : 4004.68
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
I have tried the 1.6 and develop versions, and I hit the same problem as you:
dynamic_loader.cc:140] Failed to find dynamic library: /paddle/build/third_party/install/warpctc/lib/libwarpctc.so
My local environment:
CentOS: release 6.9
NCCL: v2.4.7
cuda: 9.0.176
cudnn: 7.3.1
Paddle: 1.5.1
Python: 3.7.3
When I start training the ocr_recognition model with the crnn_ctc model, Paddle fails with the following error:
(paddle) [ocr_recognition]# env CUDA_VISIBLE_DEVICES=0 python train.py --train_images dataset/public_data_english/train_images --train_list dataset/public_data_english/train.list --test_images dataset/public_data_english/test_images --test_list dataset/public_data_english/test.list
----------- Configuration Arguments -----------
average_window: 0.15
batch_size: 32
eval_period: 15000
init_model: None
log_period: 1000
max_average_window: 12500
min_average_window: 10000
model: crnn_ctc
parallel: False
profile: False
save_model_dir: ./models
save_model_period: 15000
skip_batch_num: 0
skip_test: False
test_images: dataset/public_data_english/test_images
test_list: dataset/public_data_english/test.list
total_step: 720000
train_images: dataset/public_data_english/train_images
train_list: dataset/public_data_english/train.list
use_gpu: True
/home/work/software/anaconda2/envs/paddle/lib/python3.7/site-packages/paddle/fluid/evaluator.py:71: Warning: The EditDistance is deprecated, because maintain a modified program inside evaluator cause bug easily, please use fluid.metrics.EditDistance instead.
  % (self.class.name, self.class.name), Warning)
finish batch shuffle
W0801 21:22:58.187352 37850 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 9.2, Runtime API Version: 9.0
W0801 21:22:58.192481 37850 device_context.cc:267] device: 0, cuDNN Version: 7.3.
W0801 21:22:59.779482 37850 dynamic_loader.cc:140] Failed to find dynamic library: /paddle/build/third_party/install/warpctc/lib/libwarpctc.so (dlopen: cannot load any more object with static TLS)
W0801 21:22:59.779705 37850 dynamic_loader.cc:109] Can not find library: libwarpctc.so. The process maybe hang. Please try to add the lib path to LD_LIBRARY_PATH.
Traceback (most recent call last):
  File "train.py", line 222, in main()
  File "train.py", line 218, in main
    train(args)
  File "train.py", line 151, in train
    results = train_one_batch(data)
  File "train.py", line 112, in train_one_batch
    fetch_list=fetch_vars)
  File "/home/work/software/anaconda2/envs/paddle/lib/python3.7/site-packages/paddle/fluid/executor.py", line 651, in run
    use_program_cache=use_program_cache)
  File "/home/work/software/anaconda2/envs/paddle/lib/python3.7/site-packages/paddle/fluid/executor.py", line 749, in run
    exe.run(program.desc, scope, 0, True, True, fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet: Invoke operator warpctc error.
Python Callstacks:
  File "/home/work/software/anaconda2/envs/paddle/lib/python3.7/site-packages/paddle/fluid/framework.py", line 1771, in append_op
    attrs=kwargs.get("attrs", None))
  File "/home/work/software/anaconda2/envs/paddle/lib/python3.7/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
    return self.main_program.current_block().append_op(args, kwargs)
  File "/home/work/software/anaconda2/envs/paddle/lib/python3.7/site-packages/paddle/fluid/layers/nn.py", line 5573, in warpctc
    'use_cudnn': use_cudnn
  File "/home/zhaoyanmei/models/PaddleCV/ocr_recognition/crnn_ctc_model.py", line 189, in ctc_train_net
    input=fc_out, label=label, blank=num_classes, norm_by_times=True)
  File "train.py", line 61, in train
    args, data_shape, num_classes)
  File "train.py", line 218, in main
    train(args)
  File "train.py", line 222, in main()
C++ Callstacks:
Failed to find dynamic library: libwarpctc.so ( dlopen: cannot load any more object with static TLS )
Please specify its path correctly using following ways:
Method. set environment variable LD_LIBRARY_PATH on Linux or DYLD_LIBRARY_PATH on Mac OS.
For instance, issue command: export LD_LIBRARY_PATH=...
Note: After Mac OS 10.11, using the DYLD_LIBRARY_PATH is impossible unless System Integrity Protection (SIP) is disabled.
at [/paddle/paddle/fluid/platform/dynload/dynamic_loader.cc:166]
PaddlePaddle Call Stacks:
0 0x7fe93ff05830p void paddle::platform::EnforceNotMet::Init(char const, char const, int) + 352
1 0x7fe93ff05ba9p paddle::platform::EnforceNotMet::EnforceNotMet(std::exception_ptr::exception_ptr, char const, int) + 137
2 0x7fe941f09f9bp paddle::platform::dynload::GetWarpCTCDsoHandle() + 1835
3 0x7fe940177be9p void std::once_call_impl<std::Bind_simple<paddle::platform::dynload::DynLoad__get_warpctc_version::operator()<>()::{lambda()#1} ()> >() + 9
4 0x7fe9b196fbe0p pthread_once + 80
5 0x7fe9401809b8p paddle::operators::WarpCTCFunctorpaddle::platform::CUDADeviceContext::operator()(paddle::framework::ExecutionContext const&, float const, float, int const, int const, int const, unsigned long, unsigned long, unsigned long, float) + 136
6 0x7fe940183206p paddle::operators::WarpCTCKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const + 2390
7 0x7fe940184ab3p std::Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::WarpCTCKernel<paddle::platform::CUDADeviceContext, float> >::operator()(char const, char const, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::M_invoke(std::Anydata const&, paddle::framework::ExecutionContext const&) + 35
8 0x7fe941e6bf07p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void_> const&, paddle::framework::RuntimeContext) const + 375
9 0x7fe941e6c2e1p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void> const&) const + 529
10 0x7fe941e698dcp paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void> const&) + 332
11 0x7fe94009061ep paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext, paddle::framework::Scope, bool, bool, bool) + 382
12 0x7fe9400936bfp paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocatorstd::string > const&, bool) + 143
13 0x7fe93fef6ebdp
14 0x7fe93ff38166p
15 0x7fe9b1f1b6e4p _PyMethodDef_RawFastCallKeywords + 612
16 0x7fe9b1f1b801p _PyCFunction_FastCallKeywords + 33
17 0x7fe9b1f777aep _PyEval_EvalFrameDefault + 21374
18 0x7fe9b1eb84f9p _PyEval_EvalCodeWithName + 761
19 0x7fe9b1f1aa27p _PyFunction_FastCallKeywords + 903
20 0x7fe9b1f738fep _PyEval_EvalFrameDefault + 5326
21 0x7fe9b1eb84f9p _PyEval_EvalCodeWithName + 761
22 0x7fe9b1f1aa27p _PyFunction_FastCallKeywords + 903
23 0x7fe9b1f738fep _PyEval_EvalFrameDefault + 5326
24 0x7fe9b1eb8db9p _PyEval_EvalCodeWithName + 3001
25 0x7fe9b1f1aa27p _PyFunction_FastCallKeywords + 903
26 0x7fe9b1f72846p _PyEval_EvalFrameDefault + 1046
27 0x7fe9b1eb8db9p _PyEval_EvalCodeWithName + 3001
28 0x7fe9b1f1aa27p _PyFunction_FastCallKeywords + 903
29 0x7fe9b1f72846p _PyEval_EvalFrameDefault + 1046
30 0x7fe9b1f1a79bp _PyFunction_FastCallKeywords + 251
31 0x7fe9b1f72846p _PyEval_EvalFrameDefault + 1046
32 0x7fe9b1eb84f9p _PyEval_EvalCodeWithName + 761
33 0x7fe9b1eb93c4p PyEval_EvalCodeEx + 68
34 0x7fe9b1eb93ecp PyEval_EvalCode + 28
35 0x7fe9b1fd1874p
36 0x7fe9b1fdbb81p PyRun_FileExFlags + 161
37 0x7fe9b1fdbd73p PyRun_SimpleFileExFlags + 451
38 0x7fe9b1fdce5fp
39 0x7fe9b1fdcf7cp _Py_UnixMain + 60
40 0x7fe9b15c3b45p __libc_start_main + 245
41 0x7fe9b1f82122p
(paddle) [ocr_recognition]#
Can anyone help me? Thank you in advance!
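One way to narrow this down: the log reports "Failed to find dynamic library", but the dlopen message in parentheses ("cannot load any more object with static TLS") suggests the file may exist and the load itself is being refused. The sketch below is only a diagnostic, not a confirmed fix. It looks for libwarpctc.so inside a pip-installed paddle package and tries to load it in a fresh process. The helper names find_shared_library and try_load are made up for illustration, and the assumption that the pip wheel ships its own copy of libwarpctc.so somewhere under the paddle package directory is mine, not something stated in this thread.

```python
import ctypes
import importlib.util
import os


def find_shared_library(root, name="libwarpctc.so"):
    """Return the sorted paths under `root` whose file name equals `name`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def try_load(path):
    """Try to dlopen `path`; return (True, "") or (False, error message)."""
    try:
        ctypes.CDLL(path)
    except OSError as exc:
        return False, str(exc)
    return True, ""


if __name__ == "__main__":
    # find_spec locates the package without importing it, so no other
    # paddle libraries are loaded before the test below.
    spec = importlib.util.find_spec("paddle")
    if spec is None or not spec.submodule_search_locations:
        print("paddle package not found in this environment")
    else:
        for package_dir in spec.submodule_search_locations:
            for lib in find_shared_library(package_dir):
                ok, err = try_load(lib)
                print(lib, "->", "loaded" if ok else "failed: " + err)
                # Directory to try in LD_LIBRARY_PATH, as the log suggests.
                print("candidate LD_LIBRARY_PATH entry:", os.path.dirname(lib))
```

If the library loads fine in a fresh interpreter but fails inside train.py, that would point to the load order within the process rather than a missing file. In that case adding the printed directory to LD_LIBRARY_PATH may not be enough on its own.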