Text classification and word segmentation cannot be called at the same time

1205469665 commented 4 years ago

PaddleHub 1.5.4 PaddlePaddle 1.7.1post107
Linux, Python 3.6.9

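I wrote a text classification and word segmentation service with the Python Flask framework. As long as, within a given time window, the threads on multiple machines call only text classification or only word segmentation, no error occurs.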

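But if different threads call text classification and word segmentation, i.e. both endpoints get called, errors are raised and the service can even crash outright.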

This error crashes the service outright:

W0410 17:05:46.529703 22852 init.cc:209] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0410 17:05:46.529724 22852 init.cc:211] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0410 17:05:46.529728 22852 init.cc:214] The detail failure signal is:
W0410 17:05:46.529732 22852 init.cc:217]
W0410 17:05:46.532021 22852 init.cc:217] PC: @
Aborted at 1586509546 (unix time) try "date -d @1586509546" if you are using GNU date
PC: @ 0x0 (unknown)
SIGSEGV (@0x9) received by PID 22469 (TID 0x7f29df7fe700) from PID 9; stack trace:
@ 0x7f2c44f2a5f0 (unknown)
@ 0x7f2bf1bc2fef std::_Sp_counted_base<>::_M_release()
@ 0x7f2bf1bc48bc std::vector<>::operator=()
@ 0x7f2bf412036d paddle::operators::FetchOp::RunImpl()
@ 0x7f2bf4827ed0 paddle::framework::OperatorBase::Run()
@ 0x7f2bf1f266c6 paddle::framework::Executor::RunPreparedContext()
@ 0x7f2bf1f29d8b paddle::framework::Executor::Run()
@ 0x7f2bf1babf8e _ZZN8pybind1112cpp_function10initializeIZN6paddle6pybindL22pybind11_init_core_avxERNS_6moduleEEUlRNS2_9framework8ExecutorERKNS6_11ProgramDescEPNS6_5ScopeEibbRKSt6vectorISsSaISsEEE108_vIS8_SB_SD_ibbSI_EINS_4nameENS_9is_methodENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4FUNES10
@ 0x7f2bf1c00461 pybind11::cpp_function::dispatcher()
@ 0x55d258db1c54 _PyCFunction_FastCallDict
@ 0x55d258e39c0e call_function
@ 0x55d258e5c75a _PyEval_EvalFrameDefault
@ 0x55d258e32e66 _PyEval_EvalCodeWithName
@ 0x55d258e33ed6 fast_function
@ 0x55d258e39b95 call_function
@ 0x55d258e5d51c _PyEval_EvalFrameDefault
@ 0x55d258e32e66 _PyEval_EvalCodeWithName
@ 0x55d258e33ed6 fast_function
@ 0x55d258e39b95 call_function
@ 0x55d258e5d51c _PyEval_EvalFrameDefault
@ 0x55d258e32e66 _PyEval_EvalCodeWithName
@ 0x55d258e33ed6 fast_function
@ 0x55d258e39b95 call_function
@ 0x55d258e5d51c _PyEval_EvalFrameDefault
@ 0x55d258e32e66 _PyEval_EvalCodeWithName
@ 0x55d258e33ed6 fast_function
@ 0x55d258e39b95 call_function
@ 0x55d258e5d51c _PyEval_EvalFrameDefault
@ 0x55d258e32e66 _PyEval_EvalCodeWithName
@ 0x55d258e33e73 fast_function
@ 0x55d258e39b95 call_function
@ 0x55d258e5c75a _PyEval_EvalFrameDefault
^C
bash: line 1: 22469 Segmentation fault (core dumped)

This error prevents the program from being called normally:

File "/home/anaconda3/envs/cail/lib/python3.6/site-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
File "/home/anaconda3/envs/cail/lib/python3.6/site-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
File "/home/ssh-project/pwlp-judged/controller/paddle_nlp_controller.py", line 71, in initCustomIdfCorpus
    utils.init_custom_idf_corpus(document_list,allowPOS=('ns', 'n', 'vn', 'v', 'a'))
File "/home/ssh-project/pwlp-judged/paddlenlp/utils/utils.py", line 67, in init_custom_idf_corpus
    default_tfidf.init_custom_idf_corpus(sentence_list,allowPOS)
File "/home/ssh-project/pwlp-judged/paddlenlp/extractkeyword/tfidf.py", line 162, in init_custom_idf_corpus
    words = self.seg_sentence(allowPOS, sentence)
File "/home/ssh-project/pwlp-judged/paddlenlp/extractkeyword/tfidf.py", line 122, in seg_sentence
    wt = self.tokenizer.lexical_analysis(sentence)
File "/home/ssh-project/pwlp-judged/paddlenlp/utils/utils.py", line 32, in lexical_analysis
    return lac.lexical_analysis(data={'text': document}, user_dict=None)
File "/home/anaconda3/envs/cail/lib/python3.6/site-packages/paddlehub/module/module.py", line 543, in __call__
    return_numpy=False)
File "/home/anaconda3/envs/cail/lib/python3.6/site-packages/paddle/fluid/executor.py", line 783, in run
    six.reraise(*sys.exc_info())
File "/home/anaconda3/envs/cail/lib/python3.6/site-packages/six.py", line 696, in reraise
    raise value
File "/home/anaconda3/envs/cail/lib/python3.6/site-packages/paddle/fluid/executor.py", line 778, in run
    use_program_cache=use_program_cache)
File "/home/anaconda3/envs/cail/lib/python3.6/site-packages/paddle/fluid/executor.py", line 831, in _run_impl
    use_program_cache=use_program_cache)
File "/home/anaconda3/envs/cail/lib/python3.6/site-packages/paddle/fluid/executor.py", line 905, in _run_program
    fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet:

C++ Call Stacks (More useful to developers):

0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2 paddle::operators::LookupTableKernel::Compute(paddle::framework::ExecutionContext const&) const
3 std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CPUPlace, false, 0ul, paddle::operators::LookupTableKernel, paddle::operators::LookupTableKernel, paddle::operators::LookupTableKernel >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
4 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
5 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
6 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
7 paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool)
8 paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool, bool)

Python Call Stacks (More useful to users):

Error Message Summary:

Error: Variable value (input) of OP(fluid.layers.embedding) expected >= 0 and < 20941, but got 4575161256734962900. Please check input value. [Hint: Expected ids[i] < row_number, but received ids[i]:4575161256734962900 >= row_number:20941.] at (/paddle/paddle/fluid/operators/lookup_table_op.h:94) [operator < lookup_table > error]

nepeplwu commented 4 years ago

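@1205469665 Thanks for the report. This problem is caused by Paddle not supporting multi-threaded prediction; you can try using multiple processes instead.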
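If moving to processes is not possible right away, one stopgap consistent with this explanation (a sketch, not something suggested in the thread) is to make sure only one thread at a time enters Paddle. The helper below wraps a callable so that every callable wrapped with the same lock excludes the others. `serialized` and `_PADDLE_LOCK` are hypothetical names; the functions to wrap would be, for example, the LAC module's `lexical_analysis` and the text-classification predict call from the demo.

```python
import functools
import threading
from typing import Any, Callable, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])

# One process-wide lock shared by every wrapped predictor (hypothetical name).
_PADDLE_LOCK = threading.Lock()


def serialized(fn: F, lock: threading.Lock = _PADDLE_LOCK) -> F:
    """Return a wrapper that runs fn only while holding lock.

    All functions wrapped with the same lock exclude each other, so a
    segmentation call and a classification call never run concurrently.
    """

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Other threads block here until the current call has finished.
        with lock:
            return fn(*args, **kwargs)

    return cast(F, wrapper)
```

Sharing one lock between both endpoints matters because, per the report, errors only appeared when segmentation and classification were called concurrently. The cost is that calls into Paddle are handled one at a time, which caps throughput; separate processes avoid that limit.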

1205469665 commented 4 years ago

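This isn't prediction, though; does it still not work? Can't word segmentation and text classification be multi-threaded either?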

1205469665 commented 4 years ago

Is it also impossible to load the text classification model files in multiple processes?

1205469665 commented 4 years ago

That raises an error too.

1205469665 commented 4 years ago

@nepeplwu Is it also impossible to load the text classification model files in multiple processes? That raises an error too.

nepeplwu commented 4 years ago

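@1205469665 Loading the model in multiple processes is supported. Please post your code and the error stack trace so we can look into the cause.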

1205469665 commented 4 years ago

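@nepeplwu I found that it's because I'm using Flask's multi-process mode, in which loading on the GPU doesn't work; after switching to the CPU everything works. But I still need the GPU for training. Can this be solved, or should I use some other multi-process implementation?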

nepeplwu commented 4 years ago

I don't quite understand. You start a Flask service, which should be providing prediction; why does it have anything to do with training?

1205469665 commented 4 years ago

@nepeplwu I use the Flask service I start to trigger the training function.

ShenYuhan commented 4 years ago

Did you write your own Flask service, or are you using PaddleHub's built-in paddlehub-serving? If you need to deploy multiple models, what does your scenario look like?

1205469665 commented 4 years ago

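It's my own Flask service. My Python service currently handles uploading data to train models, predicting on data, and also word segmentation and so on.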

1205469665 commented 4 years ago

@ShenYuhan Did you see this?

nepeplwu commented 4 years ago

@1205469665 Could you share the Flask endpoint that triggers training, along with the training code?

1205469665 commented 4 years ago

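@nepeplwu I can't share the code for now. It's just an HTTP endpoint that, when called, starts the text classification code, which is the one from the demo. At the moment GPU mode can't be used under Flask's process model.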

nepeplwu commented 4 years ago

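@1205469665 Does your multi-process implementation create new processes by forking through the system interface? If so, I suggest wrapping the training code in a script and launching that script by running a command via os.system, to see whether it works (before each script starts, set the environment variable CUDA_VISIBLE_DEVICES to a different GPU card id).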
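A minimal sketch of that suggestion, assuming the training code has already been moved into a standalone script (the script name in the usage line below is hypothetical). It uses `subprocess.Popen` instead of `os.system`, a swap made so the Flask handler is not blocked while training runs and so arguments need no shell quoting. CUDA_VISIBLE_DEVICES is set only in the child's copy of the environment.

```python
import os
import subprocess
import sys
from typing import Sequence


def launch_training(script_args: Sequence[str], gpu_id: int) -> subprocess.Popen:
    """Start a training script in a new Python process restricted to one GPU.

    The child is a fresh interpreter (not a fork of the server), so it sets
    up its own CUDA context, and CUDA_VISIBLE_DEVICES limits it to one card.
    """
    env = os.environ.copy()  # the server's own environment is left untouched
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    # Run with the same interpreter as the server; a list avoids the shell.
    return subprocess.Popen([sys.executable, *script_args], env=env)
```

For example, `proc = launch_training(["train_text_cls.py"], gpu_id=0)` returns immediately. The handler can check `proc.poll()` later, but something must eventually call `poll()` or `wait()` so the finished child is reaped.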

1205469665 commented 4 years ago

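@nepeplwu It's implemented with Flask's default multi-process mode, app.run(processes=4); looking at the source, it uses fork. I'll try launching a script as you suggested tomorrow. So forking like this can't be used?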

nepeplwu commented 4 years ago

This is probably because a CUDA context cannot be shared between different processes.
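If that explanation holds, the general rule is to create GPU-backed objects only inside the process that will use them, never in the parent before `fork`. The sketch below is a hypothetical helper (not part of PaddleHub) that builds a resource lazily and rebuilds it whenever it finds itself running under a different PID from the one that created it. The factory would be whatever loads the model, for example `lambda: hub.Module(name="lac")` in PaddleHub 1.x.

```python
import os
import threading
from typing import Callable, Generic, Optional, TypeVar, cast

T = TypeVar("T")


class PerProcessResource(Generic[T]):
    """Lazily create one instance of a resource per process.

    The PID of the creating process is remembered. If get() is later called
    from a different PID (e.g. a child created by fork), the inherited object
    is ignored and a fresh one is built in the current process.
    """

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._lock = threading.Lock()  # guards creation against racing threads
        self._pid: Optional[int] = None
        self._value: Optional[T] = None

    def get(self) -> T:
        pid = os.getpid()
        with self._lock:
            if self._pid != pid:
                # First use in this process: build the resource here, so any
                # GPU state it sets up belongs to this process.
                self._value = self._factory()
                self._pid = pid
            return cast(T, self._value)
```

One caveat: Flask's built-in server with `processes=N` forks a new child for every request, so a per-process cache there would reload the model on each request. The pattern pays off with long-lived worker processes, or inside a separately launched script as suggested above.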

Steffy-zxf commented 4 years ago

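Hello, this issue has had no updates in the past two weeks, so we will close it today. If you still need to follow up after it is closed, you can reopen it.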

PaddlePaddle / PaddleHub