PaddlePaddle / PaddleHub

Awesome pre-trained models toolkit based on PaddlePaddle. (400+ models including Image, Text, Audio, Video and Cross-Modal with Easy Inference & Serving) [Security hardening in progress; interaction is suspended, please be patient]
https://www.paddlepaddle.org.cn/hub
Apache License 2.0
12.69k stars 2.08k forks

Text classification and word segmentation cannot be called at the same time #512

Closed 1205469665 closed 4 years ago

1205469665 commented 4 years ago
  1. PaddleHub 1.5.4, PaddlePaddle 1.7.1post107
  2. Linux, Python 3.69

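I wrote a service with the Python Flask framework that exposes text classification and word segmentation. When multiple threads or multiple machines call only text classification, or only word segmentation, within a given period, no error occurs.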

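But if different threads call text classification and word segmentation, that is, if both endpoints are being called, errors occur and the service crashes outright.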

This error crashes the service outright:

W0410 17:05:46.529703 22852 init.cc:209] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0410 17:05:46.529724 22852 init.cc:211] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0410 17:05:46.529728 22852 init.cc:214] The detail failure signal is:

W0410 17:05:46.529732 22852 init.cc:217] Aborted at 1586509546 (unix time) try "date -d @1586509546" if you are using GNU date
W0410 17:05:46.532021 22852 init.cc:217] PC: @ 0x0 (unknown)
W0410 17:05:46.532105 22852 init.cc:217] SIGSEGV (@0x9) received by PID 22469 (TID 0x7f29df7fe700) from PID 9; stack trace:
W0410 17:05:46.534162 22852 init.cc:217] @ 0x7f2c44f2a5f0 (unknown)
W0410 17:05:46.536512 22852 init.cc:217] @ 0x7f2bf1bc2fef std::_Sp_counted_base<>::_M_release()
W0410 17:05:46.538888 22852 init.cc:217] @ 0x7f2bf1bc48bc std::vector<>::operator=()
W0410 17:05:46.542672 22852 init.cc:217] @ 0x7f2bf412036d paddle::operators::FetchOp::RunImpl()
W0410 17:05:46.545567 22852 init.cc:217] @ 0x7f2bf4827ed0 paddle::framework::OperatorBase::Run()
W0410 17:05:46.549872 22852 init.cc:217] @ 0x7f2bf1f266c6 paddle::framework::Executor::RunPreparedContext()
W0410 17:05:46.552654 22852 init.cc:217] @ 0x7f2bf1f29d8b paddle::framework::Executor::Run()
W0410 17:05:46.553498 22852 init.cc:217] @ 0x7f2bf1babf8e _ZZN8pybind1112cpp_function10initializeIZN6paddle6pybindL22pybind11_init_core_avxERNS_6moduleEEUlRNS2_9framework8ExecutorERKNS6_11ProgramDescEPNS6_5ScopeEibbRKSt6vectorISsSaISsEEE108_vIS8_SB_SD_ibbSI_EINS_4nameENS_9is_methodENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4FUNES10
W0410 17:05:46.555279 22852 init.cc:217] @ 0x7f2bf1c00461 pybind11::cpp_function::dispatcher()
W0410 17:05:46.555611 22852 init.cc:217] @ 0x55d258db1c54 _PyCFunction_FastCallDict
W0410 17:05:46.555817 22852 init.cc:217] @ 0x55d258e39c0e call_function
W0410 17:05:46.556154 22852 init.cc:217] @ 0x55d258e5c75a _PyEval_EvalFrameDefault
W0410 17:05:46.556355 22852 init.cc:217] @ 0x55d258e32e66 _PyEval_EvalCodeWithName
W0410 17:05:46.556551 22852 init.cc:217] @ 0x55d258e33ed6 fast_function
W0410 17:05:46.556752 22852 init.cc:217] @ 0x55d258e39b95 call_function
W0410 17:05:46.557062 22852 init.cc:217] @ 0x55d258e5d51c _PyEval_EvalFrameDefault
W0410 17:05:46.557279 22852 init.cc:217] @ 0x55d258e32e66 _PyEval_EvalCodeWithName
W0410 17:05:46.557477 22852 init.cc:217] @ 0x55d258e33ed6 fast_function
W0410 17:05:46.557678 22852 init.cc:217] @ 0x55d258e39b95 call_function
W0410 17:05:46.557986 22852 init.cc:217] @ 0x55d258e5d51c _PyEval_EvalFrameDefault
W0410 17:05:46.558207 22852 init.cc:217] @ 0x55d258e32e66 _PyEval_EvalCodeWithName
W0410 17:05:46.558408 22852 init.cc:217] @ 0x55d258e33ed6 fast_function
W0410 17:05:46.558609 22852 init.cc:217] @ 0x55d258e39b95 call_function
W0410 17:05:46.558917 22852 init.cc:217] @ 0x55d258e5d51c _PyEval_EvalFrameDefault
W0410 17:05:46.559137 22852 init.cc:217] @ 0x55d258e32e66 _PyEval_EvalCodeWithName
W0410 17:05:46.559337 22852 init.cc:217] @ 0x55d258e33ed6 fast_function
W0410 17:05:46.559538 22852 init.cc:217] @ 0x55d258e39b95 call_function
W0410 17:05:46.559847 22852 init.cc:217] @ 0x55d258e5d51c _PyEval_EvalFrameDefault
W0410 17:05:46.560042 22852 init.cc:217] @ 0x55d258e32e66 _PyEval_EvalCodeWithName
W0410 17:05:46.560253 22852 init.cc:217] @ 0x55d258e33e73 fast_function
W0410 17:05:46.560456 22852 init.cc:217] @ 0x55d258e39b95 call_function
W0410 17:05:46.560765 22852 init.cc:217] @ 0x55d258e5c75a _PyEval_EvalFrameDefault
^Cbash: line 1: 22469 Segmentation fault (core dumped) env "JETBRAINS_REMOTE_RUN"="1" "LIBRARY_ROOTS"="C:/Users/Administrator/.PyCharm2018.3/system/remote_sources/1956375958/732364776;C:/Users/Administrator/.PyCharm2018.3/system/remote_sources/1956375958/-1465301300;C:/Users/Administrator/.PyCharm2018.3/system/python_stubs/1956375958;F:/开发软件/PyCharm 2018.3.5/helpers/python-skeletons;F:/开发软件/PyCharm 2018.3.5/helpers/typeshed/stdlib/3;F:/开发软件/PyCharm 2018.3.5/helpers/typeshed/stdlib/2and3;F:/开发软件/PyCharm 2018.3.5/helpers/typeshed/third_party/3;F:/开发软件/PyCharm 2018.3.5/helpers/typeshed/third_party/2and3" "PYDEVD_LOAD_VALUES_ASYNC"="True" "PYTHONPATH"="/home/ssh-project/pwlp-judged:/root/.pycharm_helpers/pycharm_matplotlib_backend:/root/.pycharm_helpers/third_party/thriftpy:/root/.pycharm_helpers/pydev:C:/Users/Administrator/.PyCharm2018.3/system/cythonExtensions:/home/ssh-project/pwlp-judged" "PYTHONIOENCODING"="UTF-8" "PYTHONDONTWRITEBYTECODE"="1" "IPYTHONENABLE"="True" "PYCHARM_MATPLOTLIB_PORT"="51810" "PYCHARM_HOSTED"="1" "PYTHONUNBUFFERED"="1" "IDE_PROJECT_ROOTS"="/home/ssh-project/pwlp-judged" '/home/anaconda3/envs/cail/bin/python3.6' '-u' '/root/.pycharm_helpers/pydev/pydevd.py' '--multiproc' '--qt-support=auto' '--client' '0.0.0.0' '--port' '42281' '--file' '/home/ssh-project/pwlp-judged/setup.py'

This error prevents the program from being called normally:

  File "/home/anaconda3/envs/cail/lib/python3.6/site-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/anaconda3/envs/cail/lib/python3.6/site-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/ssh-project/pwlp-judged/controller/paddle_nlp_controller.py", line 71, in initCustomIdfCorpus
    utils.init_custom_idf_corpus(document_list,allowPOS=('ns', 'n', 'vn', 'v', 'a'))
  File "/home/ssh-project/pwlp-judged/paddlenlp/utils/utils.py", line 67, in init_custom_idf_corpus
    default_tfidf.init_custom_idf_corpus(sentence_list,allowPOS)
  File "/home/ssh-project/pwlp-judged/paddlenlp/extractkeyword/tfidf.py", line 162, in init_custom_idf_corpus
    words = self.seg_sentence(allowPOS, sentence)
  File "/home/ssh-project/pwlp-judged/paddlenlp/extractkeyword/tfidf.py", line 122, in seg_sentence
    wt = self.tokenizer.lexical_analysis(sentence)
  File "/home/ssh-project/pwlp-judged/paddlenlp/utils/utils.py", line 32, in lexical_analysis
    return lac.lexical_analysis(data={'text': document}, user_dict=None)
  File "/home/anaconda3/envs/cail/lib/python3.6/site-packages/paddlehub/module/module.py", line 543, in __call__
    return_numpy=False)
  File "/home/anaconda3/envs/cail/lib/python3.6/site-packages/paddle/fluid/executor.py", line 783, in run
    six.reraise(*sys.exc_info())
  File "/home/anaconda3/envs/cail/lib/python3.6/site-packages/six.py", line 696, in reraise
    raise value
  File "/home/anaconda3/envs/cail/lib/python3.6/site-packages/paddle/fluid/executor.py", line 778, in run
    use_program_cache=use_program_cache)
  File "/home/anaconda3/envs/cail/lib/python3.6/site-packages/paddle/fluid/executor.py", line 831, in _run_impl
    use_program_cache=use_program_cache)
  File "/home/anaconda3/envs/cail/lib/python3.6/site-packages/paddle/fluid/executor.py", line 905, in _run_program
    fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet:


C++ Call Stacks (More useful to developers):

0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const, int)
2 paddle::operators::LookupTableKernel::Compute(paddle::framework::ExecutionContext const&) const
3 std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CPUPlace, false, 0ul, paddle::operators::LookupTableKernel, paddle::operators::LookupTableKernel, paddle::operators::LookupTableKernel >::operator()(char const, char const, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
4 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext) const
5 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
6 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
7 paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext, paddle::framework::Scope, bool, bool, bool)
8 paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope, int, bool, bool, std::vector<std::string, std::allocator > const&, bool, bool)


Python Call Stacks (More useful to users):


Error Message Summary:

Error: Variable value (input) of OP(fluid.layers.embedding) expected >= 0 and < 20941, but got 4575161256734962900. Please check input value. [Hint: Expected ids[i] < row_number, but received ids[i]:4575161256734962900 >= row_number:20941.] at (/paddle/paddle/fluid/operators/lookup_table_op.h:94) [operator < lookup_table > error]

nepeplwu commented 4 years ago

@1205469665, thanks for the feedback. The cause is that Paddle does not support multi-threaded prediction. You could try a multi-process approach instead.

1205469665 commented 4 years ago

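Does that apply even when it isn't prediction? Can word segmentation and text classification not run in multiple threads either?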

1205469665 commented 4 years ago

Is it also not possible to load the text classification model files from multiple processes?

1205469665 commented 4 years ago

That fails with an error too.

1205469665 commented 4 years ago

@nepeplwu Is it also not possible to load the text classification model files from multiple processes? That fails with an error too.

nepeplwu commented 4 years ago

@1205469665 Loading models from multiple processes is supported. Please post your code and the error stack trace so we can look into the cause.

1205469665 commented 4 years ago

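@nepeplwu I found the cause: I'm using Flask's multi-process mode, and in that mode the model cannot be loaded on the GPU. Switching to CPU makes it work. But I still need the GPU for training. Can this be solved? Or should I use some other multi-process implementation?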

nepeplwu commented 4 years ago

I don't quite follow. If you start a Flask service, it should be serving predictions, so why would training be involved?

1205469665 commented 4 years ago

@nepeplwu I trigger the start of training through the Flask service I launched.

ShenYuhan commented 4 years ago

Did you write the Flask service yourself, or are you using PaddleHub's built-in paddlehub-serving? If you need to deploy multiple models, what does your use case look like?

1205469665 commented 4 years ago

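I wrote the Flask service myself. It is my own Python service that handles uploading data, training models, running prediction, word segmentation, and so on.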

1205469665 commented 4 years ago

@ShenYuhan Did you see this?

nepeplwu commented 4 years ago

@1205469665 Could you share the endpoint that you use to call training through Flask, along with the training code?

1205469665 commented 4 years ago

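@nepeplwu I can't share the code for now. It is just an HTTP endpoint, and the text classification code it launches is the one from the demo. The problem is that with Flask's process model, GPU mode cannot be used.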

nepeplwu commented 4 years ago

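@1205469665, does your multi-process implementation create new processes through the system fork call? If so, I suggest wrapping the training code in a script and launching that script by running a command through os.system, to see whether that works (before each script starts, the environment variable CUDA_VISIBLE_DEVICES needs to be set to a different GPU card id).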
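The suggestion above can be sketched roughly as follows. This is an illustration under stated assumptions, not the maintainers' code: it uses `subprocess.run` in place of `os.system` so that the environment can be passed explicitly, `gpu_env` and `launch_script` are hypothetical helper names, and a one-line `-c` probe stands in for the training script so the example runs without a GPU.

```python
import os
import subprocess
import sys


def gpu_env(gpu_id, base_env=None):
    """Copy of the environment with CUDA_VISIBLE_DEVICES pinned to one card."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


def launch_script(args, gpu_id):
    """Run Python in a brand-new interpreter restricted to one GPU id.

    `args` is the script path plus its arguments, for example
    ["train.py"] (train.py is a hypothetical file name).
    """
    return subprocess.run(
        [sys.executable] + list(args),
        env=gpu_env(gpu_id),
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )


if __name__ == "__main__":
    # Stand-in for a training script: report which GPU ids it can see.
    probe = "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"
    for gpu_id in (0, 1):
        result = launch_script(["-c", probe], gpu_id)
        print(gpu_id, "->", result.stdout.strip())
```

Because the child is a fresh interpreter rather than a fork of the web server, it initializes the GPU on its own. `subprocess.run` waits for the child to finish; for a long training job started from a request handler, `subprocess.Popen` with the same arguments returns immediately instead.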

1205469665 commented 4 years ago

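@nepeplwu It uses Flask's default multi-process mode, app.run(processes=4). From the source code, that uses fork. I'll try the script launch you suggested tomorrow. Does that mean this mode simply can't be used?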

nepeplwu commented 4 years ago

It is probably because a CUDA context cannot be shared across multiple different processes.

Steffy-zxf commented 4 years ago

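Hello, this issue has had no updates in the past two weeks, so we will close it today. If you still need to follow up after it is closed, you can reopen the issue.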