PaddlePaddle / Paddle

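PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle 『飞桨』 core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)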
http://www.paddlepaddle.org/
Apache License 2.0

FatalError: `Termination signal` is detected by the operating system #46870

Closed. Birdylx closed this issue 1 year ago.

Birdylx commented 1 year ago

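Describe the Bug

When using multiprocessing, paddle raises an exception, yet the program runs normally. The exception does not affect the program's result, but why does it occur?

Minimal reproduction code: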

import multiprocessing
from tqdm import tqdm
import paddle

class custom_dataset(paddle.io.Dataset):
    def __init__(self):
        super().__init__()

    def func(self):
        workers = min(8, multiprocessing.cpu_count())
        data = [i for i in range(5)]
        with multiprocessing.Pool(workers) as p:
            res = list(tqdm(p.imap(f, data), total=len(data)))
        print(res)

def f(x):
    return x*x

if __name__ == '__main__':
    dataset = custom_dataset()
    dataset.func()

Error message:

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
No stack trace in paddle, may be caused by external reasons.

----------------------
Error Message Summary:
----------------------
FatalError: `Termination signal` is detected by the operating system.
  [TimeInfo: *** Aborted at 1665408425 (unix time) try "date -d @1665408425" if you are using GNU date ***]
  [SignalInfo: *** SIGTERM (@0x138f) received by PID 5021 (TID 0x7f8e236ed700) from PID 5007 ***]

Additional Supplementary Information

No response

paddle-bot[bot] commented 1 year ago

Hi! We've received your issue; please be patient while waiting for a response. We will arrange for technicians to answer your question as soon as possible. Please check again that you have provided a clear problem description, reproduction code, environment and version, and the error message. You may also check the API documentation, the FAQ, past GitHub issues, and the AI community for an answer. Have a nice day!

heavengate commented 1 year ago

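This should have nothing to do with Dataset. The class inherits from Dataset but does not implement any of Dataset's methods; it only defines a func and then calls it. Written this way, the code would run when inheriting from any class, so this should be unrelated to paddle.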

Birdylx commented 1 year ago

@heavengate, the cause is paddle's global registration: behavior that is expected from multiprocessing gets caught by paddle. See https://github.com/PaddlePaddle/Paddle/issues/36281
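
The mechanism described here can be reproduced without Paddle. The sketch below is an illustration under stated assumptions, not Paddle's actual implementation: a Python-level SIGTERM handler (installed by the hypothetical `install_logging_sigterm_handler`) stands in for a library that hooks the signal process-wide. Forked pool workers inherit the handler, and leaving the `with` block calls `Pool.terminate()`, which on POSIX sends SIGTERM to any worker that is still alive, so the handler reports a signal that is part of a normal pool shutdown.

```python
import multiprocessing
import os
import signal
import tempfile


def f(x):
    return x * x


def install_logging_sigterm_handler(log_path):
    """Hypothetical stand-in for a library that hooks SIGTERM process-wide."""
    def handler(signum, frame):
        with open(log_path, "a") as log:
            log.write(f"SIGTERM received by PID {os.getpid()}\n")
        # Restore the default action and re-send the signal so the
        # process still terminates after logging.
        signal.signal(signum, signal.SIG_DFL)
        os.kill(os.getpid(), signum)

    previous = signal.signal(signal.SIGTERM, handler)
    # signal.signal returns None if the old handler was not set from Python.
    return previous if previous is not None else signal.SIG_DFL


def run_pool_like_the_report(workers=2):
    """Return (results, logged_lines) for a Pool shut down by its with-block."""
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX system
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "sigterm.log")
        open(log_path, "w").close()
        previous = install_logging_sigterm_handler(log_path)
        try:
            # Workers are forked here and inherit the handler.
            with ctx.Pool(workers) as pool:
                results = list(pool.imap(f, range(5)))
            # Leaving the with-block has called pool.terminate().
        finally:
            signal.signal(signal.SIGTERM, previous)
        with open(log_path) as log:
            return results, log.read().splitlines()


if __name__ == "__main__":
    results, lines = run_pool_like_the_report()
    print(results)
    for line in lines:
        print(line)
```

The results are computed before the pool shuts down, so they are unaffected. How many workers log the signal can vary between runs, because a worker that has already exited is not terminated.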

zhao-sy commented 1 year ago

Can the program continue running after this error is thrown?
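
In the original reproduction, the SignalInfo line reports the signal as received by one PID and sent from another, which suggests that a child process was signaled rather than the main program. A minimal stdlib sketch of that situation (no Paddle involved; `idle_worker` and `terminate_and_report` are illustrative names, and a POSIX system is assumed) shows that the parent keeps running and can tell from the child's exit code that SIGTERM stopped it:

```python
import multiprocessing
import signal
import time


def idle_worker():
    """Stand-in for a worker that is still alive when the parent shuts it down."""
    time.sleep(60)


def terminate_and_report():
    """Start a child, terminate it from the parent, and return its exit code."""
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX system
    child = ctx.Process(target=idle_worker)
    child.start()
    child.terminate()  # on POSIX this sends SIGTERM to the child
    child.join()
    return child.exitcode


if __name__ == "__main__":
    code = terminate_and_report()
    # A negative exit code -N means the child was ended by signal N.
    print("child ended by SIGTERM:", code == -signal.SIGTERM)
    print("parent is still running")
```

This does not cover setups in which the main process itself receives the signal.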

lll123github commented 1 year ago

Same question. It seems the program terminated after Python received the exception? (.../PaddleSpeech/examples/aishell3/tts3)

nikithakriz commented 6 months ago

I had a similar issue while using paddleocr along with multiprocessing


C++ Traceback (most recent call last):

No stack trace in paddle, may be caused by external reasons.


Error Message Summary:

FatalError: `Termination signal` is detected by the operating system.
  [TimeInfo: *** Aborted at 1710392015 (unix time) try "date -d @1710392015" if you are using GNU date ***]
  [SignalInfo: *** SIGTERM (@0x3e90000cedc) received by PID 53174 (TID 0x7ea416a6db80) from PID 52956 ***]


C++ Traceback (most recent call last):

0   paddle::AnalysisPredictor::~AnalysisPredictor()
1   paddle::AnalysisPredictor::~AnalysisPredictor()
2   std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()
3   paddle::framework::OpDesc::~OpDesc()
4   std::_Hashtable<std::string, std::pair<std::string const, paddle::variant<paddle::blank, int, float, std::string, std::vector<int, std::allocator >, std::vector<float, std::allocator >, std::vector<std::string, std::allocator >, bool, std::vector<bool, std::allocator >, paddle::framework::BlockDesc, long, std::vector<paddle::framework::BlockDesc, std::allocator<paddle::framework::BlockDesc> >, std::vector<long, std::allocator >, std::vector<double, std::allocator >, paddle::framework::VarDesc, std::vector<paddle::framework::VarDesc, std::allocator<paddle::framework::VarDesc> >, double, paddle::experimental::ScalarBase, std::vector<paddle::experimental::ScalarBase, std::allocator<paddle::experimental::ScalarBase > >, pir::Block, std::vector<pir::Value, std::allocator > > >, std::allocator<std::pair<std::string const, paddle::variant<paddle::blank, int, float, std::string, std::vector<int, std::allocator >, std::vector<float, std::allocator >, std::vector<std::string, std::allocator >, bool, std::vector<bool, std::allocator >, paddle::framework::BlockDesc, long, std::vector<paddle::framework::BlockDesc, std::allocator<paddle::framework::BlockDesc> >, std::vector<long, std::allocator >, std::vector<double, std::allocator >, paddle::framework::VarDesc, std::vector<paddle::framework::VarDesc, std::allocator<paddle::framework::VarDesc> >, double, paddle::experimental::ScalarBase, std::vector<paddle::experimental::ScalarBase, std::allocator<paddle::experimental::ScalarBase > >, pir::Block, std::vector<pir::Value, std::allocator > > > >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::~_Hashtable()
5   std::vector<std::string, std::allocator >::~vector()


Error Message Summary:

FatalError: `Termination signal` is detected by the operating system.
  [TimeInfo: *** Aborted at 1710392015 (unix time) try "date -d @1710392015" if you are using GNU date ***]
  [SignalInfo: *** SIGTERM (@0x3e90000cedc) received by PID 53176 (TID 0x72ef72873b80) from PID 52956 ***]
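
The second traceback shows SIGTERM arriving while a worker was still running destructors, which is consistent with the parent terminating workers instead of waiting for them to exit. A minimal sketch of one way to avoid sending the signal at all, assuming the workers come from a `multiprocessing.Pool` as in the original reproduction (the PaddleOCR setup is not shown in this thread, so that part is a guess): call `close()` and `join()` before leaving the `with` block, so every worker exits on its own and `terminate()` finds nothing left to signal. `run_with_graceful_shutdown` is an illustrative name.

```python
import multiprocessing


def f(x):
    return x * x


def run_with_graceful_shutdown(workers=2):
    """Return (results, worker_exit_codes) for a Pool that is closed and joined."""
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX system
    with ctx.Pool(workers) as pool:
        # Keep handles to the worker processes to inspect their exit codes later.
        worker_procs = multiprocessing.active_children()
        results = list(pool.imap(f, range(5)))
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait until every worker has exited by itself
    # terminate() runs on leaving the with-block but skips exited workers.
    return results, [proc.exitcode for proc in worker_procs]


if __name__ == "__main__":
    results, exit_codes = run_with_graceful_shutdown()
    print(results)
    # Exit code 0 means a normal exit; -15 would mean ended by SIGTERM.
    print(exit_codes)
```

Paddle's API reference also lists `paddle.disable_signal_handler()` for resetting the signal handlers that Paddle registers; whether calling it removes the message in a given setup is an assumption to verify, not something this thread confirms.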

aashishpokharel commented 1 week ago

I had a similar issue while using paddleocr along with multiprocessing

Has this been solved?