PaddlePaddle / Paddle

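PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle (飞桨) core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)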
http://www.paddlepaddle.org/
Apache License 2.0

[paddle.nn.functional.ctc_loss] FatalError: `Segmentation fault` is detected by the operating system. #66497

Open KnightGOKU opened 3 months ago

KnightGOKU commented 3 months ago

Describe the Bug

I encountered a crash when calling paddle.nn.functional.ctc_loss under Python 3.12. The following code terminates with a segmentation fault:

import paddle

def func():
    tensor1 = paddle.rand([5, 3, 15], dtype='float64')
    tensor2 = paddle.to_tensor(
        [[-879, -11, -714, -202],
         [-16, -312, -93, -494],
         [-919, -1281, -1495, -15]], dtype='int32')

    res = paddle.nn.functional.ctc_loss(
        log_probs=tensor1,
        labels=tensor2,
        input_lengths=paddle.to_tensor([5, 5, 5]),
        label_lengths=paddle.to_tensor([4, 2, 2]),
        blank=14,
        reduction="mean",
    )

    return res

result = func()
print(result)

The error message is as follows:

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   paddle::pybind::eager_api_warpctc(_object*, _object*, _object*)
1   warpctc_ad_func(paddle::Tensor const&, paddle::Tensor const&, paddle::optional<paddle::Tensor> const&, paddle::optional<paddle::Tensor> const&, int, bool)
2   paddle::experimental::warpctc_intermediate(paddle::Tensor const&, paddle::Tensor const&, paddle::optional<paddle::Tensor> const&, paddle::optional<paddle::Tensor> const&, int, bool)
3   void phi::WarpctcKernel<double, phi::CPUContext>(phi::CPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, paddle::optional<phi::DenseTensor> const&, paddle::optional<phi::DenseTensor> const&, int, bool, phi::DenseTensor*, phi::DenseTensor*)
4   phi::WarpCTCFunctor<phi::CPUContext, double>::operator()(phi::CPUContext const&, double const*, double*, int const*, int const*, int const*, unsigned long, unsigned long, unsigned long, double*)
5   compute_ctc_loss_double
6   CpuCTC<double>::cost_and_grad(double const*, double*, double*, int const*, int const*, int const*)
7   CpuCTC<double>::compute_betas_and_grad(double*, double const*, double, int, int, int, int const*, int const*, int const*, double*, double*, double*)

----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1721874943 (unix time) try "date -d @1721874943" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0xfffffffe93671f30) received by PID 4006959 (TID 0x7f6a67274280) from PID 18446744071887593264 ***]

Segmentation fault (core dumped)
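The TimeInfo line suggests decoding the abort time with GNU date. Where GNU date is not available, the same conversion can be sketched with the Python standard library. The helper name decode_abort_time is hypothetical and used only for illustration; the timestamp is the one from the log above.

```python
from datetime import datetime, timezone

def decode_abort_time(unix_seconds):
    """Convert the unix time from a TimeInfo line to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).isoformat()

# Timestamp copied from the TimeInfo line of the crash report.
print(decode_abort_time(1721874943))  # 2024-07-25T02:35:43+00:00
```

The result is in UTC, so it may differ from the local wall-clock time on the machine that produced the log.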

The code was tested in Paddle version 3.0.0b1-cpu. Interestingly, it works well when I switch to Python version 3.10.14.
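One thing that may be worth checking, as an assumption and not something confirmed in this thread: every value in labels is negative, while log_probs has only 15 classes, so the native CTC code could end up indexing outside its buffers. An out-of-bounds read is undefined behavior, which might also explain why the crash shows up under one Python version and not another. The sketch below is a plain-Python pre-check that does not call Paddle at all; the function name check_ctc_inputs and the exact set of rules are illustrative choices, not part of the Paddle API.

```python
def check_ctc_inputs(log_probs_shape, labels, input_lengths, label_lengths, blank=0):
    """Return a list of problems found in CTC inputs; an empty list means none found.

    log_probs_shape is (max_time, batch, num_classes); labels is a list of
    per-sample label lists. These rules are assumptions for illustration.
    """
    max_time, batch, num_classes = log_probs_shape
    problems = []
    if not 0 <= blank < num_classes:
        problems.append(f"blank={blank} is outside [0, {num_classes})")
    if not (len(labels) == len(input_lengths) == len(label_lengths) == batch):
        problems.append("labels, input_lengths and label_lengths must all have batch entries")
        return problems
    for i, row in enumerate(labels):
        if not 0 < input_lengths[i] <= max_time:
            problems.append(f"sample {i}: input length {input_lengths[i]} is outside (0, {max_time}]")
        if not 0 <= label_lengths[i] <= len(row):
            problems.append(f"sample {i}: label length {label_lengths[i]} is outside [0, {len(row)}]")
            continue
        # Only the first label_lengths[i] entries of each row are actually used.
        bad = [v for v in row[:label_lengths[i]] if not 0 <= v < num_classes]
        if bad:
            problems.append(f"sample {i}: labels {bad} are outside [0, {num_classes})")
    return problems

# Values copied from the reproduction script above.
labels = [[-879, -11, -714, -202],
          [-16, -312, -93, -494],
          [-919, -1281, -1495, -15]]
for problem in check_ctc_inputs((5, 3, 15), labels, [5, 5, 5], [4, 2, 2], blank=14):
    print(problem)
```

Running a check like this before calling ctc_loss turns a hard crash into a readable message, although it does not replace validation inside the framework itself.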

Additional Supplementary Information

No response

ShuangLyu commented 2 months ago

I followed the installation procedure at https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/hardware_support/npu/install_cn.html and, on reaching the step below, got the same error as the original poster.

PaddlePaddle basic health check

python -c "import paddle; paddle.utils.run_check()"

python==3.10.0

Error log:

paddle.utils.run_check()
Running verify PaddlePaddle program ...
I0829 16:44:22.169772 804539 program_interpreter.cc:243] New Executor is Running.


C++ Traceback (most recent call last):

0   paddle::framework::StandaloneExecutor::Run(std::vector<std::string, std::allocator > const&, bool)
1   paddle::framework::InterpreterCore::Run(std::vector<std::string, std::allocator > const&, bool, bool, bool, bool)
2   paddle::framework::ProgramInterpreter::Run(std::vector<std::string, std::allocator > const&, bool, bool, bool, bool)
3   paddle::framework::ProgramInterpreter::Build(std::vector<std::string, std::allocator > const&, std::vector<paddle::framework::OpFuncNode, std::allocator >, bool)
4   paddle::framework::interpreter::BuildOpFuncList(phi::Place const&, paddle::framework::BlockDesc const&, std::set<std::string, std::less, std::allocator > const&, std::vector<paddle::framework::OpFuncNode, std::allocator >, paddle::framework::VariableScope, paddle::framework::interpreter::ExecutionConfig const&, std::vector<std::function<void (paddle::framework::OperatorBase, paddle::framework::Scope)>, std::allocator<std::function<void (paddle::framework::OperatorBase, paddle::framework::Scope)> > > const&, std::vector<std::function<void (paddle::framework::OperatorBase, paddle::framework::Scope)>, std::allocator<std::function<void (paddle::framework::OperatorBase, paddle::framework::Scope)> > > const&, bool, bool)
5   void custom_kernel::MatmulKernel<float, phi::CustomContext>(phi::CustomContext const&, phi::DenseTensor const&, phi::DenseTensor const&, bool, bool, phi::DenseTensor)
6   aclnnMatmul
7   InitL2Phase2Context(char const, aclOpExecutor)
8   GetOpExecCacheFromExecutor(aclOpExecutor*)


Error Message Summary:

FatalError: Segmentation fault is detected by the operating system.
  [TimeInfo: Aborted at 1724921062 (unix time) try "date -d @1724921062" if you are using GNU date ]
  [SignalInfo: SIGSEGV (@0xc46bb) received by PID 804539 (TID 0xffff433e59c0) from PID 804539 ]

Segmentation fault (core dumped)