PaddlePaddle / FastDeploy

⚡️An easy-to-use and fast deep learning model deployment toolkit for ☁️Cloud, 📱Mobile and 📹Edge. Covers 20+ mainstream scenarios across image, video, text and audio, and 150+ SOTA models with end-to-end optimization, multi-platform and multi-framework support.
https://www.paddlepaddle.org.cn/fastdeploy
Apache License 2.0

Core dump under concurrent requests on two GPU cards on a 3090 machine #2168

Open WeiboXu opened 1 year ago

WeiboXu commented 1 year ago

Friendly reminder: according to informal community statistics, asking questions following the template speeds up responses and problem resolution.


Environment

Problem logs and the steps that reproduce the issue

[Steps]

  1. The GPU server has two cards and the container is started with --gpus=all. Example: docker run -it -d --name test_im_gpu --gpus=all -p9800:8000 -p9801:8001 -p9802:8002 -v /workspace/triton/customer_im/gpu:/test_models harbor.prod.yxit.cc/sdc-ai/arges:argesdeploy-1.0.1 /bin/bash
  2. A single request works fine; the errors below occur under concurrent requests (see the repro sketch after the logs). [Logs]
  3. With the onnxruntime backend: 2023-08-15 16:11:52.762890665 [E:onnxruntime:, cuda_call.cc:118 CudaCall] CUBLAS failure 14: CUBLAS_STATUS_INTERNAL_ERROR ; GPU=0 ; hostname=1d4d4ff6436c ; expr=cublasGemmHelper( Base::CublasHandle(), transB, transA, static_cast(helper.N()), static_cast(helper.M()), static_cast(helper.K()), &alpha, reinterpret_cast<const CudaT>(right_X->template Data()), ldb, reinterpret_cast<const CudaT>(left_X->template Data()), lda, &zero, reinterpret_cast<CudaT*>(Y->template MutableData()), ldc, device_prop);

2023-08-15 16:11:52.762954041 [E:onnxruntime:, sequential_executor.cc:368 Execute] Non-zero status code returned while running MatMul node. Name:'p2o.MatMul.14' Status Message: CUBLAS error executing cublasGemmHelper( Base::CublasHandle(), transB, transA, static_cast(helper.N()), static_cast(helper.M()), static_cast(helper.K()), &alpha, reinterpret_cast<const CudaT>(right_X->template Data()), ldb, reinterpret_cast<const CudaT>(left_X->template Data()), lda, &zero, reinterpret_cast<CudaT*>(Y->template MutableData()), ldc, device_prop)

2023-08-15 16:11:52.763008176 [E:onnxruntime:, cuda_call.cc:118 CudaCall] CUDA failure 700: an illegal memory access was encountered ; GPU=0 ; hostname=1d4d4ff6436c ; expr=cudaEventRecord(current_deferred_release_event, static_cast(GetComputeStream()));

[ERROR] fastdeploy/runtime/backends/ort/ort_backend.cc(365)::Infer Failed to Infer: Non-zero status code returned while running MatMul node. Name:'p2o.MatMul.14' Status Message: CUBLAS error executing cublasGemmHelper( Base::CublasHandle(), transB, transA, static_cast(helper.N()), static_cast(helper.M()), static_cast(helper.K()), &alpha, reinterpret_cast<const CudaT>(right_X->template Data()), ldb, reinterpret_cast<const CudaT>(left_X->template Data()), lda, &zero, reinterpret_cast<CudaT*>(Y->template MutableData()), ldc, device_prop)

[WARNING] fastdeploy/runtime/runtime.cc(243)::GetOutputTensor The output name [tmp_59] don't exist. 2023-08-15 16:11:52.773263270 [E:onnxruntime:, cuda_call.cc:118 CudaCall] CUDA failure 700: an illegal memory access was encountered ; GPU=0 ; hostname=1d4d4ff6436c ; expr=cudaEventCreate(&current_deferred_release_event, 0x02); [ERROR] fastdeploy/runtime/backends/ort/ort_backend.cc(365)::Infer Failed to Infer: CUDA error executing cudaEventCreate(&current_deferred_release_event, cudaEventDisableTiming)

  4. With the tensorrt backend: [ERROR] fastdeploy/runtime/backends/tensorrt/trt_backend.cc(239)::log 1: [runner.cpp::execute::718] Error Code 1: Myelin (Final synchronize failed (700)) [ERROR] fastdeploy/runtime/backends/tensorrt/trt_backend.cc(348)::Infer Failed to Infer with TensorRT. [ERROR] fastdeploy/runtime/backends/tensorrt/trt_backend.cc(479)::SetInputs Error occurs while copy memory from CPU to GPU. Signal (6) received. [ERROR] fastdeploy/runtime/backends/tensorrt/trt_backend.cc(479)::SetInputs Error occurs while copy memory from CPU to GPU. Signal (6) received.
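
A minimal sketch of the concurrent case, sending two requests at once to the Triton HTTP port mapped above (9800 -> 8000). The model name im_model and the payload file request.json are hypothetical placeholders, not taken from this issue:

  # A single request succeeds, but two in flight together trigger the errors above.
  # "im_model" and request.json are placeholders for the deployed model and its input.
  curl -s -X POST http://localhost:9800/v2/models/im_model/infer \
       -H 'Content-Type: application/json' -d @request.json &
  curl -s -X POST http://localhost:9800/v2/models/im_model/infer \
       -H 'Content-Type: application/json' -d @request.json &
  wait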
WeiboXu commented 1 year ago
  1. A single request works fine.
  2. Two concurrent requests cause a core dump.
  3. If CUDA_VISIBLE_DEVICES=0 is set, concurrent requests also work fine (see the sketch below).
  4. Both ways of calling, onnxruntime and tensorrt, produce the core dump.
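
A sketch of the workaround in point 3, based only on the reported behavior and not a confirmed fix: restrict the container to a single GPU by passing CUDA_VISIBLE_DEVICES=0 into the same docker run command used above.

  # Same container as in the steps above, but only GPU 0 is visible to the runtime,
  # mirroring the CUDA_VISIBLE_DEVICES=0 setting that avoids the core dump.
  docker run -it -d --name test_im_gpu --gpus=all -e CUDA_VISIBLE_DEVICES=0 \
    -p9800:8000 -p9801:8001 -p9802:8002 \
    -v /workspace/triton/customer_im/gpu:/test_models \
    harbor.prod.yxit.cc/sdc-ai/arges:argesdeploy-1.0.1 /bin/bash

Alternatively, --gpus '"device=0"' exposes only the first GPU at the Docker level instead of hiding it via the environment variable.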