PaddlePaddle / FastDeploy

⚡️An easy-to-use and fast deep learning model deployment toolkit for ☁️Cloud, 📱Mobile and 📹Edge. Covers 20+ mainstream scenarios across image, video, text and audio, and 150+ SOTA models with end-to-end optimization, multi-platform and multi-framework support.
https://www.paddlepaddle.org.cn/fastdeploy
Apache License 2.0

Core dump under concurrent requests on two GPU cards on a 3090 machine #2168

Open WeiboXu opened 1 year ago

WeiboXu commented 1 year ago

Friendly reminder: according to informal community statistics, asking questions following the template speeds up responses and problem resolution.


Environment

Problem logs and the steps that reproduce the issue

[Steps]

  1. The GPU server has two cards and the container is started with --gpus=all. Example: docker run -it -d --name test_im_gpu --gpus=all -p9800:8000 -p9801:8001 -p9802:8002 -v /workspace/triton/customer_im/gpu:/test_models harbor.prod.yxit.cc/sdc-ai/arges:argesdeploy-1.0.1 /bin/bash
  2. A single request works fine; the errors below occur under concurrent requests (see the repro sketch after the logs). [Logs]
  3. With the onnxruntime backend: 2023-08-15 16:11:52.762890665 [E:onnxruntime:, cuda_call.cc:118 CudaCall] CUBLAS failure 14: CUBLAS_STATUS_INTERNAL_ERROR ; GPU=0 ; hostname=1d4d4ff6436c ; expr=cublasGemmHelper( Base::CublasHandle(), transB, transA, static_cast(helper.N()), static_cast(helper.M()), static_cast(helper.K()), &alpha, reinterpret_cast<const CudaT>(right_X->template Data()), ldb, reinterpret_cast<const CudaT>(left_X->template Data()), lda, &zero, reinterpret_cast<CudaT*>(Y->template MutableData()), ldc, device_prop);

2023-08-15 16:11:52.762954041 [E:onnxruntime:, sequential_executor.cc:368 Execute] Non-zero status code returned while running MatMul node. Name:'p2o.MatMul.14' Status Message: CUBLAS error executing cublasGemmHelper( Base::CublasHandle(), transB, transA, static_cast(helper.N()), static_cast(helper.M()), static_cast(helper.K()), &alpha, reinterpret_cast<const CudaT>(right_X->template Data()), ldb, reinterpret_cast<const CudaT>(left_X->template Data()), lda, &zero, reinterpret_cast<CudaT*>(Y->template MutableData()), ldc, device_prop)

2023-08-15 16:11:52.763008176 [E:onnxruntime:, cuda_call.cc:118 CudaCall] CUDA failure 700: an illegal memory access was encountered ; GPU=0 ; hostname=1d4d4ff6436c ; expr=cudaEventRecord(current_deferred_release_event, static_cast(GetComputeStream()));

[ERROR] fastdeploy/runtime/backends/ort/ort_backend.cc(365)::Infer Failed to Infer: Non-zero status code returned while running MatMul node. Name:'p2o.MatMul.14' Status Message: CUBLAS error executing cublasGemmHelper( Base::CublasHandle(), transB, transA, static_cast(helper.N()), static_cast(helper.M()), static_cast(helper.K()), &alpha, reinterpret_cast<const CudaT>(right_X->template Data()), ldb, reinterpret_cast<const CudaT>(left_X->template Data()), lda, &zero, reinterpret_cast<CudaT*>(Y->template MutableData()), ldc, device_prop)

[WARNING] fastdeploy/runtime/runtime.cc(243)::GetOutputTensor The output name [tmp_59] don't exist. 2023-08-15 16:11:52.773263270 [E:onnxruntime:, cuda_call.cc:118 CudaCall] CUDA failure 700: an illegal memory access was encountered ; GPU=0 ; hostname=1d4d4ff6436c ; expr=cudaEventCreate(&current_deferred_release_event, 0x02); [ERROR] fastdeploy/runtime/backends/ort/ort_backend.cc(365)::Infer Failed to Infer: CUDA error executing cudaEventCreate(&current_deferred_release_event, cudaEventDisableTiming)

  4. With the tensorrt backend: [ERROR] fastdeploy/runtime/backends/tensorrt/trt_backend.cc(239)::log 1: [runner.cpp::execute::718] Error Code 1: Myelin (Final synchronize failed (700)) [ERROR] fastdeploy/runtime/backends/tensorrt/trt_backend.cc(348)::Infer Failed to Infer with TensorRT. [ERROR] fastdeploy/runtime/backends/tensorrt/trt_backend.cc(479)::SetInputs Error occurs while copy memory from CPU to GPU. Signal (6) received. [ERROR] fastdeploy/runtime/backends/tensorrt/trt_backend.cc(479)::SetInputs Error occurs while copy memory from CPU to GPU. Signal (6) received.
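
A minimal sketch of the concurrent case, sending two requests at once to the Triton HTTP port mapped above (9800 -> 8000). The model name im_model and the payload file request.json are hypothetical placeholders, not taken from this issue:

  # A single request succeeds, but two in flight together trigger the errors above.
  # "im_model" and request.json are placeholders for the deployed model and its input.
  curl -s -X POST http://localhost:9800/v2/models/im_model/infer \
       -H 'Content-Type: application/json' -d @request.json &
  curl -s -X POST http://localhost:9800/v2/models/im_model/infer \
       -H 'Content-Type: application/json' -d @request.json &
  wait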
WeiboXu commented 1 year ago
  1. A single request works fine.
  2. Two concurrent requests cause a core dump.
  3. If CUDA_VISIBLE_DEVICES=0 is set, concurrent requests also work fine (see the sketch below).
  4. Both ways of calling, onnxruntime and tensorrt, produce the core dump.
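
A sketch of the workaround in point 3, based only on the reported behavior and not a confirmed fix: restrict the container to a single GPU by passing CUDA_VISIBLE_DEVICES=0 into the same docker run command used above.

  # Same container as in the steps above, but only GPU 0 is visible to the runtime,
  # mirroring the CUDA_VISIBLE_DEVICES=0 setting that avoids the core dump.
  docker run -it -d --name test_im_gpu --gpus=all -e CUDA_VISIBLE_DEVICES=0 \
    -p9800:8000 -p9801:8001 -p9802:8002 \
    -v /workspace/triton/customer_im/gpu:/test_models \
    harbor.prod.yxit.cc/sdc-ai/arges:argesdeploy-1.0.1 /bin/bash

Alternatively, --gpus '"device=0"' exposes only the first GPU at the Docker level instead of hiding it via the environment variable.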