MarsChange closed this issue 6 months ago.
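I have the same problem.
I also tried building it on a Jetson Orin; see the details here: https://github.com/lappun/PaddleOCR_testing/blob/master/setup.ipynb
But the result is the same.
Here is the output of run_check() (it ran for seven and a half minutes before reporting the error):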
Running verify PaddlePaddle program ...
I0406 03:29:15.610915 429179 program_interpreter.cc:212] New Executor is Running.
W0406 03:29:15.611184 429179 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.7, Driver API Version: 11.4, Runtime API Version: 11.4
W0406 03:29:15.616206 429179 gpu_resources.cc:164] device: 0, cuDNN Version: 8.4.
I0406 03:36:26.023401 429179 interpreter_util.cc:624] Standalone Executor is Used.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/paddle/.local/lib/python3.8/site-packages/paddle/utils/install_check.py", line 273, in run_check
_run_static_single(use_cuda, use_xpu, use_custom, custom_device_name)
File "/home/paddle/.local/lib/python3.8/site-packages/paddle/utils/install_check.py", line 151, in _run_static_single
exe.run(
File "/home/paddle/.local/lib/python3.8/site-packages/paddle/base/executor.py", line 1746, in run
res = self._run_impl(
File "/home/paddle/.local/lib/python3.8/site-packages/paddle/base/executor.py", line 1952, in _run_impl
ret = new_exe.run(
File "/home/paddle/.local/lib/python3.8/site-packages/paddle/base/executor.py", line 831, in run
tensors = self._new_exe.run(
OSError: In user code:
File "<string>", line 1, in <module>
File "/home/paddle/.local/lib/python3.8/site-packages/paddle/utils/install_check.py", line 273, in run_check
_run_static_single(use_cuda, use_xpu, use_custom, custom_device_name)
File "/home/paddle/.local/lib/python3.8/site-packages/paddle/utils/install_check.py", line 135, in _run_static_single
input, out, weight = _simple_network()
File "/home/paddle/.local/lib/python3.8/site-packages/paddle/utils/install_check.py", line 37, in _simple_network
linear_out = paddle.nn.functional.linear(x=input, weight=weight, bias=bias)
File "/home/paddle/.local/lib/python3.8/site-packages/paddle/nn/functional/common.py", line 1985, in linear
helper.append_op(
File "/home/paddle/.local/lib/python3.8/site-packages/paddle/base/layer_helper.py", line 44, in append_op
return self.main_program.current_block().append_op(*args, **kwargs)
File "/home/paddle/.local/lib/python3.8/site-packages/paddle/base/framework.py", line 4467, in append_op
op = Operator(
File "/home/paddle/.local/lib/python3.8/site-packages/paddle/base/framework.py", line 3016, in __init__
for frame in traceback.extract_stack():
ExternalError: CUBLAS error(13).
[Hint: 'CUBLAS_STATUS_EXECUTION_FAILED'. The GPU program failed to execute. This is often caused by a launch failure of the kernel on the GPU, which can be caused by multiple reasons. To correct: check that the hardware, an appropriate version of the driver, and the cuBLAS library are correctly installed. ] (at /paddle/build/Paddle/paddle/phi/kernels/funcs/blas/blas_impl.cu.h:41)
[operator < matmul_v2_grad > error]
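To see where those minutes go, you can compare the glog timestamps Paddle prints. Here is a minimal sketch that assumes the standard glog prefix (severity letter, MMDD, HH:MM:SS.microseconds, thread id). The helper names `glog_time` and `elapsed_seconds` are hypothetical. Because glog omits the year, the `year` argument is only a placeholder for the date arithmetic.

```python
import re
from datetime import datetime

# glog line prefix: severity letter, MMDD, HH:MM:SS.micros, thread id.
GLOG_PREFIX = re.compile(r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ ")


def glog_time(line, year=2024):
    """Parse a glog timestamp; glog omits the year, so `year` is a placeholder."""
    match = GLOG_PREFIX.match(line)
    if match is None:
        raise ValueError(f"not a glog line: {line!r}")
    mmdd, clock = match.groups()
    return datetime.strptime(f"{year}{mmdd} {clock}", "%Y%m%d %H:%M:%S.%f")


def elapsed_seconds(earlier, later):
    """Seconds between two glog lines from the same run."""
    return (glog_time(later) - glog_time(earlier)).total_seconds()


start = "I0406 03:29:15.610915 429179 program_interpreter.cc:212] New Executor is Running."
end = "I0406 03:36:26.023401 429179 interpreter_util.cc:624] Standalone Executor is Used."
print(f"{elapsed_seconds(start, end):.1f} s")
```

For the two log lines above, this prints `430.4 s`. That means roughly seven minutes pass between "New Executor is Running" and "Standalone Executor is Used", before the CUBLAS error is raised.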
Here is the output of test ctc_loss (it succeeded immediately):
W0406 03:46:40.800690 429267 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.7, Driver API Version: 11.4, Runtime API Version: 11.4
W0406 03:46:40.806195 429267 gpu_resources.cc:164] device: 0, cuDNN Version: 8.4.
Tensor(shape=[2], dtype=float32, place=Place(gpu:0), stop_gradient=True,
[3.91798496, 2.90765190])
Tensor(shape=[], dtype=float32, place=Place(gpu:0), stop_gradient=True,
1.13760614)
Here is the output of test matmul (it took seven and a half minutes to succeed):
W0406 03:48:13.081670 429350 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.7, Driver API Version: 11.4, Runtime API Version: 11.4
W0406 03:48:13.086966 429350 gpu_resources.cc:164] device: 0, cuDNN Version: 8.4.
[]
[10]
[10, 5]
[10, 5, 5]
[10, 3, 5, 5]
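@MarsChange When the Jetson inference library is compiled, kernels that are not needed for inference are trimmed to reduce the library size; see #53369. In addition, matmul_grad computes the gradients of a matrix multiplication. During training, the gradients of the loss with respect to the model parameters are used to update those parameters so that the model gradually improves. During inference, that is, when a trained model is used to make predictions, no gradients are needed. Inference therefore does not normally use an operator like matmul_grad. It makes predictions directly from the forward pass, so this error does not affect use of the inference library.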
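Here is a concrete view of the forward/backward split. For Y = X·W, inference only needs the product itself. A matmul gradient kernel, such as the matmul_v2_grad named in the error, additionally computes dL/dX = dL/dY·Wᵀ and dL/dW = Xᵀ·dL/dY. The plain-Python sketch below is illustrative only. `matmul_forward` and `matmul_backward` are hypothetical names, not Paddle APIs, and Paddle's actual kernels run on cuBLAS rather than Python lists.

```python
def matmul(a, b):
    """Plain-Python product of an m x k matrix a and a k x n matrix b."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul_forward(x, w):
    # Inference only needs this: Y = X @ W
    return matmul(x, w)


def matmul_backward(x, w, grad_y):
    # What a matmul gradient kernel computes during training:
    # dL/dX = dL/dY @ W^T and dL/dW = X^T @ dL/dY
    grad_x = matmul(grad_y, transpose(w))
    grad_w = matmul(transpose(x), grad_y)
    return grad_x, grad_w


x = [[1, 2], [3, 4]]
w = [[5, 6], [7, 8]]
print(matmul_forward(x, w))                     # [[19, 22], [43, 50]]
print(matmul_backward(x, w, [[1, 1], [1, 1]]))  # ([[11, 15], [11, 15]], [[4, 4], [6, 6]])
```

The forward result never depends on `matmul_backward`. A build that omits the gradient kernels can still serve predictions, but a training-style check that runs a backward pass will fail.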
@lappun 1. For the paddle.utils.run_check() error, see the explanation given to @MarsChange:
2. I also verified the test_matmul issue. Note that the API "paddle.device.cuda.stream_guard" is deprecated since 2.5.0 and will be removed in future versions; please use "paddle.device.stream_guard" instead.
# test_matmul.py
import time
import paddle

paddle.device.set_device('gpu')
s = paddle.device.Stream()
start_time = time.time()
print("start_time:", start_time)

# vector * vector
x = paddle.rand([10])
y = paddle.rand([10])
with paddle.device.stream_guard(s):
    z = paddle.matmul(x, y)
print(z.shape)

# matrix * vector
x = paddle.rand([10, 5])
y = paddle.rand([5])
with paddle.device.stream_guard(s):
    z = paddle.matmul(x, y)
print(z.shape)

# batched matrix * broadcasted vector
x = paddle.rand([10, 5, 2])
y = paddle.rand([2])
with paddle.device.stream_guard(s):
    z = paddle.matmul(x, y)
print(z.shape)

# batched matrix * batched matrix
x = paddle.rand([10, 5, 2])
y = paddle.rand([10, 2, 5])
with paddle.device.stream_guard(s):
    z = paddle.matmul(x, y)
print(z.shape)

# batched matrix * broadcasted matrix
x = paddle.rand([10, 1, 5, 2])
y = paddle.rand([1, 3, 2, 5])
with paddle.device.stream_guard(s):
    z = paddle.matmul(x, y)
print(z.shape)

end_time = time.time()
print("end_time:", end_time)
total_time = end_time - start_time
print("Total run_time: {:.2f} seconds".format(total_time))
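The shapes printed by this script follow the broadcasting rules described in the paddle.matmul documentation, which mirror numpy.matmul. A 1-D operand is promoted to a matrix and the added dimension is then dropped from the result, the inner dimensions must agree, and batch dimensions broadcast. Here is a minimal pure-Python sketch of that shape rule, assuming those NumPy-style semantics and ignoring the `transpose_x`/`transpose_y` flags. `matmul_out_shape` and `broadcast_shapes` are hypothetical helpers, not Paddle APIs.

```python
def broadcast_shapes(a, b):
    """Broadcast two batch shapes right-to-left, NumPy style."""
    result = []
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"batch dims {da} and {db} cannot broadcast")
        result.append(db if da == 1 else da)
    return result[::-1]


def matmul_out_shape(x_shape, y_shape):
    """Output shape of matmul(x, y) under NumPy-style matmul rules."""
    x, y = list(x_shape), list(y_shape)
    if not x or not y:
        raise ValueError("matmul needs inputs of rank >= 1")
    x_is_vec, y_is_vec = len(x) == 1, len(y) == 1
    if x_is_vec:
        x = [1] + x  # treat a vector x as a 1 x k row matrix
    if y_is_vec:
        y = y + [1]  # treat a vector y as a k x 1 column matrix
    if x[-1] != y[-2]:
        raise ValueError(f"inner dims differ: {x[-1]} vs {y[-2]}")
    out = broadcast_shapes(x[:-2], y[:-2]) + [x[-2], y[-1]]
    if x_is_vec:
        out.pop(-2)  # drop the row dim that was added to x
    if y_is_vec:
        out.pop(-1)  # drop the column dim that was added to y
    return out


# The five input-shape pairs used in test_matmul.py
cases = [([10], [10]), ([10, 5], [5]), ([10, 5, 2], [2]),
         ([10, 5, 2], [10, 2, 5]), ([10, 1, 5, 2], [1, 3, 2, 5])]
for xs, ys in cases:
    print(matmul_out_shape(xs, ys))
```

Running it prints `[]`, `[10]`, `[10, 5]`, `[10, 5, 5]` and `[10, 3, 5, 5]`, the same shapes that appear in both the Orin and Volta logs. On both machines the results agree and only the run time differs.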
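With the modified unit test, the script runs normally on my machine (Volta architecture; the log below reports compute capability 7.2, versus 8.7 on the Jetson Orin above). It may be worth checking whether the machine architecture is a factor: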
root@paddle-jp502-2:/home/paddle/data/yubaoku/issue_debug# python test_matmul.py
start_time: 1712493604.1947494
W0407 20:40:04.195492 1831309 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.2, Driver API Version: 11.4, Runtime API Version: 11.4
W0407 20:40:04.208076 1831309 gpu_resources.cc:164] device: 0, cuDNN Version: 8.4.
[]
[10]
[10, 5]
[10, 5, 5]
[10, 3, 5, 5]
end_time: 1712493605.480865
Total run_time: 1.29 seconds
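To compare machines, you can read the compute capability from the gpu_resources.cc line that Paddle prints at startup. Here is a small sketch, assuming that log format. The table is drawn from NVIDIA's published compute-capability list and covers only Jetson modules. `compute_capability` and `describe_device` are hypothetical helper names. Matching the values only shows that the two devices differ (Ampere Orin vs. Volta Xavier); it does not by itself prove that the architecture causes the slowdown.

```python
import re

# CUDA compute capability -> architecture for Jetson modules
# (values from NVIDIA's CUDA compute capability list).
JETSON_ARCH = {
    "5.3": "Maxwell (Jetson Nano / TX1)",
    "6.2": "Pascal (Jetson TX2)",
    "7.2": "Volta (Jetson Xavier)",
    "8.7": "Ampere (Jetson Orin)",
}

# Matches the value Paddle prints in its gpu_resources.cc warning line.
CC_PATTERN = re.compile(r"GPU Compute Capability: (\d+\.\d+)")


def compute_capability(log_text):
    """Return the first compute capability found in a Paddle log, or None."""
    match = CC_PATTERN.search(log_text)
    return match.group(1) if match else None


def describe_device(log_text):
    """Summarize the GPU architecture reported in a Paddle log."""
    cc = compute_capability(log_text)
    if cc is None:
        return "no compute capability in log"
    arch = JETSON_ARCH.get(cc, "unknown architecture")
    return f"sm_{cc.replace('.', '')}: {arch}"


orin_log = ("W0406 03:29:15.611184 429179 gpu_resources.cc:119] Please NOTE: "
            "device: 0, GPU Compute Capability: 8.7, Driver API Version: 11.4, "
            "Runtime API Version: 11.4")
xavier_log = ("W0407 20:40:04.195492 1831309 gpu_resources.cc:119] Please NOTE: "
              "device: 0, GPU Compute Capability: 7.2, Driver API Version: 11.4, "
              "Runtime API Version: 11.4")
print(describe_device(orin_log))    # sm_87: Ampere (Jetson Orin)
print(describe_device(xavier_log))  # sm_72: Volta (Jetson Xavier)
```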
Describe the Bug
Here is the code
Here is the bug.
Additional Supplementary Information
No response