NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

[trtexec] Device memory is insufficient to use tactic when running yolov5l on Jetson Orin NX(8GB) #3355

Closed shanchenjie closed 12 months ago

shanchenjie commented 1 year ago

Description

Converting the ONNX model to a TensorRT engine (onnx2trt) failed on a Jetson Orin NX (8GB). The build log is below:

[09/26/2023-18:37:19] [W] [TRT] Tactic Device request: 4229MB Available: 2658MB. Device memory is insufficient to use tactic.
[09/26/2023-18:37:19] [W] [TRT] Skipping tactic 13 due to insufficient memory on requested size of 4229 detected for tactic 0x0000000000000074. Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[09/26/2023-18:37:20] [W] [TRT] Tactic Device request: 4226MB Available: 2658MB. Device memory is insufficient to use tactic.
[09/26/2023-18:37:20] [W] [TRT] Skipping tactic 3 due to insufficient memory on requested size of 4226 detected for tactic 0x0000000000000004. Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[09/26/2023-18:37:20] [W] [TRT] Tactic Device request: 4226MB Available: 2658MB. Device memory is insufficient to use tactic.
[09/26/2023-18:37:20] [W] [TRT] Skipping tactic 7 due to insufficient memory on requested size of 4226 detected for tactic 0x000000000000003c. Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[09/26/2023-18:39:50] [I] [TRT] Total Activation Memory: 7957761536
[09/26/2023-18:39:50] [I] [TRT] Detected 1 inputs and 7 output network tensors.
[09/26/2023-18:39:52] [I] [TRT] Total Host Persistent Memory: 330528
[09/26/2023-18:39:52] [I] [TRT] Total Device Persistent Memory: 774144
[09/26/2023-18:39:52] [I] [TRT] Total Scratch Memory: 3264000
[09/26/2023-18:39:52] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 103 MiB, GPU 2228 MiB
[09/26/2023-18:39:52] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 171 steps to complete.
[09/26/2023-18:39:52] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 13.222ms to assign 7 blocks to 171 nodes requiring 34406400 bytes.
[09/26/2023-18:39:52] [I] [TRT] Total Activation Memory: 34406400
[09/26/2023-18:39:53] [W] [TRT] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[09/26/2023-18:39:53] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[09/26/2023-18:39:53] [W] [TRT] Check verbose logs for the list of affected weights.
[09/26/2023-18:39:53] [W] [TRT] - 99 weights are affected by this issue: Detected subnormal FP16 values.
[09/26/2023-18:39:53] [W] [TRT] - 4 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[09/26/2023-18:39:53] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +89, GPU +128, now: CPU 89, GPU 128 (MiB)
[09/26/2023-18:39:53] [E] Saving engine to file failed.
[09/26/2023-18:39:53] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --onnx=yolov5l.onnx --buildOnly --saveEngine=yolov5l_fp16_1batch.trt --fp16
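
For anyone triaging a similar log, here is a minimal Python sketch that separates the tactic memory warnings from the actual save failure. The function names are hypothetical and the regular expression only assumes the message wording seen in the log above; other TensorRT versions may word these lines differently.

```python
import re

# Matches the builder warning printed when a tactic asks for more device
# memory than is currently free, e.g.
# "Tactic Device request: 4229MB Available: 2658MB."
TACTIC_RE = re.compile(r"Tactic Device request: (\d+)MB Available: (\d+)MB")


def summarize_tactic_warnings(log_text):
    """Return (count, max_requested_mb, min_available_mb) for skipped tactics."""
    pairs = [(int(req), int(avail)) for req, avail in TACTIC_RE.findall(log_text)]
    if not pairs:
        return 0, None, None
    return len(pairs), max(r for r, _ in pairs), min(a for _, a in pairs)


def has_save_failure(log_text):
    """True if trtexec reported that writing the engine file failed."""
    return "Saving engine to file failed" in log_text


# Three lines copied from the log above.
sample = """\
[09/26/2023-18:37:19] [W] [TRT] Tactic Device request: 4229MB Available: 2658MB. Device memory is insufficient to use tactic.
[09/26/2023-18:37:20] [W] [TRT] Tactic Device request: 4226MB Available: 2658MB. Device memory is insufficient to use tactic.
[09/26/2023-18:39:53] [E] Saving engine to file failed.
"""

print(summarize_tactic_warnings(sample))
print(has_save_failure(sample))
```

The tactic lines are tagged [W] and the save failure is tagged [E], so counting them separately makes it easier to see which message actually ended the run.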

Environment

Jetpack Version: jetpack 5.1.1

NVIDIA GPU: Jetson Orin NX

Operating System:

Python Version (if applicable): python 3.8.10

Steps To Reproduce

shanchenjie commented 1 year ago

I tried compiling the same model on a Xavier NX with the same JetPack version, and it works well. I also found that memory usage differs between the two devices even when compiling the same yolov5l model.

Peak memory usage on Orin NX: 09-26-2023 18:27:47 RAM 5902/7337MB (lfb 5x4MB) SWAP 3/7764MB (cached 0MB)

Peak memory usage on Xavier NX: 09-26-2023 10:45:28 RAM 5156/6857MB (lfb 66x1MB) SWAP 45/3428MB (cached 0MB)
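
To compare the two devices more directly, the memory lines can be parsed into used, total, and free figures. This is a rough sketch that assumes the lines keep the `RAM used/totalMB` and `SWAP used/totalMB` layout shown above (they look like tegrastats output, which is an assumption); `parse_mem_line` is a hypothetical helper name.

```python
import re

RAM_RE = re.compile(r"RAM (\d+)/(\d+)MB")
SWAP_RE = re.compile(r"SWAP (\d+)/(\d+)MB")


def parse_mem_line(line):
    """Extract RAM and swap usage (in MB) from one memory status line."""
    ram = RAM_RE.search(line)
    swap = SWAP_RE.search(line)
    if ram is None or swap is None:
        raise ValueError("line does not contain RAM and SWAP fields")
    ram_used, ram_total = int(ram.group(1)), int(ram.group(2))
    swap_used, swap_total = int(swap.group(1)), int(swap.group(2))
    return {
        "ram_used": ram_used,
        "ram_total": ram_total,
        "ram_free": ram_total - ram_used,
        "swap_used": swap_used,
        "swap_total": swap_total,
    }


# The two lines reported in this comment.
orin = parse_mem_line(
    "09-26-2023 18:27:47 RAM 5902/7337MB (lfb 5x4MB) SWAP 3/7764MB (cached 0MB)"
)
xavier = parse_mem_line(
    "09-26-2023 10:45:28 RAM 5156/6857MB (lfb 66x1MB) SWAP 45/3428MB (cached 0MB)"
)

print("Orin NX free RAM (MB):", orin["ram_free"])
print("Xavier NX free RAM (MB):", xavier["ram_free"])
```

On these two samples the Orin NX had less free RAM at its peak than the Xavier NX, although a single sample per device does not show whether memory is what caused the failure.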

zerollzeng commented 1 year ago

[09/26/2023-18:37:20] [W] [TRT] Tactic Device request: 4226MB Available: 2658MB. Device memory is insufficient to use tactic.

This is a warning, so it should be fine. It just means the device memory is insufficient for this tactic; TensorRT will use other tactics.

[09/26/2023-18:39:53] [E] Saving engine to file failed.

This is why it fails. Do you have write permission for the directory?
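
A quick way to rule out the output location is to check it before running trtexec. The sketch below is only illustrative: `check_output_path` is a hypothetical helper, and checking free disk space is an extra assumption about what could make the save fail, not something confirmed in this thread.

```python
import os
import shutil


def check_output_path(engine_path):
    """Report whether the directory that will hold the engine file is usable."""
    directory = os.path.dirname(os.path.abspath(engine_path))
    exists = os.path.isdir(directory)
    # os.access reports permissions for the current user; root usually passes.
    writable = exists and os.access(directory, os.W_OK)
    # Free space is an assumed extra check, reported in MiB.
    free_mib = shutil.disk_usage(directory).free // (1024 * 1024) if exists else 0
    return {
        "directory": directory,
        "exists": exists,
        "writable": writable,
        "free_mib": free_mib,
    }


# Same engine file name as in the failing command, resolved against the
# current working directory.
print(check_output_path("yolov5l_fp16_1batch.trt"))
```

If `writable` is true and there is plenty of free space, the directory is probably not the cause and the verbose build log is the next place to look.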

shanchenjie commented 1 year ago

Thanks for your reply.

  1. Yes, I used the root user during the whole process.
  2. I tried some smaller models (resnet50, yolov5s, yolov5m) and no error occurred; the TRT engine was exported correctly.
  3. But with unet, yolov5l, and yolov5x, the same model and the same command fail on Orin NX (and succeed on Xavier NX).
  4. Building the TRT engine uses a different amount of memory on Orin NX and Xavier NX; I'm not sure whether that is the reason.

Peak memory usage on Orin NX: 09-26-2023 18:27:47 RAM 5902/7337MB (lfb 5x4MB) SWAP 3/7764MB (cached 0MB)

Peak memory usage on Xavier NX: 09-26-2023 10:45:28 RAM 5156/6857MB (lfb 66x1MB) SWAP 45/3428MB (cached 0MB)

zerollzeng commented 1 year ago

Turn on the TRT verbose log; I think it will show a more detailed reason why it fails.
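
As a sketch of what that rerun could look like, the snippet below rebuilds the command from the original report and appends trtexec's `--verbose` flag. The helper name `build_trtexec_command` is hypothetical; the binary path and the other flags are copied from the failing command in the description.

```python
import shlex


def build_trtexec_command(onnx_path, engine_path,
                          trtexec="/usr/src/tensorrt/bin/trtexec"):
    """Return the original build command as an argument list, plus --verbose."""
    return [
        trtexec,
        f"--onnx={onnx_path}",
        "--buildOnly",
        f"--saveEngine={engine_path}",
        "--fp16",
        "--verbose",  # enables verbose logging in trtexec
    ]


cmd = build_trtexec_command("yolov5l.onnx", "yolov5l_fp16_1batch.trt")

# Print a copy-pasteable shell command; on the device the list could instead
# be passed to subprocess.run with stdout redirected to a log file.
print(shlex.join(cmd))
```

Verbose output is long, so saving it to a file and attaching that file to the issue is usually easier than pasting it inline.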

ttyio commented 12 months ago

Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions. Thanks!

LINTAO5835 commented 12 months ago

Did you solve this in the end? I'm running into the same problem!

CZG0712 commented 3 months ago

@shanchenjie I also encountered the same error. May I ask how you solved it?