shifsa closed this issue 1 year ago
The error means that your device ran out of GPU memory while loading this model and running inference. Setting CUDA_LAUNCH_BLOCKING=1 forces CUDA kernel launches to execute synchronously, one at a time, rather than asynchronously, which likely lowers your peak memory usage (high watermark) enough for inference to run. It resolves the error for you at the cost of performance, since the host now waits for each launch to complete.
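For reference, here is a minimal sketch of the ordering requirement: the variable has to be in the environment before the CUDA context is created, so exporting it in the shell before launching Triton (or setting it at the top of a script) is what takes effect. The PyTorch workload below is purely illustrative, not part of the DOPE/Triton setup from this issue:

```python
import os

# CUDA_LAUNCH_BLOCKING is read when the CUDA context is created, so it
# must be set before any library initializes CUDA.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # illustrative workload; the real consumer here is Triton

# With blocking launches, the host waits for each kernel to finish before
# queueing the next, so asynchronous errors (like an OOM) surface at the
# offending call instead of at some later synchronization point.
x = torch.randn(1024, 1024, device="cuda")
y = x @ x
torch.cuda.synchronize()
```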
I tried a sample run of DOPE pose estimation with Triton, following https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_pose_estimation/blob/main/docs/dope-triton.md. During inference, the following error occurred and Triton stopped.
I searched for the error message and found a suggestion to set the environment variable CUDA_LAUNCH_BLOCKING=1. After setting this variable, inference now runs without errors. Is this the correct way to resolve the error?