Project-MONAI / model-zoo

MONAI Model Zoo that hosts models in the MONAI Bundle format.
Apache License 2.0

VISTA-3D: TensorRT speedup error #703

Open xiaocaimmm opened 1 month ago

xiaocaimmm commented 1 month ago

When using VISTA-3D, I encountered the following problem while running this command:

python -m monai.bundle run --config_file "['configs/inference.json', 'configs/inference_trt.json']"

environment:
TensorRT: 10.1.0
Torch-TensorRT version: 2.4.0
Python version: 3.10.15
CUDA version: 12.4
Torch version: 2.4.0+cu121
GPU: NVIDIA GeForce RTX 4090

error information:
2024-10-24 10:30:17,210 - root - INFO - Restored all variables from .//models/model.pt
2024-10-24 10:30:17,211 - ignite.engine.engine.Vista3dEvaluator - INFO - Engine run resuming from iteration 0, epoch 0 until 1 epochs
2024-10-24 10:30:18,220 - INFO - Loading TensorRT engine: .//models/model.pt.image_encoder.encoder.plan
[I] Loading bytes from .//models/model.pt.image_encoder.encoder.plan
[E] IExecutionContext::enqueueV3: Error Code 1: Cask (Cask convolution execution)
2024-10-24 10:30:19,129 - INFO - Exception: CUDA ERROR: 700 Falling back to Pytorch ...
2024-10-24 10:30:19,131 - ignite.engine.engine.Vista3dEvaluator - ERROR - Current run is terminating due to exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

It looks like an environment problem, but I can't tell what went wrong.

yiheng-wang-nv commented 1 week ago

Hi @xiaocaimmm, the error message alone does not reveal the root cause. Can you set CUDA_LAUNCH_BLOCKING=1 and rerun the script? That should produce a more accurate error trace, as in the example below.
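A minimal sketch of how that might look, simply prefixing the original command with the environment variable (the config paths are the ones from the original report):

# force synchronous CUDA kernel launches so the failing call shows up in the stacktrace
CUDA_LAUNCH_BLOCKING=1 python -m monai.bundle run --config_file "['configs/inference.json', 'configs/inference_trt.json']"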

In addition, did you modify inference_trt.json, for example by enabling head_trt_enabled? That option requires more GPU memory.
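As a side note, one way to toggle that without editing the file is a command-line override; monai.bundle run accepts --<key> <value> overrides for config entries, so assuming head_trt_enabled is a top-level entry in inference_trt.json, something like the following sketch could work:

# hypothetical override: keep the TensorRT head conversion disabled without modifying inference_trt.json
python -m monai.bundle run --config_file "['configs/inference.json', 'configs/inference_trt.json']" --head_trt_enabled false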