xiaocaimmm opened 1 month ago
Hi @xiaocaimmm, the error message does not reveal the root cause. Can you set `CUDA_LAUNCH_BLOCKING=1` and rerun the script? That should produce a more accurate error trace.

In addition, did you modify `inference_trt.json`, for example by enabling `head_trt_enabled`? That option requires more GPU memory.
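For reference, a minimal sketch of the suggested rerun: it is the same command as in the report below, just with `CUDA_LAUNCH_BLOCKING=1` prepended so CUDA errors are reported at the failing call rather than at a later API call.

```bash
# With synchronous CUDA launches, the stack trace points at the kernel
# that actually triggered the illegal memory access.
CUDA_LAUNCH_BLOCKING=1 python -m monai.bundle run \
    --config_file "['configs/inference.json', 'configs/inference_trt.json']"
```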
When using Vista3D, I encountered the following problem while running the command `python -m monai.bundle run --config_file "['configs/inference.json', 'configs/inference_trt.json']"`.
Environment:

- TensorRT: 10.1.0
- Torch-TensorRT: 2.4.0
- Python: 3.10.15
- CUDA: 12.4
- Torch: 2.4.0+cu121
- GPU: NVIDIA GeForce RTX 4090
Error information:

```
2024-10-24 10:30:17,210 - root - INFO - Restored all variables from .//models/model.pt
2024-10-24 10:30:17,211 - ignite.engine.engine.Vista3dEvaluator - INFO - Engine run resuming from iteration 0, epoch 0 until 1 epochs
2024-10-24 10:30:18,220 - INFO - Loading TensorRT engine: .//models/model.pt.image_encoder.encoder.plan
[I] Loading bytes from .//models/model.pt.image_encoder.encoder.plan
[E] IExecutionContext::enqueueV3: Error Code 1: Cask (Cask convolution execution)
2024-10-24 10:30:19,129 - INFO - Exception: CUDA ERROR: 700
Falling back to Pytorch ...
2024-10-24 10:30:19,131 - ignite.engine.engine.Vista3dEvaluator - ERROR - Current run is terminating due to exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```

It looks like a problem with my environment, but I can't tell what went wrong.
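One way I could try to isolate the TensorRT path (a sketch, assuming the bundle's default config layout): run only the plain PyTorch config and check whether inference completes without the illegal memory access.

```bash
# Runs the bundle without configs/inference_trt.json, i.e. the pure PyTorch
# inference path; if this succeeds, the fault is specific to the TensorRT
# engine loaded from model.pt.image_encoder.encoder.plan.
python -m monai.bundle run --config_file "configs/inference.json"
```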