DerryHub / BEVFormer_tensorrt

BEVFormer inference on TensorRT, including INT8 Quantization and Custom TensorRT Plugins (float/half/half2/int8).
Apache License 2.0

RuntimeError: CUDA out of memory. Tried to allocate 1.91 GiB (GPU 0; 11.77 GiB total capacity; 8.90 GiB already allocated; 509.19 MiB free; 8.90 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF #5

Closed: yhwang-hub closed this issue 1 year ago.

yhwang-hub commented 1 year ago

#################### Running RotateTestCase ####################
test_fp16_bilinear (det2trt.models.utils.test_trt_ops.test_rotate.RotateTestCase) ... /home/wyh/BEVFormer_tensorrt/./det2trt/models/functions/rotate.py:15: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than tensor.new_tensor(sourceTensor).
  center[0] -= center[0].new_tensor(ow * 0.5)
/home/wyh/BEVFormer_tensorrt/./det2trt/models/functions/rotate.py:16: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than tensor.new_tensor(sourceTensor).
  center[1] -= center[1].new_tensor(oh * 0.5)
Warning: Unsupported operator RotateTRT. No schema registered for this operator.
Warning: Unsupported operator RotateTRT. No schema registered for this operator.
Warning: Unsupported operator RotateTRT. No schema registered for this operator.
FAIL
test_fp16_nearest (det2trt.models.utils.test_trt_ops.test_rotate.RotateTestCase) ... Warning: Unsupported operator RotateTRT. No schema registered for this operator.
Warning: Unsupported operator RotateTRT. No schema registered for this operator.
Warning: Unsupported operator RotateTRT. No schema registered for this operator.
ERROR
test_fp32_bilinear (det2trt.models.utils.test_trt_ops.test_rotate.RotateTestCase) ... ERROR
test_fp32_nearest (det2trt.models.utils.test_trt_ops.test_rotate.RotateTestCase) ... ERROR

#################### Running RotateTestCase2 ####################
test_fp16_bilinear (det2trt.models.utils.test_trt_ops.test_rotate.RotateTestCase2) ... ERROR
test_fp16_nearest (det2trt.models.utils.test_trt_ops.test_rotate.RotateTestCase2) ... ERROR
test_fp32_bilinear (det2trt.models.utils.test_trt_ops.test_rotate.RotateTestCase2) ... ERROR
test_fp32_nearest (det2trt.models.utils.test_trt_ops.test_rotate.RotateTestCase2) ... ERROR

======================================================================
ERROR: test_fp16_nearest (det2trt.models.utils.test_trt_ops.test_rotate.RotateTestCase)

Traceback (most recent call last):
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/test_rotate.py", line 57, in setUp
    self.buildEngine(opset_version=13)
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/base_test_case.py", line 87, in buildEngine
    engine = pth2trt(
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/utils.py", line 85, in pth2trt
    engine = build_engine(f, fp16=fp16)
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/utils.py", line 67, in build_engine
    engine = runtime.deserialize_cuda_engine(plan)
TypeError: deserialize_cuda_engine(): incompatible function arguments. The following argument types are supported:

  1. (self: tensorrt.tensorrt.Runtime, serialized_engine: buffer) -> tensorrt.tensorrt.ICudaEngine

Invoked with: <tensorrt.tensorrt.Runtime object at 0x7f527eff8df0>, None

======================================================================
ERROR: test_fp32_bilinear (det2trt.models.utils.test_trt_ops.test_rotate.RotateTestCase)

Traceback (most recent call last):
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/test_rotate.py", line 57, in setUp
    self.buildEngine(opset_version=13)
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/base_test_case.py", line 95, in buildEngine
    engine = pth2trt(
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/utils.py", line 75, in pth2trt
    torch.onnx.export(
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/__init__.py", line 305, in export
    return utils.export(model, args, f, export_params, verbose, training,
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 118, in export
    _export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 719, in _export
    _model_to_graph(model, args, verbose, input_names,
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 499, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 440, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 391, in _trace_and_get_graph_from_model
    torch.jit._get_trace_graph(model, args, strict=False, _force_outplace=False, _return_inputs_states=True)
  File "/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py", line 1166, in _get_trace_graph
    outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py", line 127, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py", line 118, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1098, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/utils.py", line 32, in forward
    output = self.module(*inputs, **self.kwargs)
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/functions/rotate.py", line 116, in rotate
    return _rotate(img, angle, center, _MODE[interpolation])
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/functions/rotate.py", line 66, in forward
    img = torch.grid_sampler(img, grid, interpolation, 0, False)
RuntimeError: CUDA out of memory. Tried to allocate 1.91 GiB (GPU 0; 11.77 GiB total capacity; 8.02 GiB already allocated; 1.38 GiB free; 8.02 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

======================================================================
ERROR: test_fp32_nearest (det2trt.models.utils.test_trt_ops.test_rotate.RotateTestCase)

Traceback (most recent call last):
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/test_rotate.py", line 57, in setUp
    self.buildEngine(opset_version=13)
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/base_test_case.py", line 95, in buildEngine
    engine = pth2trt(
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/utils.py", line 75, in pth2trt
    torch.onnx.export(
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/__init__.py", line 305, in export
    return utils.export(model, args, f, export_params, verbose, training,
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 118, in export
    _export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 719, in _export
    _model_to_graph(model, args, verbose, input_names,
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 499, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 440, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 391, in _trace_and_get_graph_from_model
    torch.jit._get_trace_graph(model, args, strict=False, _force_outplace=False, _return_inputs_states=True)
  File "/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py", line 1166, in _get_trace_graph
    outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py", line 127, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py", line 118, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1098, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/utils.py", line 32, in forward
    output = self.module(*inputs, **self.kwargs)
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/functions/rotate.py", line 116, in rotate
    return _rotate(img, angle, center, _MODE[interpolation])
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/functions/rotate.py", line 66, in forward
    img = torch.grid_sampler(img, grid, interpolation, 0, False)
RuntimeError: CUDA out of memory. Tried to allocate 1.91 GiB (GPU 0; 11.77 GiB total capacity; 8.02 GiB already allocated; 1.38 GiB free; 8.02 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

======================================================================
ERROR: test_fp16_bilinear (det2trt.models.utils.test_trt_ops.test_rotate.RotateTestCase2)

Traceback (most recent call last):
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/test_rotate.py", line 142, in setUp
    self.buildEngine(opset_version=13)
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/base_test_case.py", line 87, in buildEngine
    engine = pth2trt(
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/utils.py", line 75, in pth2trt
    torch.onnx.export(
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/__init__.py", line 305, in export
    return utils.export(model, args, f, export_params, verbose, training,
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 118, in export
    _export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 719, in _export
    _model_to_graph(model, args, verbose, input_names,
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 499, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 440, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 391, in _trace_and_get_graph_from_model
    torch.jit._get_trace_graph(model, args, strict=False, _force_outplace=False, _return_inputs_states=True)
  File "/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py", line 1166, in _get_trace_graph
    outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py", line 127, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py", line 114, in wrapper
    tuple(x.clone(memory_format=torch.preserve_format) for x in args)
  File "/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py", line 114, in <genexpr>
    tuple(x.clone(memory_format=torch.preserve_format) for x in args)
RuntimeError: CUDA out of memory. Tried to allocate 978.00 MiB (GPU 0; 11.77 GiB total capacity; 8.90 GiB already allocated; 509.19 MiB free; 8.90 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

======================================================================
ERROR: test_fp16_nearest (det2trt.models.utils.test_trt_ops.test_rotate.RotateTestCase2)

Traceback (most recent call last):
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/test_rotate.py", line 129, in setUp
    BaseTestCase.__init__(
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/base_test_case.py", line 33, in __init__
    self.createInputs()
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/base_test_case.py", line 64, in createInputs
    self.inputs_pth_fp16 = {key: val.half() for key, val in inputs_pth.items()}
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/base_test_case.py", line 64, in <dictcomp>
    self.inputs_pth_fp16 = {key: val.half() for key, val in inputs_pth.items()}
RuntimeError: CUDA out of memory. Tried to allocate 978.00 MiB (GPU 0; 11.77 GiB total capacity; 8.90 GiB already allocated; 509.19 MiB free; 8.90 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

======================================================================
ERROR: test_fp32_bilinear (det2trt.models.utils.test_trt_ops.test_rotate.RotateTestCase2)

Traceback (most recent call last):
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/test_rotate.py", line 142, in setUp
    self.buildEngine(opset_version=13)
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/base_test_case.py", line 95, in buildEngine
    engine = pth2trt(
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/utils.py", line 75, in pth2trt
    torch.onnx.export(
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/__init__.py", line 305, in export
    return utils.export(model, args, f, export_params, verbose, training,
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 118, in export
    _export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 719, in _export
    _model_to_graph(model, args, verbose, input_names,
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 499, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 440, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 391, in _trace_and_get_graph_from_model
    torch.jit._get_trace_graph(model, args, strict=False, _force_outplace=False, _return_inputs_states=True)
  File "/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py", line 1166, in _get_trace_graph
    outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py", line 127, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py", line 114, in wrapper
    tuple(x.clone(memory_format=torch.preserve_format) for x in args)
  File "/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py", line 114, in <genexpr>
    tuple(x.clone(memory_format=torch.preserve_format) for x in args)
RuntimeError: CUDA out of memory. Tried to allocate 1.91 GiB (GPU 0; 11.77 GiB total capacity; 8.90 GiB already allocated; 509.19 MiB free; 8.90 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

======================================================================
ERROR: test_fp32_nearest (det2trt.models.utils.test_trt_ops.test_rotate.RotateTestCase2)

Traceback (most recent call last):
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/test_rotate.py", line 142, in setUp
    self.buildEngine(opset_version=13)
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/base_test_case.py", line 95, in buildEngine
    engine = pth2trt(
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/utils.py", line 75, in pth2trt
    torch.onnx.export(
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/__init__.py", line 305, in export
    return utils.export(model, args, f, export_params, verbose, training,
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 118, in export
    _export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 719, in _export
    _model_to_graph(model, args, verbose, input_names,
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 499, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 440, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 391, in _trace_and_get_graph_from_model
    torch.jit._get_trace_graph(model, args, strict=False, _force_outplace=False, _return_inputs_states=True)
  File "/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py", line 1166, in _get_trace_graph
    outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py", line 127, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py", line 114, in wrapper
    tuple(x.clone(memory_format=torch.preserve_format) for x in args)
  File "/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py", line 114, in <genexpr>
    tuple(x.clone(memory_format=torch.preserve_format) for x in args)
RuntimeError: CUDA out of memory. Tried to allocate 1.91 GiB (GPU 0; 11.77 GiB total capacity; 8.90 GiB already allocated; 509.19 MiB free; 8.90 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

======================================================================
FAIL: test_fp32 (det2trt.models.utils.test_trt_ops.test_modulated_deformable_conv2d.ModulatedDeformableConv2dTestCase)

Traceback (most recent call last):
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/test_modulated_deformable_conv2d.py", line 83, in test_fp32
    self.fp32_case()
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/test_modulated_deformable_conv2d.py", line 72, in fp32_case
    self.assertLessEqual(cost, delta)
AssertionError: 0.0036153677 not less than or equal to 1e-05

======================================================================
FAIL: test_fp32 (det2trt.models.utils.test_trt_ops.test_modulated_deformable_conv2d.ModulatedDeformableConv2dTestCase2)

Traceback (most recent call last):
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/test_modulated_deformable_conv2d.py", line 157, in test_fp32
    self.fp32_case()
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/test_modulated_deformable_conv2d.py", line 146, in fp32_case
    self.assertLessEqual(cost, delta)
AssertionError: 0.0036153677 not less than or equal to 1e-05

======================================================================
FAIL: test_fp16_bilinear (det2trt.models.utils.test_trt_ops.test_rotate.RotateTestCase)

Traceback (most recent call last):
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/test_rotate.py", line 91, in test_fp16_bilinear
    self.fp16_case(0.01)
  File "/home/wyh/BEVFormer_tensorrt/./det2trt/models/utils/test_trt_ops/test_rotate.py", line 82, in fp16_case
    self.assertLessEqual(cost, delta)
AssertionError: 0.258 not less than or equal to 0.01


Ran 136 tests in 381.644s

FAILED (failures=3, errors=7)
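The first RotateTestCase error (test_fp16_nearest) is a `TypeError`, not an out-of-memory error: the "Invoked with: ..., None" line shows that `runtime.deserialize_cuda_engine` received `None` instead of a serialized engine, so whatever produced `plan` inside `build_engine` returned nothing. Line 66 of utils.py is not shown in the log; if `plan` comes from TensorRT's `Builder.build_serialized_network` (an assumption), `None` is what that call returns when the engine build fails. Why the build failed cannot be read from this log alone. The sketch below is a hypothetical guard (`deserialize_or_raise` is not a function in the repository) that reports this case explicitly; it only assumes `runtime` has a `deserialize_cuda_engine` method.

```python
from typing import Any


def deserialize_or_raise(runtime: Any, plan: Any) -> Any:
    """Deserialize a TensorRT plan, failing loudly if the build produced none.

    Hypothetical helper: `runtime` is expected to behave like
    tensorrt.Runtime, and `plan` is whatever the builder returned
    (assumed to be None when the engine build failed).
    """
    if plan is None:
        # Passing None straight to deserialize_cuda_engine produces the
        # "incompatible function arguments" TypeError seen in the log.
        raise RuntimeError(
            "TensorRT engine build failed: the serialized plan is None, "
            "so there is nothing to deserialize; check the builder log "
            "for the first error"
        )
    return runtime.deserialize_cuda_engine(plan)
```

Calling `engine = deserialize_or_raise(runtime, plan)` where line 67 currently calls the runtime directly would surface the failed build with a clear message instead of a bindings-level `TypeError`.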

Running RotateTestCase fails because there is not enough GPU memory.
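Every out-of-memory message in the log reports the same value for "already allocated" and "reserved in total by PyTorch" (8.02 GiB / 8.02 GiB and 8.90 GiB / 8.90 GiB), so the `max_split_size_mb` hint, which targets the fragmentation case where reserved is much larger than allocated, does not apply: the card simply has less free memory (1.38 GiB or 509.19 MiB) than the next request. The 978.00 MiB requested by `val.half()` in `createInputs` is also half of the 1.91 GiB requested when fp32 inputs are cloned, which is consistent with an fp16 copy of the same input. A minimal sketch that extracts these numbers from such a message and applies the reserved-vs-allocated check; the regexes assume the exact wording printed in this log, and the 1.2 ratio standing in for ">>" is an arbitrary assumption.

```python
import re
from typing import Dict

# Conversion factors to MiB for the units PyTorch prints in OOM messages.
_TO_MIB = {"KiB": 1.0 / 1024, "MiB": 1.0, "GiB": 1024.0}
_SIZE = r"([\d.]+) (KiB|MiB|GiB)"

# Field name -> regex; wording assumed to match the message in the log above.
_FIELDS = {
    "requested": r"Tried to allocate " + _SIZE,
    "total": _SIZE + r" total capacity",
    "allocated": _SIZE + r" already allocated",
    "free": _SIZE + r" free",
    "reserved": _SIZE + r" reserved in total",
}


def parse_cuda_oom(message: str) -> Dict[str, float]:
    """Return the sizes (in MiB) reported by a CUDA out-of-memory message."""
    sizes = {}
    for name, pattern in _FIELDS.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"field {name!r} not found in message")
        sizes[name] = float(match.group(1)) * _TO_MIB[match.group(2)]
    return sizes


def looks_fragmented(sizes: Dict[str, float], ratio: float = 1.2) -> bool:
    """Heuristic for the 'reserved >> allocated' case (ratio is an assumption)."""
    return sizes["reserved"] > ratio * sizes["allocated"]


if __name__ == "__main__":
    msg = ("CUDA out of memory. Tried to allocate 978.00 MiB (GPU 0; 11.77 GiB "
           "total capacity; 8.90 GiB already allocated; 509.19 MiB free; "
           "8.90 GiB reserved in total by PyTorch)")
    sizes = parse_cuda_oom(msg)
    # Request exceeds free memory; reserved == allocated, so not fragmentation.
    print(sizes["requested"] > sizes["free"], looks_fragmented(sizes))
```

Run on the message from the RotateTestCase2 failures, this prints `True False`: the request does not fit, and fragmentation is not the cause.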

DerryHub commented 1 year ago

You can test these operators individually if you don't have enough GPU memory.
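One way to follow this advice is to run each test in its own Python process, so that all CUDA memory held by one case is released when its process exits, before the next case starts. A minimal sketch using only the standard library: it shells out to `python -m unittest` with dotted test names such as those in the log above. The helper names are hypothetical, the sketch assumes it is launched from the repository root so that `det2trt` is importable, and it only helps if a single case fits on the GPU by itself.

```python
import subprocess
import sys
from typing import Dict, Iterable, List


def unittest_command(test_id: str) -> List[str]:
    """Command that runs one dotted unittest id with the current interpreter."""
    return [sys.executable, "-m", "unittest", "-v", test_id]


def run_isolated(test_ids: Iterable[str]) -> Dict[str, int]:
    """Run each test id in a fresh process and return its exit code (0 = passed).

    A separate process per id means GPU memory used by earlier cases is
    freed when that process exits.
    """
    results = {}
    for test_id in test_ids:
        completed = subprocess.run(unittest_command(test_id))
        results[test_id] = completed.returncode
    return results
```

For example, `run_isolated(["det2trt.models.utils.test_trt_ops.test_rotate.RotateTestCase.test_fp32_bilinear", "det2trt.models.utils.test_trt_ops.test_rotate.RotateTestCase2.test_fp32_bilinear"])` runs those two methods in separate processes; passing a class name instead of a method name runs the whole class in one process.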