NVIDIA-AI-IOT / torch2trt

An easy to use PyTorch to TensorRT converter

max_batch_size doesn't work as expected #740

Open sedatester opened 2 years ago

sedatester commented 2 years ago

As per the documentation, I should be able to pass max_batch_size during model conversion and then provide different batch sizes during inference. This doesn't seem to be happening. Code to reproduce:


import torch
from torch2trt import torch2trt
from torchvision.models.alexnet import alexnet

# create some regular pytorch model...
model = alexnet(pretrained=True).eval().cuda()

# create example data
x = torch.ones((1, 3, 224, 224)).cuda()

# convert to TensorRT feeding sample data as input
model_trt = torch2trt(model, [x], max_batch_size=4)

nx = torch.ones((3, 3, 224, 224)).cuda()
y = model(nx)
y_trt = model_trt(nx)
print(f'y shape: {y.shape}, y_trt shape: {y_trt.shape}')

# check the output against PyTorch
print(torch.max(torch.abs(y - y_trt)))

Output:

[05/24/2022-02:31:31] [TRT] [E] 3: [executionContext.cpp::setBindingDimensions::944] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::944, condition: profileMaxDims.d[i] >= dimensions.d[i]. Supplied binding dimension [3,3,224,224] for bindings[0] exceed min ~ max range at index 0, maximum dimension in profile is 1, minimum dimension in profile is 1, but supplied dimension is 3.
)
y shape: torch.Size([3, 1000]), y_trt shape: torch.Size([1, 1000])
tensor(1.9073e-06, device='cuda:0', grad_fn=<MaxBackward1>)

As we can see, the converted TRT model's output batch size is 1 instead of 3. Does max_batch_size not work?

mcmingchang commented 2 years ago

I have the same problem

mcmingchang commented 2 years ago

import torch
from torch2trt import torch2trt
from torchvision.models.alexnet import alexnet

# create some regular pytorch model...
model = alexnet(pretrained=False).eval().cuda()

x = torch.ones((4, 3, 224, 224)).cuda()
model_trt = torch2trt(model, [x], max_batch_size=4)

nx = torch.ones((1, 3, 224, 224)).cuda()
y = model(nx)
y_trt = model_trt(nx)
print(f'y shape: {y.shape}, y_trt shape: {y_trt.shape}')

Output:

y shape: torch.Size([1, 1000]), y_trt shape: torch.Size([4, 1000])

chaoz-dev commented 2 years ago

Let me take a look at this. This is probably due to our usage of the explicit batch dimension, which might require different optimization profiles after all in order to allow batch dimensions from 1 up to the given value.

chaoz-dev commented 2 years ago

Hmm, after investigating this a bit, this might be trickier than I thought. Specifically, dynamic shapes in TRT don't work the way I had envisioned:

Specify each runtime dimension of an input tensor by using -1 as a placeholder for the dimension.

So while it's possible to create an optimization profile with a range of dimensions, the input tensor needs to use -1 as a placeholder for a given dimension, whereas we want to allow different statically shaped tensors to run.
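
For reference, the TensorRT Python API expresses this roughly as follows: the network input is declared with a -1 placeholder for the dynamic dimension, and an optimization profile supplies the concrete min/opt/max shapes. This is a minimal sketch with a hypothetical input name, not torch2trt's actual code:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()

# Declare the batch dimension as dynamic with a -1 placeholder.
input_tensor = network.add_input(name="input", dtype=trt.float32, shape=(-1, 3, 224, 224))

# The optimization profile pins the allowed (min, opt, max) batch sizes.
profile = builder.create_optimization_profile()
profile.set_shape("input", (1, 3, 224, 224), (2, 3, 224, 224), (4, 3, 224, 224))
config.add_optimization_profile(profile)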

chaoz-dev commented 2 years ago

Alright, I think I've got something working with respect to dynamic shapes. It turns out to be easier than I had originally thought, although it took some experimentation, as I had misunderstood the TRT documentation as written.

I'll do some cleanup and post a solution, but it seems like we could make the entire tensor dynamic if we chose to do so :P
I'm not sure what the performance implications of making this change are, though... in theory, converting to dynamic tensor shapes shouldn't be less performant, but we should profile to make sure...
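
For that profiling, a rough harness along these lines would do (a minimal sketch; model and x stand for whichever engine and input are being timed):

import time
import torch

def mean_latency(model, x, iters=100):
    # Warm up, then time GPU-synchronized iterations.
    for _ in range(10):
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters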

chaoz-dev commented 2 years ago

Addressed in #743

chaoz-dev commented 2 years ago

I'm getting:

[06/07/2022-17:14:33] [TRT] [E] 7: [shapeMachine.cpp::execute::688] Error Code 7: Internal Error (IShuffleLayer:0:SHUFFLE:GPU: reshaping failed for tensor: (Unnamed Layer* 13) [Pooling]_output
reshape would change volume
Instruction: RESHAPE_ZERO_IS_PLACEHOLDER{1 256 6 6} {4 9216}
)

when running the above examples, but I think the core issue regarding the batch sizes should be addressed with the aforementioned fix. Not sure if this issue is related...

mkulariya commented 2 years ago

Facing the same issue: supplying max_batch_size=64 at the time of model conversion from PyTorch to TRT and using batch_size=4 at the time of inference. It used to work fine some time ago but is not working now.

0:00:19.731892276 149 0x16f8f270 WARN nvinfer gstnvinfer.cpp:616:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::checkBackendParams() <nvdsinfer_context_impl.cpp:1642> [UID = 1]: Backend has maxBatchSize 1 whereas 4 has been requested
0:00:19.731917286 149 0x16f8f270 WARN nvinfer gstnvinfer.cpp:616:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1813> [UID = 1]: deserialized backend context :/data/gcs/dt-model-store/62ad37ae6ad4cf7f61ec46a1/tensorrt.engine failed to match config params, trying rebuild
0:00:19.735589935 149 0x16f8f270 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1715> [UID = 1]: Trying to create engine from model files
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:934 failed to build network since there is no model file matched.
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:872 failed to build network.
0:00:19.735951176 149 0x16f8f270 ERROR nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1735> [UID = 1]: build engine file failed
0:00:19.735983020 149 0x16f8f270 ERROR nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1821> [UID = 1]: build backend context failed
0:00:19.736005286 149 0x16f8f270 ERROR nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1148> [UID = 1]: generate backend failed, check config file settings
0:00:19.736436390 149 0x16f8f270 WARN nvinfer gstnvinfer.cpp:809:gst_nvinfer_start:<primary-inference> error: Failed to create NvDsInferContext instance
0:00:19.736463410 149 0x16f8f270 WARN nvinfer gstnvinfer.cpp:809:gst_nvinfer_start:<primary-inference> error: Config file path: /tmp/nvinfer-config.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED

chaoz-dev commented 2 years ago

@mkulariya Just checking, but when you're performing the model conversion, are you also supplying tensors of batch size 4? And the above patch did not address your issue?

mkulariya commented 2 years ago

@chaoz-dev

1. No, at the time of model conversion I am supplying a tensor of batch size 1: trt_model = torch2trt(model, [dummy_input], max_batch_size=64), where the shape of the dummy input is (1, 3, 224, 224).
2. I have not tried the patch; since it used to work like this earlier, I am using an older version of torch2trt for model conversion.

chaoz-dev commented 2 years ago

@mkulariya If you're using an older version of torch2trt then this might be a different issue than the one posted here. What version of torch2trt are you using? This issue is related to the recent change to explicit batch tensors, which removed the ability to implicitly use different input batch sizes than what was used for compilation.

mkulariya commented 2 years ago

@chaoz-dev I meant to say I am using the old version temporarily for conversion, as the current version is not working. Are you suggesting that the input size should be the same as max_batch_size during compilation for it to work properly? For example, a dummy_input of shape (64, 3, 128, 128) with trt = torch2trt(model, [dummy_input], max_batch_size=64), as sketched below.
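
A runnable version of that proposal might look like this (a sketch of the questioner's hypothesis, not a confirmed fix; alexnet stands in for the actual model):

import torch
from torch2trt import torch2trt
from torchvision.models.alexnet import alexnet

# Stand-in for the actual model; shapes follow the example above.
model = alexnet(pretrained=False).eval().cuda()

# Build the engine with an example input whose batch dimension already
# equals max_batch_size, then run smaller batches at inference time.
dummy_input = torch.ones((64, 3, 128, 128)).cuda()
trt_model = torch2trt(model, [dummy_input], max_batch_size=64)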

wanduoz commented 2 years ago

@chaoz-dev Hi, I installed the latest torch2trt package and successfully ran the sample code you posted at #743. However, when I apply a dynamic batch size to my model, it returns the same error you mentioned:

[06/07/2022-17:14:33] [TRT] [E] 7: [shapeMachine.cpp::execute::688] Error Code 7: Internal Error (IShuffleLayer:0:SHUFFLE:GPU: reshaping failed for tensor: (Unnamed Layer* 13) [Pooling]_output
reshape would change volume
Instruction: RESHAPE_ZERO_IS_PLACEHOLDER{1 256 6 6} {4 9216}
)

sedatester commented 2 years ago

Thanks @chaoz-dev for looking into this. I also looked at @jaybdub's https://github.com/NVIDIA-AI-IOT/torch2trt/pull/764/, which is supposed to solve this. I just got back to this, and I'm also seeing the same issues others mention. For my original code:

import torch
from torch2trt import torch2trt
from torchvision.models.alexnet import alexnet

# create some regular pytorch model...
model = alexnet(pretrained=True).eval().cuda()

# create example data
x = torch.ones((1, 3, 224, 224)).cuda()

# convert to TensorRT feeding sample data as input
model_trt = torch2trt(model, [x], max_batch_size=4)

nx = torch.ones((3, 3, 224, 224)).cuda()
y = model(nx)
y_trt = model_trt(nx)
print(f'y shape: {y.shape}, y_trt shape: {y_trt.shape}')

# check the output against PyTorch
print(torch.max(torch.abs(y - y_trt)))

The output with the latest torch2trt is:

[07/28/2022-20:24:34] [TRT] [E] 7: [shapeMachine.cpp::execute::688] Error Code 7: Internal Error (IShuffleLayer :0:SHUFFLE:GPU: reshaping failed for tensor: (Unnamed Layer* 13) [Pooling]_output
reshape would change volume
Instruction: RESHAPE_ZERO_IS_PLACEHOLDER{3 256 6 6} {1 9216}
)
Traceback (most recent call last):
  File "test_torch2trt.py", line 17, in <module>
    y_trt = model_trt(nx)
  File "/denali/.venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/denali/.venv/lib/python3.6/site-packages/torch2trt-0.4.0-py3.6.egg/torch2trt/torch2trt.py", line 593, in forward
    shape = tuple(self.context.get_binding_shape(idx))
ValueError: __len__() should return >= 0

I think this error is still related to dynamic batch size handling: if you look at the shapes mentioned in the error, {3 256 6 6} and {1 9216}, then 256 x 6 x 6 = 9216, so it looks like the batch dimension is somehow being ignored. It would be great if we could solve this issue.

I also tried @jaybdub's new API, modifying the code to:

import torch
from torch2trt import torch2trt
from torchvision.models.alexnet import alexnet

# create some regular pytorch model...
model = alexnet(pretrained=True).eval().cuda()

# create example data
x = torch.ones((2, 3, 224, 224)).cuda()

# convert to TensorRT feeding sample data as input
# model_trt = torch2trt(model, [x], max_batch_size=4)
model_trt = torch2trt(model, [x], min_shapes=[(1, 3, 224, 224)], max_shapes=[(4, 3, 224, 224)], opt_shapes=[(2, 3, 224, 224)])

nx = torch.ones((3, 3, 224, 224)).cuda()
y = model(nx)
y_trt = model_trt(nx)
print(f'y shape: {y.shape}, y_trt shape: {y_trt.shape}')

# check the output against PyTorch
print(torch.max(torch.abs(y - y_trt)))

This produces a slightly modified form of the same error:

[07/28/2022-20:31:57] [TRT] [E] 4: [shapeCompiler.cpp::evaluateShapeChecks::924] Error Code 4: Internal Error (kOPT values for profile 0 violate shape constraints: reshape would change volume. IShuffleLayer :0:SHUFFLE:GPU: reshaping failed for tensor: (Unnamed Layer* 13) [Pooling]_output)
Traceback (most recent call last):
  File "test_torch2trt.py", line 17, in <module>
    y_trt = model_trt(nx)
  File "/denali/.venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/denali/.venv/lib/python3.6/site-packages/torch2trt-0.4.0-py3.6.egg/torch2trt/torch2trt.py", line 583, in forward
    idx = self.engine.get_binding_index(input_name)
AttributeError: 'NoneType' object has no attribute 'get_binding_index'
jaybdub commented 2 years ago

I believe the issue is related to the flatten converter.

https://github.com/NVIDIA-AI-IOT/torch2trt/blob/540520700f969e13b921be1bb944c44d299ff406/torch2trt/converters/view.py#L13

https://github.com/pytorch/vision/blob/main/torchvision/models/alexnet.py#L50

The layer is using a generic converter that fixes the output shape to the one produced by PyTorch; however, at runtime these shapes change, which results in an incorrect output shape.
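
For illustration, a dynamic-shape-friendly flatten can use placeholder reshape dimensions instead of the concrete shape recorded from the example tensor. This is a minimal TensorRT sketch over a hypothetical input, not the converter added in the PR:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

# Input with a dynamic batch dimension.
input_trt = network.add_input(name="input", dtype=trt.float32, shape=(-1, 256, 6, 6))

# Flatten everything after the batch dimension: 0 copies the batch
# dimension from the input at runtime, and -1 infers the remaining
# volume (here 256 * 6 * 6 = 9216).
shuffle = network.add_shuffle(input_trt)
shuffle.reshape_dims = (0, -1)
network.mark_output(shuffle.get_output(0))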

This is a known limitation for a few converters that I'm working to resolve. Some may take longer, but I've just added a flatten converter that should work in this PR:

https://github.com/NVIDIA-AI-IOT/torch2trt/pull/778/files

Could you try this out and let me know if it works for you?

Best, John

jaybdub commented 2 years ago

Please try this PR instead.

https://github.com/NVIDIA-AI-IOT/torch2trt/pull/779

This has general dynamic shape fixes for a larger variety of converters.

eav-solution commented 2 years ago

File "tools/trt.py", line 95, in main() └ <function main at 0x7f1ab28f28>

File "/home/dmp/.local/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context return func(*args, **kwargs) │ │ └ {} │ └ () └ <function main at 0x7f210af268>

File "tools/trt.py", line 80, in main max_batch_size=args.batch, │ └ 4 └ Namespace(batch=4, ckpt='weights/Eduardo_Fire/best_ckpt.pth', exp_file='exps/example/yolox_voc/yolox_voc_nano.py', experiment...

File "/home/dmp/.local/lib/python3.6/site-packages/torch2trt-0.4.0-py3.6.egg/torch2trt/torch2trt.py", line 757, in torch2trt outputs = module(inputs) │ └ [tensor([[[[1., 1., 1., ..., 1., 1., 1.], │ [1., 1., 1., ..., 1., 1., 1.], │ [1., 1., 1., ..., 1., 1., 1.]... └ YOLOX( (backbone): YOLOPAFPN( (backbone): CSPDarknet( (stem): Focus( (conv): BaseConv( (conv): ... File "/home/dmp/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1071, in _call_impl result = forward_call(input, **kwargs) │ │ └ {} │ └ (tensor([[[[1., 1., 1., ..., 1., 1., 1.], │ [1., 1., 1., ..., 1., 1., 1.], │ [1., 1., 1., ..., 1., 1., 1.]... └ <bound method YOLOX.forward of YOLOX( (backbone): YOLOPAFPN( (backbone): CSPDarknet( (stem): Focus( (conv...

File "/home/dmp/1.Users/1.TinhLam/15.Convert_Fire_Pt_To_Trt/YOLOX/yolox/models/yolox.py", line 30, in forward fpn_outs = self.backbone(x) │ └ tensor([[[[1., 1., 1., ..., 1., 1., 1.], │ [1., 1., 1., ..., 1., 1., 1.], │ [1., 1., 1., ..., 1., 1., 1.],... └ YOLOX( (backbone): YOLOPAFPN( (backbone): CSPDarknet( (stem): Focus( (conv): BaseConv( (conv): ...

File "/home/dmp/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1071, in _call_impl result = forward_call(*input, **kwargs) │ │ └ {} │ └ (tensor([[[[1., 1., 1., ..., 1., 1., 1.], │ [1., 1., 1., ..., 1., 1., 1.], │ [1., 1., 1., ..., 1., 1., 1.]... └ <bound method YOLOPAFPN.forward of YOLOPAFPN( (backbone): CSPDarknet( (stem): Focus( (conv): BaseConv( (c...

File "/home/dmp/1.Users/1.TinhLam/15.Convert_Fire_Pt_To_Trt/YOLOX/yolox/models/yolo_pafpn.py", line 93, in forward out_features = self.backbone(input) │ └ tensor([[[[1., 1., 1., ..., 1., 1., 1.], │ [1., 1., 1., ..., 1., 1., 1.], │ [1., 1., 1., ..., 1., 1., 1.],... └ YOLOPAFPN( (backbone): CSPDarknet( (stem): Focus( (conv): BaseConv( (conv): Conv2d(12, 16, kernel_size=(3...

File "/home/dmp/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1071, in _call_impl result = forward_call(*input, **kwargs) │ │ └ {} │ └ (tensor([[[[1., 1., 1., ..., 1., 1., 1.], │ [1., 1., 1., ..., 1., 1., 1.], │ [1., 1., 1., ..., 1., 1., 1.]... └ <bound method CSPDarknet.forward of CSPDarknet( (stem): Focus( (conv): BaseConv( (conv): Conv2d(12, 16, kernel_si...

File "/home/dmp/1.Users/1.TinhLam/15.Convert_Fire_Pt_To_Trt/YOLOX/yolox/models/darknet.py", line 169, in forward x = self.stem(x) │ └ tensor([[[[1., 1., 1., ..., 1., 1., 1.], │ [1., 1., 1., ..., 1., 1., 1.], │ [1., 1., 1., ..., 1., 1., 1.],... └ CSPDarknet( (stem): Focus( (conv): BaseConv( (conv): Conv2d(12, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1...

File "/home/dmp/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1071, in _call_impl result = forward_call(*input, **kwargs) │ │ └ {} │ └ (tensor([[[[1., 1., 1., ..., 1., 1., 1.], │ [1., 1., 1., ..., 1., 1., 1.], │ [1., 1., 1., ..., 1., 1., 1.]... └ <bound method Focus.forward of Focus( (conv): BaseConv( (conv): Conv2d(12, 16, kernel_size=(3, 3), stride=(1, 1), paddi...

File "/home/dmp/1.Users/1.TinhLam/15.Convert_Fire_Pt_To_Trt/YOLOX/yolox/models/network_blocks.py", line 208, in forward dim=1,

File "/home/dmp/.local/lib/python3.6/site-packages/torch2trt-0.4.0-py3.6.egg/torch2trt/torch2trt.py", line 306, in wrapper converter"converter" │ └ <torch2trt.torch2trt.ConversionContext object at 0x7f1a921748> └ {'converter': <function convert_cat at 0x7f3e51d620>, 'is_real': True, 'module': <module 'torch' from '/home/dmp/.local/lib/p... File "/home/dmp/.local/lib/python3.6/site-packages/torch2trt-0.4.0-py3.6.egg/torch2trt/converters/cat.py", line 16, in convert_cat trt_inputs = broadcast_trt_tensors(ctx.network, trt_inputs, len(output.shape)) │ │ │ │ │ └ <attribute 'shape' of 'torch._C._TensorBase' objects> │ │ │ │ └ tensor([[[[1., 1., 1., ..., 1., 1., 1.], │ │ │ │ [1., 1., 1., ..., 1., 1., 1.], │ │ │ │ [1., 1., 1., ..., 1., 1., 1.],... │ │ │ └ [<tensorrt.tensorrt.ITensor object at 0x7f1252cce0>, <tensorrt.tensorrt.ITensor object at 0x7f1252cdf8>, <tensorrt.tensorrt.I... │ │ └ <torch2trt.torch2trt.NetworkWrapper object at 0x7f1a921828> │ └ <torch2trt.torch2trt.ConversionContext object at 0x7f1a921748> └ <function broadcast_trt_tensors at 0x7f268e7840> File "/home/dmp/.local/lib/python3.6/site-packages/torch2trt-0.4.0-py3.6.egg/torch2trt/torch2trt.py", line 191, in broadcast_trt_tensors if len(t.shape) < broadcast_ndim: │ │ └ 4 │ └ <property object at 0x7f80d3ba98> └ <tensorrt.tensorrt.ITensor object at 0x7f1252cce0>

ValueError: len() should return >= 0

I just checked your source code and it works fine with the sample code, but I don't know why I can't convert my model correctly.

jaybdub commented 2 years ago

Hi @eav-solution ,

Thanks for reaching out.

Do you mind sharing which model you're attempting to convert, and the code that you used to instantiate / convert the model?

Best, John

eav-solution commented 2 years ago

I used yolox-nano with this code to convert. https://github.com/Megvii-BaseDetection/YOLOX/blob/0.3.0/tools/trt.py

eav-solution commented 2 years ago

Here are the original weights I am trying to convert: https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_nano.pth

eav-solution commented 2 years ago

I found the problem. I wasn't picking up the input shape from here, and the max_batch_size param will not work if the input shape is incorrect. https://github.com/Megvii-BaseDetection/YOLOX/blob/main/tools/trt.py#L61

jaybdub commented 2 years ago

I was able to reproduce your issue.

I think the issue is related to the getitem converter.

https://github.com/NVIDIA-AI-IOT/torch2trt/blob/540520700f969e13b921be1bb944c44d299ff406/torch2trt/converters/getitem.py#L93

I haven't yet updated the getitem converter to handle dynamic shapes.

I think this should be relatively straightforward to do in the new PR https://github.com/NVIDIA-AI-IOT/torch2trt/pull/779, but it may take some time.
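
Roughly, making a getitem-style slice dynamic-shape aware means feeding the input's runtime shape into the slice layer rather than baking in static dimensions. A hedged TensorRT sketch of the idea, shown here as a full-size slice over a hypothetical input (not the fix that landed in the PR):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
input_trt = network.add_input(name="input", dtype=trt.float32, shape=(-1, 3, 224, 224))

# Compute the input's shape at runtime and use it as the slice size
# (input index 2 of ISliceLayer); the static shape argument below is
# then ignored in favor of the runtime shape tensor.
shape_layer = network.add_shape(input_trt)
slice_layer = network.add_slice(input_trt, start=(0, 0, 0, 0), shape=(0, 0, 0, 0), stride=(1, 1, 1, 1))
slice_layer.set_input(2, shape_layer.get_output(0))
network.mark_output(slice_layer.get_output(0))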

jaybdub commented 2 years ago

Hi @eav-solution ,

I've fixed the getitem converter for dynamic shapes in this PR https://github.com/NVIDIA-AI-IOT/torch2trt/pull/779.

Please pull it and give it a try. I've tested that the YOLOX build now passes with batch size 4 using the following command:

python3 -m tools.trt -e exp/default/yolox_nano.py -c yolox_nano.pth -b 4

I haven't verified the accuracy though, so please let me know if it is working for you.

Best, John