NVIDIA-AI-IOT / torch2trt

An easy to use PyTorch to TensorRT converter

Support for dynamic input size #506

Closed meremeev closed 2 years ago

meremeev commented 3 years ago

Is there any possibility to generate a TensorRT engine with a dynamic input size? If not, do you have any plans to provide this functionality, or ideas on how to approach it?

jaybdub commented 3 years ago

Hi @meremeev ,

Thanks for reaching out!

I imagine with enough work this may be possible, but I'd have to investigate what changes are necessary. I haven't personally spent much time exploring this feature because most embedded-systems use cases target static shapes.

I'd have to dig into it a bit more to get back with a meaningful answer.

Do you mind sharing your use case for dynamic shapes? I'm curious to understand the motivation for the feature.

Best, John

meremeev commented 3 years ago

Hi John,

Luminar is a LiDAR company, but in addition to hardware we provide a software SDK. Part of the SDK's functionality is a semantic segmentation model. Depending on the scan pattern settings, the size of the point cloud can differ, so the only way to support this flexibility is to have a model that can handle point clouds of different sizes.

Aside from our use case, embedded systems are a very large domain. They cover everything from simple, low-cost, single-function devices (e.g. a doorbell with face recognition) to very complex, multi-functional devices with plenty of computational resources (e.g. a self-driving autopilot). For such systems it is essential to have flexibility in the format and size of the input/sensor data.

Another factor is the problem domain. In image recognition/object detection the input is usually a fixed-size image, but areas such as sequence analysis, voice recognition, motion detection, and video analysis have a dimension for which size flexibility is very important. So if you see torch2trt as a universal solution for converting Torch models to TensorRT, support for dynamic sizes is essential.

And I think something like this might work.

  1. Extend the conversion entry point with a way to provide additional information about the dynamic dimensions of each input, plus min/opt/max values for each such dimension, so a TensorRT optimization profile can be built, e.g. model_trt = torch2trt(model, [x], dynamic_sizes=[{0: (1, 10, 100)}]).
  2. Add an argument to specify TensorRT builder flags, particularly implicit vs. explicit batch. I believe the nvinfer1::IPluginV2DynamicExt interface works only with explicit batch (the V2 interfaces). Right now you build an implicit-batch network by default.
  3. When building the engine, mark the requested dimensions as dynamic (-1) and provide the optimization profile. (A rough sketch follows this list.)
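
Roughly, with the TensorRT Python API, the three steps might look like this (just a sketch: the dynamic_sizes argument above is a proposal, and the shapes here are made up):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)

# step 2: explicit-batch network, as the dynamic-shape (V2) interfaces require
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

# step 3: mark the dynamic dimension with -1 when defining the input
input_tensor = network.add_input('input', trt.float32, (-1, 3, 224, 224))
# ... converters populate the rest of the network here ...

# step 1: the min/opt/max values become an optimization profile
config = builder.create_builder_config()
profile = builder.create_optimization_profile()
profile.set_shape('input', (1, 3, 224, 224), (10, 3, 224, 224), (100, 3, 224, 224))
config.add_optimization_profile(profile)

engine = builder.build_engine(network, config)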

I am considering making these changes myself but would like to discuss them first.

meremeev commented 3 years ago

Another API that would be very useful is one to serialize the TensorRT engine and save it to a file, so that it can later be loaded into a C/C++ application. Right now we do this in a slightly hacky way.

jaybdub commented 3 years ago

Hi @meremeev ,

Thanks for your reply, you raise some interesting use cases!

Regarding dynamic shapes

I've done some more research on what might be possible, but I'm not yet able to assess the impact of this feature / whether we can safely integrate it here. Currently, I understand that some converters (e.g. interpolation) will require adjustment to ensure they handle dynamic shapes appropriately. Our current test cases may not reveal this, since we use the same shape for building and testing.

Another note is that TensorRT allows multiple optimization profiles (to cover multiple input shape ranges). This adds complexity and introduces some nuanced limitations (for example, INT8 calibration applies to only one profile). For your use case, do most of the tensor shapes fall within one continuous range, or multiple ranges? I'm trying to assess whether there is a tangible benefit to using multiple profiles, or if it's best to just support one profile, with multiple engines if necessary.
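
To illustrate the multi-profile case, with the TensorRT Python API it would look roughly like the following (a sketch; the input name and shape ranges are made up):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# one profile per shape range; the runtime selects a profile per execution context
for lo, opt, hi in [(1, 4, 8), (16, 32, 64)]:
    profile = builder.create_optimization_profile()
    profile.set_shape('input', (lo, 3, 224, 224), (opt, 3, 224, 224), (hi, 3, 224, 224))
    config.add_optimization_profile(profile)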

Also, out of curiosity, have you explored the ONNX->TensorRT workflow for your purposes? It supports dynamic shapes, but perhaps has other limitations (which I'm interested to understand, if that was the case for you).

Regarding serialization for C++

Good point. I'm not sure yet whether an API is needed for this, but we should at least add instructions to our documentation.

Is this the solution you used?

with open('model.engine', 'wb') as f:
    f.write(model_trt.engine.serialize())
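
And I assume you deserialize it on the C++ side via the runtime API; for reference, the Python equivalent would be roughly:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open('model.engine', 'rb') as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())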

Best, John

meremeev commented 3 years ago

Hi John

For my use case I need only one dynamic dimension, with one range / one profile. I agree that dynamic size support is a serious rework, and there could be some problems to resolve.

As far as I know, TensorRT does not like multiple dynamic dimensions on the same tensor; it gives a performance warning.
I think the idea behind multiple profiles is to build multiple engines from the same network, but if we convert the model ourselves we can always convert it multiple times.

Our current conversion pipeline uses the Torch->ONNX->TensorRT path with a dynamic input size, but that path has some problems I hope to avoid by using torch2trt. Torch->ONNX does not support some operations, has type restrictions, I cannot parameterize custom kernels, etc. Actually, I am not sure torch2trt supports those ops either, because I have already converted them to custom kernels. Something to try.
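
For reference, the dynamic-size part of our export step looks roughly like this (a sketch; the tensor names and opset are illustrative):

import torch

# mark the first (point count) dimension dynamic in the exported ONNX graph
torch.onnx.export(
    model, (x,), 'model.onnx',
    opset_version=11,
    input_names=['points'], output_names=['labels'],
    dynamic_axes={'points': {0: 'num_points'}, 'labels': {0: 'num_points'}},
)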

But the major problem comes from ONNX format compatibility. Torch 1.6 produces ONNX IR version 0.0.6, which is compatible with TensorRT 7, but conversion for TensorRT 6 requires IR 0.0.3, i.e. Torch 1.2 or 1.3. So I want to find a more direct conversion path without extra layers in between.

Yes, we use exactly the same code to serialize the TensorRT engine.

Thanks, Mark

jaybdub commented 3 years ago

Hi @meremeev ,

It seems like supporting just one dynamic range may be sufficient (or even preferred), with torch2trt simply run multiple times if needed. The only potential downside I see is the memory overhead from duplicating weights, but if this proves to be an issue it could be addressed later. I may explore this feature more soon, but I still can't make any guarantees. If you happen to experiment / discover more, I'm curious to hear about it.

Thanks for sharing your experience with ONNX. You might find the following helpful for your purposes:

  1. Since this PR, torch2trt allows you to attach converters to user-defined methods (instructions are currently in the PR). This lets you apply conversion at whatever level you want: you could implement your custom layer with native TensorRT layers, or with your own plugin layer. (See the sketch after this list.)
  2. We have a couple of plugin examples here. These current plugins simply wrap the torch C++ calls. We haven't fully streamlined this process, but you may be able to model your plugin on them. They use torch mechanisms for serialization and allow parameterization directly in Python by passing torch tensors. This approach of wrapping torch calls is relatively simple, but perhaps not optimal for memory/performance reasons (many torch calls don't allow in-place execution, so they incur a tensor copy overhead). It also pulls in the torch binaries, which you may or may not find acceptable. If you've defined your own kernel, you can still develop and parameterize a plugin and use it with torch2trt without this torch-wrapping trick; it will just take more work.
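
For example, a converter attached to a user-defined method might look roughly like this (a sketch: the method path is made up and the body is a trivial single activation, so check the converters in this repo for the exact pattern):

import tensorrt as trt
from torch2trt import tensorrt_converter

@tensorrt_converter('my_package.MyCustomLayer.forward')
def convert_my_custom_layer(ctx):
    input = ctx.method_args[1]   # the torch input tensor (method_args[0] is the module)
    output = ctx.method_return   # the torch output tensor
    # build the op from native TensorRT layers instead of calling back into torch
    layer = ctx.network.add_activation(input._trt, trt.ActivationType.RELU)
    output._trt = layer.get_output(0)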

I've considered streamlining this process, which I may re-explore if it proves beneficial. For now, hopefully you find the above information helpful.

Best, John

MatthieuToulemont commented 3 years ago

I don't know if it is appropriate to mention it here, but depending on the set of operations you use, you might be able to do this with TRTorch.

To be more precise, it will work if your model has a UNet-like architecture for which the upsampling factor is always the same (e.g. ×2).

It is a bit more opaque than this repo but works very well for traditional CNN architectures.
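
For dynamic shapes, the TRTorch compile spec looks roughly like this (a sketch from memory of the 0.x API; the exact settings keys may differ between versions):

import torch
import trtorch

scripted = torch.jit.script(model)  # TRTorch consumes TorchScript modules
compile_settings = {
    'input_shapes': [
        {
            'min': [1, 3, 224, 224],
            'opt': [1, 3, 512, 512],
            'max': [1, 3, 1024, 1024],
        },
    ],
}
trt_module = trtorch.compile(scripted, compile_settings)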

Best, Matthieu

meremeev commented 3 years ago

Thank you! Very interesting.


jihad-akl commented 1 year ago

Hi, does torch2trt now support custom dynamic input size?

meremeev commented 1 year ago

It did not at the time of this conversation, which was a while ago. Not sure about the current status.
