Lightning-AI / pytorch-lightning

Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

Integrate TorchTensorRt in order to increase speed during inference #11438

Open Actis92 opened 2 years ago

Actis92 commented 2 years ago

šŸš€ Feature

Add a method like `to_torchscript` in `lightning.py` that allows converting a model to Torch-TensorRT in order to increase inference performance

Motivation

Increase performance during inference

Proposal

    from typing import Any, Optional, Set, Union

    import torch
    import torch_tensorrt
    from torch.jit import ScriptModule

    @torch.no_grad()
    def to_torch_tensorrt(
        self,
        example_inputs: Optional[Any] = None,
        enabled_precisions: Optional[Set[Union[torch.dtype, torch_tensorrt.dtype]]] = None,
        **kwargs: Any,
    ) -> ScriptModule:
        mode = self.training

        # if no example inputs are provided, try to see if the model has `example_input_array` set
        if example_inputs is None:
            if self.example_input_array is None:
                raise ValueError(
                    "`to_torch_tensorrt` requires either `example_inputs`"
                    " or `model.example_input_array` to be defined."
                )
            example_inputs = self.example_input_array

        # automatically send example inputs to the right device, then compile in eval mode
        example_inputs = self._apply_batch_transfer_handler(example_inputs)
        if isinstance(example_inputs, torch.Tensor):
            example_inputs = [example_inputs]
        trt_module = torch_tensorrt.compile(
            self.eval(),
            inputs=example_inputs,
            enabled_precisions=enabled_precisions or {torch.float32},
            **kwargs,
        )
        self.train(mode)

        return trt_module
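
Usage could then look something like this (a hypothetical sketch: `MyLightningModule`, the checkpoint path, and the input shape are placeholders, and the method name assumes the proposal above):

    import torch

    # hypothetical user model; TensorRT compilation requires a CUDA device
    model = MyLightningModule.load_from_checkpoint("model.ckpt").cuda()
    model.example_input_array = torch.randn(1, 3, 224, 224)

    # compile to a TensorRT-backed TorchScript module, allowing FP16 kernels
    trt_model = model.to_torch_tensorrt(enabled_precisions={torch.float32, torch.half})

    # inference with the compiled module (inputs must be on the GPU)
    with torch.no_grad():
        out = trt_model(torch.randn(1, 3, 224, 224).cuda())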

Additional context

A possible problem could be the dependencies: Torch-TensorRT depends on CUDA, cuDNN and TensorRT, as described in https://nvidia.github.io/Torch-TensorRT/v1.0.0/tutorials/installation.html, and I think some of these dependencies work only on Linux.
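
If it helps, here is a minimal sketch of how the optional dependency could be guarded (the `_TORCH_TENSORRT_AVAILABLE` flag and the error message are illustrative, not an existing Lightning API):

    import importlib.util

    # detect the optional dependency without importing it at module load time
    _TORCH_TENSORRT_AVAILABLE = importlib.util.find_spec("torch_tensorrt") is not None

    # at the top of the proposed method, fail early with an actionable message
    if not _TORCH_TENSORRT_AVAILABLE:
        raise ModuleNotFoundError(
            "`to_torch_tensorrt` requires `torch_tensorrt` to be installed."
            " See https://nvidia.github.io/Torch-TensorRT/v1.0.0/tutorials/installation.html"
        )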

cc @borda @carmocca @awaelchli @ninginthecloud @daniellepintz @rohitgr7

luca-medeiros commented 2 years ago

Recently the PyTorch team integrated Torch-TensorRT into the PyTorch ecosystem (blog post). Any tips on how one would implement an `export_trt` for the `Trainer`?

carmocca commented 2 years ago

We could follow the pattern used by `to_onnx`: https://github.com/Lightning-AI/lightning/blob/0ca3b5aa1b16667cc2d006c3833f4953b5706e72/src/pytorch_lightning/core/module.py#L1798. Compared to the snippet in the linked blog post, the advantage would be automatically using `self.example_input_array` (if defined) and calling the batch transfer hooks to apply any transformations (if defined). This is what the top post also suggests.
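
For anyone needing this before it lands, a rough standalone sketch of that pattern (assuming a CUDA device and a model with `example_input_array` set; not an official API):

    import torch
    import torch_tensorrt

    model = model.cuda().eval()

    # mimic what the proposed method would do: reuse `example_input_array`
    # and move it to the model's device before compiling
    example = model.example_input_array.to(model.device)

    trt_module = torch_tensorrt.compile(
        model,
        inputs=[example],
        enabled_precisions={torch.float32, torch.half},
    )

    # the result is a TorchScript module that can be saved and reloaded
    torch.jit.save(trt_module, "model_trt.ts")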

davodogster commented 1 year ago

@rohitgr7 Hi, is there any progress on this? I want to do very fast GPU inference with my model trained in PyTorch Lightning. How do we convert it to TRT, and will it speed up inference 2x or 4x? Thanks, Sam

Borda commented 1 year ago

@davodogster would you be interested in taking it over and implementing it? :rabbit:

davodogster commented 1 year ago

Hi @Borda! Sorry, I am an applied data scientist and not a strong developer, so it may be a challenge for me.

Do you think it's easily possible for me to convert my Lightning model (image segmentation, batch size >= 8) to TensorRT for a 3-5x inference speedup?

davodogster commented 1 year ago

(image attachment)

dgcnz commented 4 months ago

šŸ‘