I am using a pretrained ConvNeXt V2 model from timm. It contains LayerNorm and GlobalResponseNormalization layers, but even after adding custom quant modules for LayerNorm, LayerNorm2d, and GlobalResponseNorm (GRN), I still cannot make the quantized model run faster than the baseline FP16 engine. I am using the TensorRT Python API and ModelOpt to perform the quantization.
My code for creating the custom modules is as follows:
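I don't have the exact snippet on hand, but the pattern I followed is roughly this. Note this is a pure-PyTorch illustration of the fake-quant wrapper idea; the real modules use ModelOpt's quantizer objects instead of the manual fake-quant call, and the class name and fixed scale below are placeholders of mine:

```python
import torch
import torch.nn as nn


class FakeQuantLayerNorm(nn.LayerNorm):
    """Illustrative only: LayerNorm that fake-quantizes its input to INT8
    before normalizing, mimicking what a ModelOpt quant module inserts."""

    def forward(self, x):
        # scale=1.0 is a placeholder; in practice the scale comes from
        # calibration (roughly amax / 127), not a hard-coded constant.
        xq = torch.fake_quantize_per_tensor_affine(x, 1.0, 0, -128, 127)
        return super().forward(xq)


# Usage: swap the module into the model in place of nn.LayerNorm.
ln = FakeQuantLayerNorm(8)
out = ln(torch.randn(2, 4, 8))
```

The LayerNorm2d and GRN wrappers follow the same shape: quantize the activation tensor at the module boundary, then run the original op.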
I am using the following config:
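I don't have my exact file here, but it follows the standard ModelOpt quant-config dict shape; the values below are illustrative, not my calibrated settings:

```python
# Illustrative ModelOpt-style INT8 config: per-channel weights (axis=0),
# per-tensor activations, max calibration.
QUANT_CFG = {
    "quant_cfg": {
        "*weight_quantizer": {"num_bits": 8, "axis": 0},
        "*input_quantizer": {"num_bits": 8, "axis": None},
    },
    "algorithm": "max",
}
```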
I am following the documentation, but it is not clear to me whether I am doing something wrong. Any help is much appreciated.