WongKinYiu / yolor

Implementation of the paper "You Only Learn One Representation: Unified Network for Multiple Tasks" (https://arxiv.org/abs/2105.04206)
GNU General Public License v3.0

INT 8 quantization support #188

Open shekarneo opened 2 years ago

haritsahm commented 2 years ago

I managed to quantize the model with NVIDIA/pytorch-quantization. In my experiments, the accuracy drop is around 3%, GPU memory use is reduced by only about 10%, and the speed of the quantized engine (PTQ -> TensorRT FP16-INT8) is close to plain TensorRT FP16 (no PTQ). Personally, it doesn't help much.
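For anyone wondering why PTQ costs a few percent of accuracy: post-training INT8 quantization maps each tensor to 8-bit integers through a single scale derived from calibration data, and the rounding noise this introduces is what the network has to absorb. Below is a minimal pure-Python sketch of symmetric per-tensor INT8 quantization (the same basic scheme pytorch-quantization applies per layer); all function names here are illustrative, not the library's API.

```python
# Sketch of symmetric per-tensor INT8 post-training quantization.
# Names are illustrative only, not the pytorch-quantization API.

def calibrate_amax(values):
    """Max calibration: largest absolute value seen in calibration data."""
    return max(abs(v) for v in values)

def quantize(values, amax):
    """Map floats to int8 range [-127, 127] with a single scale."""
    scale = amax / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.02, -1.5, 0.73, 3.1, -0.004]
amax = calibrate_amax(weights)
q, scale = quantize(weights, amax)
restored = dequantize(q, scale)

# Per-element rounding error is bounded by scale / 2; this quantization
# noise is the source of the small accuracy drop after PTQ.
errors = [abs(a - b) for a, b in zip(weights, restored)]
print(max(errors) <= scale / 2 + 1e-9)  # -> True
```

Note the trade-off visible even in this toy example: one outlier (3.1) stretches the scale for the whole tensor, which is why calibration method choice (max vs. histogram/entropy) matters in the real library.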