NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

Quantize conformer model that is trained before #4569

Closed alexandercesarr closed 2 years ago

alexandercesarr commented 2 years ago

Hi, I previously trained a Conformer model. Now I want to quantize that model and convert it to TensorRT, but when I run speech_to_text_quant_infer_trt.py an error occurs:

TypeError: Error instantiating 'nemo.collections.asr.modules.conformer_encoder.ConformerEncoder' : __init__() got an unexpected keyword argument 'quantize'

Could you please help me to solve it?

titu1994 commented 2 years ago

That script only supports character-based models, specifically models that include quantization nodes in their code, such as QuartzNet, Jasper, and Citrinet. Conformer does not support this.

alexandercesarr commented 2 years ago

Thanks @titu1994 for your reply. So if I want to quantize this Conformer CTC BPE model, how can I do it? What would you recommend for this case?

titu1994 commented 2 years ago

We don't support it, so there are no recommendations as such. We haven't tried it ourselves. @Slyne fyi

Slyne commented 2 years ago

Hi @alexandercesarr

First of all, if you want to deploy your model on GPUs, keep reading the notes below.

There are two types of quantization that TensorRT supports, explicit vs. implicit. Check their difference here.

Where to start? Try Post-Training Quantization (PTQ) first, if you can export the Conformer ONNX model, then check this simple example. All you need to do is add a calibrator (i.e. attach a data loader and feed real data so the model calibrates better).
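To make the calibrator's role concrete, here is a minimal sketch of what INT8 max calibration does conceptually: sweep the calibration batches to find the activation range, derive a symmetric scale, and then quantize-dequantize through that grid. This is plain NumPy for illustration only, not the TensorRT calibrator API; in a real pipeline you would implement `trt.IInt8EntropyCalibrator2` and return batches from your dataset.

```python
import numpy as np

def calibrate_scale(batches):
    """Max calibration: track the largest absolute activation value
    seen over the calibration data and derive a symmetric INT8 scale."""
    amax = 0.0
    for batch in batches:
        amax = max(amax, float(np.abs(batch).max()))
    return amax / 127.0  # map the observed range onto [-127, 127]

def quant_dequant(x, scale):
    """Simulate INT8 quantization: round onto the grid, clamp, dequantize."""
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale

# Hypothetical calibration set standing in for real ASR features.
rng = np.random.default_rng(0)
calib_data = [rng.standard_normal((8, 64)).astype(np.float32) for _ in range(4)]
scale = calibrate_scale(calib_data)

x = calib_data[0]
err = float(np.abs(quant_dequant(x, scale) - x).max())
print(f"scale={scale:.4f}, max abs error={err:.4f}")
```

Feeding representative data matters because the scale is derived from the observed range: calibrating on random or mismatched data gives a range (and hence a grid) that doesn't fit the activations seen at inference time.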

What if the PTQ model cannot meet the accuracy requirement after quantization?

You may try Quantization-Aware Training (QAT). Please follow the pytorch_quantization ResNet example.
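The core idea behind QAT toolkits such as pytorch_quantization is inserting fake-quant nodes into the model so training sees INT8-rounded values while gradients still flow via a straight-through estimator. A minimal sketch using plain PyTorch (`torch.fake_quantize_per_tensor_affine`, not the pytorch_quantization API; the layer sizes are made up):

```python
import torch

class FakeQuantLinear(torch.nn.Linear):
    """Linear layer with a fake-quant node on its weight, mimicking what
    QAT toolkits insert: the forward pass uses INT8-rounded weights,
    while the backward pass passes gradients straight through."""
    def forward(self, x):
        scale = float(self.weight.detach().abs().max()) / 127.0
        w_q = torch.fake_quantize_per_tensor_affine(
            self.weight, scale, 0, -127, 127)
        return torch.nn.functional.linear(x, w_q, self.bias)

torch.manual_seed(0)
layer = FakeQuantLinear(16, 4)
x = torch.randn(2, 16)
y = layer(x)
y.sum().backward()  # gradients reach the underlying FP32 weight
print(y.shape, layer.weight.grad is not None)
```

After fine-tuning with fake quantization in place, the learned ranges are exported (e.g. as Q/DQ nodes in ONNX) so TensorRT can build an explicitly quantized engine.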

@andi4191 has also modified the code to show how to do explicit quantization on the NeMo Conformer. Check here

alexandercesarr commented 2 years ago

Hi @Slyne Thank you very much for your great advice and help. I'll try it. But if I want to deploy my model on CPU, what should I do? Is it the same as the above comment or not?

Slyne commented 2 years ago

> Hi @Slyne Thank you very much for your great advice and help. I'll try it. But if I want to deploy my model on CPU, what should I do? Is it the same as the above comment or not?

If it's CPU, then you can check PyTorch's native quantization pipeline and tutorial to get started.
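For CPU deployment, the simplest entry point in PyTorch's native pipeline is dynamic quantization: weights are stored in INT8 and activations are quantized on the fly at inference time. A minimal sketch (the `nn.Sequential` below is a hypothetical stand-in; for the real model you would load your trained NeMo checkpoint instead):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; sizes are made up for illustration.
model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 128))
model.eval()

# Dynamic quantization: INT8 weights, activations quantized per call.
# This path is CPU-only in PyTorch.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 80)
with torch.no_grad():
    out = quantized(x)
print(out.shape)
```

Dynamic quantization needs no calibration data, which makes it a quick first experiment; for better accuracy/speed trade-offs on CPU, PyTorch's static PTQ and QAT workflows (with observers and calibration) are the next step.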

alexandercesarr commented 2 years ago

Thank you @Slyne for your help.