Closed: Levi-zhan closed this issue 1 year ago.
> Do you have any optimization or debugging suggestions?

Can you try the latest TRT release first? @ttyio may know more about it.

> In addition, will I get different results of mAP if I change the hardware? I also have RTX 3060 and Tesla V100

The accuracy might differ, but the gap should be very small; it comes from different kernels being selected on different hardware.
Due to the limitation of my driver version, I can use CUDA 11.4 at most, but I think the latest TRT release needs CUDA 11.6. Maybe I am mistaken; I want to confirm whether the TensorRT OSS build container supports CUDA 11.4.
I made a mistake earlier: the mAP of the int8 engine is actually only 0.47. By not quantizing ConvTranspose (using nn.ConvTranspose instead of quant_nn.QuantConvTranspose), the accuracy improves to 0.59. Is there any other method that can help me find the remaining precision-sensitive layers?
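For reference, the same workaround can be applied without editing the model class, by disabling the TensorQuantizer instances that belong to those layers so they run in float. This is a minimal sketch; the `"deconv"` name filter is an assumption about how the model names its ConvTranspose blocks and should be adjusted to the actual module names:

```python
from pytorch_quantization.nn import TensorQuantizer

def run_layers_in_float(model, keyword="deconv"):
    # Disable every TensorQuantizer whose module path matches the keyword,
    # so those layers execute in float while the rest of the model stays int8.
    # "deconv" is a hypothetical name; inspect model.named_modules() to find yours.
    for name, module in model.named_modules():
        if isinstance(module, TensorQuantizer) and keyword in name:
            module.disable()  # this quantizer becomes a pass-through
```

Newer pytorch-quantization releases can also leave whole layer types in float at patch time via `quant_modules.initialize(float_module_list=[...])`, if your version supports that argument.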
@Levi-zhan, here is sample code for sensitivity analysis: https://github.com/NVIDIA/NeMo/blob/main/examples/asr/quantization/speech_to_text_quant_infer.py#L71
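The linked example follows a toggle-and-measure pattern: disable quantization for one layer at a time and see how much accuracy comes back. A minimal sketch of that idea adapted to this model, assuming a user-supplied `evaluate_map` function that returns validation mAP:

```python
from collections import defaultdict
from pytorch_quantization.nn import TensorQuantizer

def quantizer_groups(model):
    # Group input/weight quantizers by their parent module, so both
    # quantizers of a layer are toggled together.
    groups = defaultdict(list)
    for name, module in model.named_modules():
        if isinstance(module, TensorQuantizer):
            groups[name.rsplit(".", 1)[0]].append(module)
    return groups

def sensitivity_scan(model, evaluate_map):
    baseline = evaluate_map(model)          # fully quantized mAP
    recovery = {}
    for layer, quantizers in quantizer_groups(model).items():
        for q in quantizers:
            q.disable()                     # run only this layer in float
        recovery[layer] = evaluate_map(model) - baseline
        for q in quantizers:
            q.enable()                      # restore int8 for the next trial
    # Layers whose exclusion recovers the most mAP are the most sensitive.
    return sorted(recovery.items(), key=lambda kv: kv[1], reverse=True)
```

Running the few most sensitive layers in float (as was already done for ConvTranspose) is usually a much smaller accuracy/latency trade-off than dequantizing whole sections of the network.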
Closing since there has been no activity for more than 3 weeks, thank you!
Description
First, I trained a model on the training set; its mAP on the validation set is 0.65. Then, following the tutorial in the TensorRT documentation, I replaced the Conv layers in the model with quant_nn.QuantConv2d and ran PTQ calibration. I then fine-tuned the PTQ model on the training set for 50 epochs to get a QAT model with mAP 0.64. Finally, I converted it to ONNX and then to an int8 engine, but its accuracy on the validation set is only 0.57. Do you have any optimization or debugging suggestions?
In addition, will I get different mAP results if I change the hardware? I also have an RTX 3060 and a Tesla V100.
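For readers following along, here is a condensed sketch of the pipeline described above, using NVIDIA's pytorch-quantization toolkit. `build_model`, `train_loader`, and `fine_tune` are hypothetical stand-ins for the user's own code, and the input shape, calibration batch count, and opset are placeholders:

```python
import torch
from pytorch_quantization import quant_modules
from pytorch_quantization.nn import TensorQuantizer

quant_modules.initialize()              # monkey-patch nn.Conv2d -> quant_nn.QuantConv2d
model = build_model().cuda().eval()     # hypothetical model factory

# PTQ: switch quantizers to calibration mode and run a few batches.
for _, module in model.named_modules():
    if isinstance(module, TensorQuantizer):
        module.disable_quant()
        module.enable_calib()
with torch.no_grad():
    for step, (images, _) in enumerate(train_loader):   # hypothetical loader
        model(images.cuda())
        if step >= 32:                  # a few hundred images is usually enough
            break

# Load the collected ranges and switch back to fake-quant mode.
for _, module in model.named_modules():
    if isinstance(module, TensorQuantizer):
        module.load_calib_amax()
        module.disable_calib()
        module.enable_quant()

fine_tune(model)                        # hypothetical QAT fine-tuning loop

# Export with Q/DQ nodes so TensorRT can build an int8 engine from the ONNX.
TensorQuantizer.use_fb_fake_quant = True
dummy = torch.randn(1, 3, 640, 640).cuda()   # input shape is a placeholder
torch.onnx.export(model, dummy, "model_qat.onnx", opset_version=13)
```

A gap between the fake-quant mAP (0.64) and the engine mAP (0.57) at this step usually points to specific layers that tolerate fake-quant during training but not real int8 kernels, which is what the sensitivity analysis below is meant to locate.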
Environment
The Docker image was built following https://github.com/NVIDIA/TensorRT/tree/8.2.1
TensorRT Version: 8.2.1
NVIDIA GPU: NVIDIA GeForce GTX 1660 SUPER
NVIDIA Driver Version: 470.129.06
CUDA Version: 11.4
CUDNN Version: 8.6
Operating System: Ubuntu 20.04
Python Version (if applicable): 3.7
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.12.1
Baremetal or Container (if so, version):
Relevant Files
Steps To Reproduce