Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0

How to use PTQ/QAT/INT8 for object detection? #1453

Closed Chiauwen closed 10 months ago

Chiauwen commented 1 year ago

💡 Your Question

Hi everyone, I've run PTQ and QAT on my custom-trained model following the tutorial https://github.com/Deci-AI/super-gradients/blob/master/documentation/source/qat_ptq_yolo_nas.md

After this, I got some log files, a .pth file, and the PTQ/QAT ONNX files as output, as in the tutorial. At the bottom of the tutorial it says the QAT ONNX file needs to be converted to an INT8 TensorRT engine, so I converted it with the command `trtexec --fp16 --int8 --onnx=model.onnx --saveEngine=model.trt`.

Now I have my TensorRT file (in .trt format).

Here are my questions:

  1. How do I do object detection with a TensorRT file? (See the sketch after this list.)

  2. I already have the .pth file from the PTQ & QAT training output. Why can't I use that .pth file directly for object detection, the same way I use the .pth file from a normal, non-PTQ/QAT training run, with the code below?

     ```python
     from super_gradients.common.object_names import Models
     from super_gradients.training import models

     model = models.get(Models.YOLO_NAS_M, checkpoint_path="yolonas-m/ckpt_best.pth", num_classes=1)
     predictions = model.predict("23.jpg")
     predictions.show(show_confidence=False)
     ```

If I use the .pth file from the PTQ & QAT output for object detection, I get the error below:

```
ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/super_gradients/training/utils/checkpoint_utils.py in __call__(self, model_state_dict, checkpoint_state_dict)
    198
    199         if ckpt_val.shape != model_val.shape:
--> 200             raise ValueError(f"ckpt layer {ckpt_key} with shape {ckpt_val.shape} does not match {model_key}" f" with shape {model_val.shape} in the model")
    201         new_ckpt_dict[model_key] = ckpt_val
    202         return new_ckpt_dict

ValueError: ckpt layer backbone.stem.conv.post_bn.weight with shape torch.Size([48]) does not match backbone.stem.conv.branch_3x3.conv.weight with shape torch.Size([48, 3, 3, 3]) in the model
```


  3. If there's really no way to use the PTQ & QAT INT8 TensorRT file or the .pth file, can I use the ONNX file that is generated during PTQ & QAT for object detection? And how?
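
For reference, here is a minimal sketch (not from this thread) of question 1: deserializing the .trt engine and running one inference with Polygraphy, which ships with TensorRT. The input name `"input"` and the shape `(1, 3, 640, 640)` are assumptions; check the real ones with `polygraphy inspect model model.trt`. Decoding the raw outputs (plus NMS) is model-specific and not shown.

```python
# A minimal sketch of running a .trt engine with Polygraphy (ships with
# TensorRT). The input name "input" and shape (1, 3, 640, 640) are
# assumptions -- verify with: polygraphy inspect model model.trt
import numpy as np
from polygraphy.backend.common import BytesFromPath
from polygraphy.backend.trt import EngineFromBytes, TrtRunner

load_engine = EngineFromBytes(BytesFromPath("model.trt"))

with TrtRunner(load_engine) as runner:
    # Stand-in for a real preprocessed image (NCHW, float32).
    image = np.random.rand(1, 3, 640, 640).astype(np.float32)
    outputs = runner.infer(feed_dict={"input": image})
    for name, value in outputs.items():
        print(name, value.shape)  # raw detection tensors; decode + NMS separately
```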

I'm a newbie discovering YOLO-NAS.
Thanks, everyone.

### Versions

_No response_
Chiauwen commented 1 year ago

Are my questions not clear enough? 😭

shaydeci commented 11 months ago

@Chiauwen We've recently added in-depth tutorials on how to export YoloNAS and how to run QAT on it:

Check out how to export YoloNAS + TRT (including NMS) here.

Check out how to perform QAT/PTQ on YoloNAS here.

I reckon going over the above will answer 2. and 3. Let me know if you still run into problems.
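
For completeness, here is a minimal sketch of question 3 (not taken from the tutorials above): the PTQ/QAT ONNX file can be run directly with onnxruntime. Input/output names and shapes depend on how the model was exported, so the sketch reads them from the session rather than hard-coding them; the 640x640 input is an assumption.

```python
# A hedged sketch of running the PTQ/QAT ONNX file with onnxruntime.
# The Q/DQ (quantize/dequantize) nodes in a QAT export also run on the
# CPU provider, just without INT8 speedups.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

input_meta = session.get_inputs()[0]
print(input_meta.name, input_meta.shape)  # e.g. ('input', [1, 3, 640, 640])

# Stand-in for a real preprocessed image matching the reported shape.
image = np.random.rand(1, 3, 640, 640).astype(np.float32)
outputs = session.run(None, {input_meta.name: image})
for meta, value in zip(session.get_outputs(), outputs):
    print(meta.name, value.shape)  # raw predictions unless exported with NMS
```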

Louis-Dupont commented 10 months ago

I'm closing this issue due to inactivity. Feel free to re-open it if you still run into any problems.

devvaibhav455 commented 8 months ago

I also faced the same issue. I can see the predictions using the normal (pre-PTQ/QAT) checkpoint file (.pth). But I suspect that when we run PTQ and QAT, it overwrites the best checkpoint generated by the normal training, and that's why we see this issue.

Below is the log excerpt from the PTQ/QAT training step that overwrites the best normal-training checkpoint.

```
[2024-03-11 22:31:26] INFO - base_sg_logger.py - Checkpoint saved in ./sg_checkpoints_dir/yolo_nas_s/ckpt_best.pth
[2024-03-11 22:31:26] INFO - sg_trainer.py - Best checkpoint overriden: validation mAP@0.50: 0.9585086703300476
```

I am thinking of separating the normal and PTQ/QAT scripts so that they don't clobber the best checkpoint, and so I can resume my normal training from a checkpoint; a sketch of the idea is below.
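
A minimal sketch of that separation, assuming the standard `Trainer(experiment_name=..., ckpt_root_dir=...)` constructor; the experiment names and directory are illustrative:

```python
# Keep FP32 training and PTQ/QAT in separate experiment directories so the
# QAT run cannot override the FP32 ckpt_best.pth. Names are illustrative.
from super_gradients.training import Trainer

# Normal FP32 training writes to ./sg_checkpoints_dir/yolo_nas_s_fp32/
fp32_trainer = Trainer(experiment_name="yolo_nas_s_fp32", ckpt_root_dir="./sg_checkpoints_dir")

# PTQ/QAT writes to ./sg_checkpoints_dir/yolo_nas_s_qat/ instead, leaving
# ./sg_checkpoints_dir/yolo_nas_s_fp32/ckpt_best.pth untouched.
qat_trainer = Trainer(experiment_name="yolo_nas_s_qat", ckpt_root_dir="./sg_checkpoints_dir")
```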

By the way, if I want to train for more epochs from a checkpoint, how should it be done? Resume from a normal checkpoint or from a PTQ/QAT checkpoint?
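
On the resume question, a hedged sketch assuming the `resume` flag in `training_params`: the shape-mismatch error earlier in this thread suggests the QAT checkpoint has a different (quantized/fused) module structure, so the normal checkpoint is the natural one to resume from. The experiment name, epoch budget, and dataloaders below are placeholders.

```python
# A hedged sketch of resuming normal (non-QAT) training for more epochs.
# Assumes the "resume" training_params flag; names are placeholders.
from super_gradients.common.object_names import Models
from super_gradients.training import Trainer, models

trainer = Trainer(experiment_name="yolo_nas_s_fp32", ckpt_root_dir="./sg_checkpoints_dir")
model = models.get(Models.YOLO_NAS_S, num_classes=1)

training_params = {
    "resume": True,     # continue from ckpt_latest.pth in this experiment dir
    "max_epochs": 150,  # the new, larger epoch budget
    # ... the rest of your original training hyperparameters ...
}

# train_loader / valid_loader: the same dataloaders as the original run.
# trainer.train(model=model, training_params=training_params,
#               train_loader=train_loader, valid_loader=valid_loader)
```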