Chiauwen closed this issue 10 months ago.
Are my questions not clear enough?
I'm closing this issue due to inactivity. Feel free to re-open it if you still run into any problem
I also faced the same issue. I can see the predictions when using the normal PTQ/QAT checkpoint file (.pth). My guess is that running PTQ and QAT overwrites the best checkpoint generated by the normal training, and that is why we see this issue.
Below is the log excerpt from the PTQ/QAT training part that overwrites the best checkpoint from normal training:
[2024-03-11 22:31:26] INFO - base_sg_logger.py - Checkpoint saved in ./sg_checkpoints_dir/yolo_nas_s/ckpt_best.pth
[2024-03-11 22:31:26] INFO - sg_trainer.py - Best checkpoint overriden: validation mAP@0.50: 0.9585086703300476
I am thinking of separating the normal training script from the PTQ/QAT script so that they don't interfere with the best checkpoint and I can resume my normal training from a checkpoint.
By the way, if I want to train for more epochs from a checkpoint, how should it be done? Should I resume from a normal checkpoint or from a PTQ/QAT checkpoint?
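For anyone who wants to try the separation idea, here is a minimal sketch that copies the best checkpoint to a new file name before the PTQ/QAT step runs, so a later "Best checkpoint overriden" save cannot replace the only copy. It uses only the Python standard library. The helper name `backup_checkpoint` and the `_fp32` suffix are my own choices for illustration, not part of super-gradients, and the path is just the one from the log above.

```python
import shutil
from pathlib import Path


def backup_checkpoint(ckpt_path, suffix="_fp32"):
    """Copy a checkpoint next to itself under a new name and return the copy's path.

    Hypothetical helper for illustration; it is not a super-gradients API.
    """
    src = Path(ckpt_path)
    if not src.is_file():
        raise FileNotFoundError(f"No checkpoint found at {src}")
    # ckpt_best.pth -> ckpt_best_fp32.pth (same directory)
    dst = src.with_name(f"{src.stem}{suffix}{src.suffix}")
    shutil.copy2(src, dst)
    return dst


if __name__ == "__main__":
    # Path taken from the log excerpt; adjust it to your own run directory.
    ckpt = Path("./sg_checkpoints_dir/yolo_nas_s/ckpt_best.pth")
    if ckpt.is_file():
        print("Backed up to", backup_checkpoint(ckpt))
```

Calling this right after normal training finishes, and before launching PTQ/QAT, should leave a copy of the original weights to load for prediction or to resume from, whatever the later run writes to `ckpt_best.pth`.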
💡 Your Question
Hi everyone, I've applied PTQ and QAT to my custom-trained model by following the tutorial https://github.com/Deci-AI/super-gradients/blob/master/documentation/source/qat_ptq_yolo_nas.md
After this, I got some log files, a .pth file, and PTQ/QAT ONNX files as output, as in the tutorial. At the bottom of the tutorial, it says the QAT ONNX file needs to be converted to an INT8 TensorRT file, so I converted it with the command:
trtexec --fp16 --int8 --onnx=model.onnx --saveEngine=model.trt
Now I have my TensorRT file (in .trt format).
Here are my questions:
from super_gradients.common.object_names import Models
from super_gradients.training import models

model = models.get(Models.YOLO_NAS_M, checkpoint_path="yolonas-m/ckpt_best.pth", num_classes=1)
predictions = model.predict("23.jpg")
predictions.show(show_confidence=False)
ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/super_gradients/training/utils/checkpoint_utils.py in __call__(self, model_state_dict, checkpoint_state_dict)
    198
    199             if ckpt_val.shape != model_val.shape:
--> 200                 raise ValueError(f"ckpt layer {ckpt_key} with shape {ckpt_val.shape} does not match {model_key}" f" with shape {model_val.shape} in the model")
    201             new_ckpt_dict[model_key] = ckpt_val
    202         return new_ckpt_dict
ValueError: ckpt layer backbone.stem.conv.post_bn.weight with shape torch.Size([48]) does not match backbone.stem.conv.branch_3x3.conv.weight with shape torch.Size([48, 3, 3, 3]) in the model
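The error pairs two differently named layers (`post_bn.weight` from the checkpoint against `branch_3x3.conv.weight` in the model), which may mean the checkpoint and the freshly built model do not share the same layer layout. One way to check is to compare the two sets of parameter names and shapes directly. The sketch below is a plain-Python helper that takes two `{name: shape}` mappings; the function name `compare_state_dicts` is hypothetical and not part of super-gradients.

```python
def compare_state_dicts(model_shapes, ckpt_shapes):
    """Compare two {param_name: shape} mappings by parameter name.

    Returns (missing_in_ckpt, unexpected_in_ckpt, shape_mismatches), where
    shape_mismatches holds (name, ckpt_shape, model_shape) tuples.
    Hypothetical helper for illustration; it is not a super-gradients API.
    """
    model_keys = set(model_shapes)
    ckpt_keys = set(ckpt_shapes)
    # Names the model expects but the checkpoint does not provide
    missing = sorted(model_keys - ckpt_keys)
    # Names the checkpoint provides but the model does not have
    unexpected = sorted(ckpt_keys - model_keys)
    # Names present on both sides whose shapes differ
    mismatched = sorted(
        (k, tuple(ckpt_shapes[k]), tuple(model_shapes[k]))
        for k in model_keys & ckpt_keys
        if tuple(ckpt_shapes[k]) != tuple(model_shapes[k])
    )
    return missing, unexpected, mismatched


if __name__ == "__main__":
    # Toy shapes that mirror the two layer names in the error message
    model_shapes = {"backbone.stem.conv.branch_3x3.conv.weight": (48, 3, 3, 3)}
    ckpt_shapes = {"backbone.stem.conv.post_bn.weight": (48,)}
    missing, unexpected, mismatched = compare_state_dicts(model_shapes, ckpt_shapes)
    print("missing from checkpoint:", missing)
    print("unexpected in checkpoint:", unexpected)
    print("shape mismatches:", mismatched)
```

With PyTorch, the model side can be built as `{k: tuple(v.shape) for k, v in model.state_dict().items()}`. For the checkpoint side, I am assuming the weights sit under a `"net"` key in the dictionary returned by `torch.load(path, map_location="cpu")`, so inspect the loaded dictionary's keys first. If most names land in the "missing" and "unexpected" lists, the checkpoint was probably saved from a differently structured model than the one being constructed.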