Open cameron388 opened 1 month ago
I have the same issue. I have YOLO-NAS training set up on the COCO dataset. I trained the model for a few epochs to test the training setup, then continued training from the checkpoint for another 200 epochs. It seems that ckpt_best.pth hasn't been updated since; only the average and latest checkpoints were updated.
I use super-gradients==3.7.1. According to the logs, ckpt_best.pth was indeed saved only once, after the end of the first epoch of the first training run.
Inspecting the model checkpoints, I found the following:
best_model acc: 0.0001 epochs: 1
average_model acc: 0.3328 epochs: 200
latest_model acc: 0.3328 epochs: 200
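For anyone who wants to run the same check on their own checkpoints, here is a minimal sketch of how such a summary could be produced. It assumes the checkpoint file is a plain dict with `acc` and `epoch` keys, which is inferred from the output above and not confirmed against the super-gradients source. The helper names and the paths passed to them are hypothetical.

```python
from typing import Any, List, Mapping


def summarize_checkpoint(name: str, ckpt: Mapping[str, Any]) -> str:
    """Format the stored metric and epoch of one already-loaded checkpoint dict."""
    # ASSUMPTION: the checkpoint exposes "acc" and "epoch" keys; missing keys print as n/a.
    acc = ckpt.get("acc")
    epoch = ckpt.get("epoch")
    acc_text = "n/a" if acc is None else f"{float(acc):.4f}"
    epoch_text = "n/a" if epoch is None else str(epoch)
    return f"{name} acc: {acc_text} epochs: {epoch_text}"


def load_and_summarize(paths: Mapping[str, str]) -> List[str]:
    """Load each checkpoint file on CPU and return one summary line per file."""
    import torch  # imported lazily so the formatter above works without PyTorch

    lines = []
    for name, path in paths.items():
        # weights_only=False needs PyTorch 1.13+; only use it on files you trust.
        ckpt = torch.load(path, map_location="cpu", weights_only=False)
        lines.append(summarize_checkpoint(name, ckpt))
    return lines
```

Calling `load_and_summarize` with a mapping such as `{"best_model": "<path to ckpt_best.pth>"}` returns one line per checkpoint in the format shown above. Printing `ckpt.keys()` first is a quick way to check whether the assumed key names match your version.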
💡 Your Question
I've trained a model for 100 epochs on my own dataset.
When testing the model using "prediction = model.predict(processed_image_path, conf=confidence_threshold, fp16=False) #prediction = model.predict(processed_image_path, conf=confidence_threshold)", I'm finding reproducibly (across different models) that ckpt_best.pth performs significantly worse in terms of recall and specificity.
For example, here are some values from running _best.pth compared to _latest.pth:
_best
Confidence Threshold: 0.50 Precision: 0.8250 Recall: 0.7500 Specificity: 0.7941
Confidence Threshold: 0.70 Precision: 0.9355 Recall: 0.6591 Specificity: 0.9412
Confidence Threshold: 0.75 Precision: 0.9259 Recall: 0.5682 Specificity: 0.9412
Confidence Threshold: 0.80 Precision: 1.0000 Recall: 0.4773 Specificity: 1.0000
Confidence Threshold: 0.85 Precision: 1.0000 Recall: 0.1818 Specificity: 1.0000
_latest
Confidence Threshold: 0.50 Precision: 0.9649 Recall: 0.9910 Specificity: 0.9529
Confidence Threshold: 0.70 Precision: 0.9808 Recall: 0.9189 Specificity: 0.9765
Confidence Threshold: 0.80 Precision: 1.0000 Recall: 0.8288 Specificity: 1.0000
Confidence Threshold: 0.85 Precision: 1.0000 Recall: 0.6937 Specificity: 1.0000
Obviously, this is a very surprising result, so I'm wondering whether something has gone wrong.
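For reference, the three metrics in the tables above are usually derived from confusion counts as shown in the sketch below. The post does not say how the numbers were actually computed, so this uses the standard definitions only. The function name and the example counts are hypothetical and are not taken from the results above.

```python
from typing import Dict


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute precision, recall and specificity from confusion counts."""

    def ratio(numerator: int, denominator: int) -> float:
        # Return NaN instead of raising when a denominator is zero.
        return numerator / denominator if denominator else float("nan")

    return {
        "precision": ratio(tp, tp + fp),    # share of predicted positives that are correct
        "recall": ratio(tp, tp + fn),       # share of actual positives that were found
        "specificity": ratio(tn, tn + fp),  # share of actual negatives that were rejected
    }


# Hypothetical counts, for illustration only.
metrics = detection_metrics(tp=3, fp=1, tn=4, fn=1)
print(" ".join(f"{name}: {value:.4f}" for name, value in metrics.items()))
```

Logging the raw counts next to the ratios for each checkpoint makes it easy to confirm that both were scored on the same set of images at each confidence threshold.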
Versions
No response