avanetten / simrdwn

Rapid satellite imagery object detection

Detection works very well but confidence is very low #24

Open ashnair1 opened 5 years ago

ashnair1 commented 5 years ago

I've attached the inference results from SIMRDWN. As you can see, the detections are quite good, but I'm not sure why the confidence is so low (for most of the buildings, at least). Strangely, this is true even for a training image, which doesn't make sense. Do you have any idea what could cause this, or is it just down to the usual problems like insufficient iterations or insufficient data?

[Images: atlanta_nadir7_catid_1030010003d22f00_732701_3731889, atlanta_nadir16_catid_1030010002649200_748451_3735939, atlanta_nadir27_catid_1030010003472200_734501_3743589 (thresh=0.05)]

avanetten commented 5 years ago

Glad you got it working. The confidences have always been relatively low for the YOLT models, though increasing training time will increase confidence levels.
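For reference, if longer training is the lever here: in Darknet-style cfgs (which YOLT inherits) the total number of training iterations is set by the `max_batches` field. Below is a minimal sketch of bumping it from a script; the file name `yolt.cfg` is illustrative, and it assumes simrdwn's cfg follows standard Darknet conventions:

```python
import re

def extend_training(cfg_path, new_max_batches):
    """Raise the total training iterations in a Darknet-style cfg.
    The path and cfg layout are assumptions; adapt to your setup."""
    with open(cfg_path) as f:
        cfg = f.read()
    # max_batches is the standard Darknet field for total training iterations
    cfg = re.sub(r"max_batches\s*=\s*\d+",
                 f"max_batches={new_max_batches}", cfg)
    with open(cfg_path, "w") as f:
        f.write(cfg)

extend_training("yolt.cfg", 100000)  # hypothetical path; e.g. 50k -> 100k
```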

ashnair1 commented 5 years ago

I see. Initially I trained the model for 50,000 iterations, but even after doubling the training time to 100,000 iterations, the model's confidence is still quite low.

[Images: atlanta_nadir14_catid_10300100039ab000_736301_3730089, atlanta_nadir10_catid_1030010003caf100_742151_3722439 (thresh=0.05)]

On a side note, I tried one of the TensorFlow models as well to see how it does (framework: SSD, number of batches: 30,000, batch size: 16). The results are clearly not as good as the YOLT models'.

[Image: atlanta_nadir10_catid_1030010003caf100_732701_3725139 (thresh=0.05)]

While this could be due to not training for enough iterations, could it also be due to using the SSD config from the _orig config folder? I'm not sure what the _orig and _altered_v0 configs for the TensorFlow models represent, but am I correct in assuming that the altered configs are the ones optimised for satellite imagery, while the originals are the configs specified in the respective papers/implementations?

For example, in the case of the SSD v2 config, the only difference between the altered and original configs is the learning rate and decay steps. I'm currently trying the altered SSD config to see whether it improves performance.
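To compare the _orig and _altered_v0 configs programmatically, one option is to parse them with the TF Object Detection API's pipeline proto. A minimal sketch, assuming an SSD config that uses the common RMSProp + exponential-decay optimizer (the file paths are illustrative):

```python
from google.protobuf import text_format
from object_detection.protos import pipeline_pb2

def load_pipeline(path):
    # Parse a TF Object Detection API pipeline config (text protobuf)
    cfg = pipeline_pb2.TrainEvalPipelineConfig()
    with open(path) as f:
        text_format.Merge(f.read(), cfg)
    return cfg

orig = load_pipeline("ssd_inception_v2_orig.config")          # illustrative path
altered = load_pipeline("ssd_inception_v2_altered_v0.config")  # illustrative path

for name, cfg in [("orig", orig), ("altered", altered)]:
    # SSD configs commonly use RMSProp with exponential learning-rate decay
    lr = (cfg.train_config.optimizer.rms_prop_optimizer
          .learning_rate.exponential_decay_learning_rate)
    print(name, lr.initial_learning_rate, lr.decay_steps, lr.decay_factor)
```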

Update: Using the SSD config from the _altered_v0 folder with a longer run (60,000 iterations) doesn't offer a significant improvement.

ashnair1 commented 5 years ago

I trained the YOLT model even longer, and so far the confidence hasn't improved. From its results it's obviously a very strong detector, but I'm unable to quantify how good the model is.

YOLT detection

[Image: atlanta_nadir27_catid_1030010003472200_734501_3740889 (thresh=0.05)]

In most cases, the YOLT model makes correct and incorrect detections with almost identical confidence scores. That is, it will correctly detect a building with 5% confidence and incorrectly detect one with 6% confidence, which, as I mentioned before, makes it hard to determine how good the model actually is. I suppose we could drop confidence as a metric, have the model output all detections, and check which are correct, but I'm not sure that's the right approach. I'd like your input on the matter.
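One way to quantify the detector without leaning on the scores is to match detections to ground truth by IoU and report precision/recall. A minimal sketch, with boxes as [xmin, ymin, xmax, ymax] lists, greedy one-to-one matching, and confidences ignored entirely:

```python
def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / float(union)

def precision_recall(preds, gts, iou_thresh=0.5):
    """Greedily match each prediction to its best unmatched ground-truth
    box; a prediction counts as a true positive if IoU >= iou_thresh."""
    matched, tp = set(), 0
    for p in preds:
        best, best_iou = None, iou_thresh
        for j, g in enumerate(gts):
            if j not in matched and iou(p, g) >= best_iou:
                best, best_iou = j, iou(p, g)
        if best is not None:
            matched.add(best)
            tp += 1
    fp, fn = len(preds) - tp, len(gts) - tp
    return tp / max(tp + fp, 1), tp / max(tp + fn, 1)
```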

Regarding the other TensorFlow models: as mentioned in the post above, they're not as good as the YOLT model (at least in the case of SSD). I haven't been able to try the Faster R-CNN variants since they always exhaust my memory.

SSD Detection

Plot threshold = 0.3: [Image: atlanta_nadir27_catid_1030010003472200_734501_3740889 (thresh=0.3)]

Plot threshold = 0.2: [Image: atlanta_nadir27_catid_1030010003472200_734501_3740889 (thresh=0.2)]
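For sweeping plot thresholds without re-running inference, one could filter the saved detection output at several cutoffs. A minimal sketch, assuming a predictions CSV with a `prob` column (both the path and column name are illustrative; simrdwn's actual output layout may differ):

```python
import pandas as pd

df = pd.read_csv("predictions.csv")  # illustrative path
for t in (0.05, 0.1, 0.2, 0.3, 0.5):
    kept = (df["prob"] >= t).sum()
    print(f"thresh={t:.2f}: {kept} detections kept")
```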

In summary,

  1. Should confidence simply not be considered for YOLT models?
  2. Are there recommended strategies for training the TensorFlow models so that they perform better? Perhaps particular parameter choices?

Thanks

avanetten commented 5 years ago

I'm not sure what's causing the confidences to be so low. You could try retraining with the recently updated v2 and see if the problem persists.

ashnair1 commented 5 years ago

I tried out the latest version of YOLT, and the problem still persists, as you can see from the images below. Note that I've only trained the new model for 30,000 iterations so far.

[Images: Atlanta_nadir10_catid_1030010003CAF100_732701_3730989, Atlanta_nadir10_catid_1030010003CAF100_733151_3741789 (thresh=0.05)]

I'll update this post after trying a couple more experiments, i.e. longer training time, TF models, etc.

Edit: Tried 60,000 iterations. No significant difference.

ryanh18 commented 4 years ago

I'm having a similar issue training on my custom data with the Darknet source code. I've bounced back and forth between it being the anchor values, batch size, a stronger GPU, different baseline weights, YOLO vs. YOLO-tiny... any ideas or solutions you found here?
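On the anchor question: the YOLOv2/v3 approach derives anchors by k-means clustering of the label box sizes under a 1 − IoU distance, so recomputing them for a custom dataset is straightforward. A minimal sketch, where `whs` is a list of (width, height) pairs from your training labels, in whatever units your cfg expects:

```python
import random

def iou_wh(a, b):
    # IoU of two boxes that share a center, given (w, h) pairs
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(whs, k=9, iters=100):
    """Cluster (w, h) label sizes with the 1 - IoU distance used for
    YOLO anchors. Requires len(whs) >= k."""
    centers = random.sample(whs, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in whs:
            # assign each box to the anchor it overlaps most
            j = max(range(k), key=lambda i: iou_wh(box, centers[i]))
            clusters[j].append(box)
        # recompute each center as the mean (w, h) of its cluster
        centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centers)
```

The sorted (w, h) pairs can then be written into the cfg's `anchors` line in place of the defaults.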