balancap / SSD-Tensorflow

Single Shot MultiBox Detector in TensorFlow

Big loss and low accuracy #318

Closed rainley123 closed 5 years ago

rainley123 commented 5 years ago

I have a problem with my code. I used the core code in this commit (net(), anchors(), bboxes_encode(), bboxes_decode(), etc.) and I feed the dataset through tf.data. It runs successfully, but the loss stays between 80 and 100, keeps oscillating, and does not converge. Can anyone tell me where the problem is: the data, the net, or the optimizer?

billyceline commented 5 years ago

I have the same problem. I also used some of the functions provided by this repository, with VOC2012 as the dataset. I found that some bounding boxes cannot be encoded with the anchors, because their Jaccard IoU is lower than 0.5. I changed batch_size to 1 and printed the loss for every batch; sometimes the loss is 0. I think this is because those bounding boxes are not compatible with the anchors. We could try clustering to find the most suitable anchor sizes, as YOLO does.
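The clustering idea mentioned above can be sketched as k-means over box widths and heights, with IoU as the similarity measure. This is a minimal NumPy illustration, not code from this repository: the function names `wh_iou` and `kmeans_anchors` are hypothetical, the box sizes are synthetic toy data, and all boxes are assumed to be given as normalised (width, height) pairs.

```python
import numpy as np


def wh_iou(boxes, centroids):
    """IoU between (w, h) pairs, treating every box as if it shared one centre."""
    inter_w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    inter_h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = inter_w * inter_h
    area_boxes = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_centroids = (centroids[:, 0] * centroids[:, 1])[None, :]
    return inter / (area_boxes + area_centroids - inter)


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs into k anchor shapes; hypothetical helper."""
    rng = np.random.default_rng(seed)
    # Start from k distinct ground-truth shapes.
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        # Assign each box to the centroid it overlaps most.
        assign = np.argmax(wh_iou(boxes, centroids), axis=1)
        # Move each centroid to the mean of its boxes; keep it if it has none.
        new = np.array([
            boxes[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids


# Synthetic toy data: 50 small and 50 large boxes (normalised sizes).
rng = np.random.default_rng(1)
small = rng.uniform(0.05, 0.15, size=(50, 2))
large = rng.uniform(0.5, 0.9, size=(50, 2))
boxes = np.vstack([small, large])

anchors = kmeans_anchors(boxes, k=2)
mean_best_iou = wh_iou(boxes, anchors).max(axis=1).mean()
print(anchors, mean_best_iou)
```

The mean best IoU gives a rough idea of how well a set of anchor shapes covers the dataset, so it can be compared before and after clustering.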

rainley123 commented 5 years ago

So the function bboxes_encode() in this repository doesn't find suitable anchors? I also can't understand some of the code in this repository. One difference from your case is that my loss never reaches 0.
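One way to check whether a ground-truth box finds any anchor at all is to count the anchors whose Jaccard IoU with it exceeds the match threshold. The sketch below is a NumPy illustration under stated assumptions, not the repository's bboxes_encode(): the helpers `jaccard` and `count_positives` are hypothetical, boxes are assumed to be normalised [ymin, xmin, ymax, xmax], and the anchor grid is a toy 4x4 tiling rather than the real SSD anchors.

```python
import numpy as np


def jaccard(box, anchors):
    """IoU between one box and an (N, 4) array of anchors, [ymin, xmin, ymax, xmax]."""
    ymin = np.maximum(box[0], anchors[:, 0])
    xmin = np.maximum(box[1], anchors[:, 1])
    ymax = np.minimum(box[2], anchors[:, 2])
    xmax = np.minimum(box[3], anchors[:, 3])
    # Clip at zero so non-overlapping pairs contribute no intersection.
    inter = np.clip(ymax - ymin, 0, None) * np.clip(xmax - xmin, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_anchors = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    return inter / (area_box + area_anchors - inter)


def count_positives(gt_boxes, anchors, threshold=0.5):
    """Number of anchors with IoU above the threshold, per ground-truth box."""
    return [int(np.sum(jaccard(b, anchors) > threshold)) for b in gt_boxes]


# Toy anchors: a 4x4 grid of squares with side 0.25 tiling the unit image.
centers = (np.arange(4) + 0.5) / 4
cy, cx = np.meshgrid(centers, centers, indexing="ij")
half = 0.125
anchors = np.stack([cy - half, cx - half, cy + half, cx + half], axis=-1).reshape(-1, 4)

gt_boxes = np.array([
    [0.0, 0.0, 0.25, 0.25],  # coincides with one anchor cell
    [0.1, 0.1, 0.9, 0.9],    # far larger than any anchor in this toy grid
])

counts = count_positives(gt_boxes, anchors)
print(counts)  # [1, 0]
```

A box with a count of 0 produces no positive anchors, so running this kind of check over a dataset shows how many ground-truth boxes the anchor set fails to cover at a given threshold.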

SamtFish commented 5 years ago

I have the same problem with the loss. It is very unstable and converges poorly. This is what the training process looks like:

[Image: training]

I have used the following settings:

dataset: VOC2012
batch_size = 32
loss_alpha = 1
negative_ratio = 3
match_threshold = 0.5
label_smoothing = 0.0
weight_decay = 0.0005

MOVING_AVERAGE_DECAY = 0.9999
LEARNING_RATE_DECAY_FACTOR = 0.94
INITIAL_LEARNING_RATE = 0.001
MOMENTUM = 0.9
SAMPLES_PER_EPOCH = 17125
EPOCHS_PER_DECAY = 2.0
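To see what these settings imply for the learning rate over time, the arithmetic can be sketched as follows. This assumes a staircase exponential decay schedule, which the constant names suggest but the post does not confirm; the functions `decay_steps` and `learning_rate` are hypothetical helpers written for this illustration.

```python
def decay_steps(samples_per_epoch, batch_size, epochs_per_decay):
    """Number of training steps between two learning-rate decays."""
    return int(samples_per_epoch / batch_size * epochs_per_decay)


def learning_rate(step, initial_lr, decay_factor, steps_per_decay, staircase=True):
    """Exponential decay: initial_lr * decay_factor ** (step / steps_per_decay)."""
    # With staircase=True the rate drops in discrete jumps (assumed here).
    exponent = step // steps_per_decay if staircase else step / steps_per_decay
    return initial_lr * decay_factor ** exponent


# Values taken from the settings listed above.
steps = decay_steps(samples_per_epoch=17125, batch_size=32, epochs_per_decay=2.0)
print(steps)  # 1070

for step in (0, 1070, 10000):
    print(step, learning_rate(step, 0.001, 0.94, steps))
```

Under these assumptions the rate shrinks by 6% every 1070 steps, so it is still above half of its initial value after 10000 steps.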

billyceline commented 5 years ago

> So the function bboxes_encode() in this repository doesn't find suitable anchors? I also can't understand some of the code in this repository. One difference from your case is that my loss never reaches 0.

I tried setting the batch size to 1, training on one image at a time. Sometimes the loss is 0, but that doesn't matter. I then set the batch size to 16, and the model converged after 17000 batches.