david8862 / keras-YOLOv3-model-set

end-to-end YOLOv4/v3/v2 object detection pipeline, implemented on tf.keras with different technologies
MIT License

Wrong labels in prediction #151

Open binbin83 opened 3 years ago

binbin83 commented 3 years ago

Hello,

First of all, thank you for this amazing repo on object detection! :)

I am trying to use this repo to train yolo3 on a logo detection dataset. The training is going well: my loss is decreasing and I am able to over-fit a small part of my dataset.

The model is good on bounding box detection and objectness, however it is always wrong on the label (see the attached image result5180712509).

I suspected the preprocessing or the post-processing to be responsible for this mismatch, but I have checked both and there aren't any issues: the y_true fed to the model is consistent, and when I feed the y_true to post_processing, the classes are also correct. (I attach a notebook that shows the steps I followed: exploration.zip)
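For reference, the kind of ground-truth check I did looks roughly like this self-contained sketch (it assumes the "img_path x1,y1,x2,y2,class_id ..." annotation format and a one-name-per-line classes file):

```python
# Minimal ground-truth check, assuming the "img_path x1,y1,x2,y2,class_id ..."
# annotation format and a one-name-per-line classes file.
from PIL import Image, ImageDraw

with open('qmul_classes.txt') as f:
    class_names = [l.strip() for l in f if l.strip()]

parts = open('dataset.txt').readline().split()
img_path, boxes = parts[0], parts[1:]

img = Image.open(img_path)
draw = ImageDraw.Draw(img)
for box in boxes:
    x1, y1, x2, y2, cls = map(int, box.split(','))
    draw.rectangle([x1, y1, x2, y2], outline='red')
    draw.text((x1, max(0, y1 - 12)), class_names[cls], fill='red')
img.save('gt_check.png')  # these labels should match what the model draws
```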

So, my last supposition is that there is some mismatch due to the output layers in yolo3.

Is anyone else having the same sort of trouble?

Thanks for any help you can provide,

Robin

david8862 commented 3 years ago

Hi @binbin83, I just checked your analysis and all the steps seem correct. But I haven't tried to train a detection model with as many object classes (>300) as yours, and I'm not sure the classification loss can work well enough for that. So maybe you can check the object count for each class in your dataset and the trend of each part of the loss (location/confidence/class) during training. The "--eval_online" option for training may also be helpful to evaluate the performance for each class.
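For the per-class count, a quick sketch could look like this (file names are placeholders, and it assumes the "img_path x1,y1,x2,y2,class_id ..." annotation format):

```python
# Per-class object count over an annotation file in the assumed
# "img_path x1,y1,x2,y2,class_id ..." format. File names are placeholders.
from collections import Counter

with open('your_classes.txt') as f:
    names = [l.strip() for l in f if l.strip()]

counts = Counter()
with open('your_dataset.txt') as f:
    for line in f:
        for box in line.split()[1:]:
            counts[int(box.split(',')[-1])] += 1

for cls, n in counts.most_common():
    print(f'{names[cls]:<25}{n}')  # rare classes end up at the bottom
```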

binbin83 commented 3 years ago

Hello @david8862 ,

Thanks for your answer. I made some tests this weekend and I think the problem is not coming from the loss: I trained darknet for 160 epochs with freeze==1 and 25 transfer-learning epochs, and the classification loss was around 1.5 on training and 3.5 on eval.

Then I decided to set data_augmentation to False and train a new model. Because of early stopping, training stopped at epoch 72, but the class loss on the training set was quite low: 0.03 (eval loss at 9.2). So my model is able to overfit the data and shows a good classification loss. However, when I displayed the boxes found on the training images, I still got the wrong labels.

It's like there is a wrong mapping in the labels somewhere. However, the post_processing and pre_processing steps are correct, and this mapping is not constant across runs: if I train a new model, twitter <--> starbucks could become siemens <--> starbucks.

So my conclusion is that this is model-dependent; however, in the code I don't see any obvious link between the class file and the model.
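One check I can share (a quick sketch, not code from the repo): fingerprint my class file and print the index -> name mapping, so I can confirm that exactly the same deterministic list is read at train time and at predict time:

```python
# Sketch: fingerprint the class file and dump the index -> name mapping, so
# the list seen at training time can be compared with the one at predict time.
import hashlib

with open('qmul_classes.txt', 'rb') as f:
    data = f.read()
print('md5:', hashlib.md5(data).hexdigest())

names = [l.strip() for l in data.decode().splitlines() if l.strip()]
dupes = sorted({n for n in names if names.count(n) > 1})
print('duplicate names:', dupes or 'none')
for i, n in enumerate(names):
    print(i, n)  # this index is the only thing the model actually predicts
```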

If, with these new elements, you have any clue, that would be nice :)

Thanks,

Robin

david8862 commented 3 years ago

@binbin83 sounds quite weird. The YOLO model only predicts the class index, and all the class name mapping relies on the class name file. I have no more clues about the root cause. If you can share some data samples, I can try to do some more analysis.
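Just to illustrate that point, the decode step boils down to an argmax over per-class scores, and the name is a pure lookup afterwards; a toy sketch (names and values made up):

```python
# Toy illustration: the network emits per-class scores per box; the predicted
# label is argmax -> index -> name lookup in class file order.
import numpy as np

class_names = ['starbucks', 'twitter', 'siemens']  # toy list for illustration
class_scores = np.array([0.1, 0.7, 0.2])           # per-box class probabilities
idx = int(np.argmax(class_scores))
print(idx, '->', class_names[idx])                 # 1 -> twitter
```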

binbin83 commented 3 years ago

Hello @david8862 :) Thanks! I uploaded to WeTransfer the dataset I used:

  • the classes file: qmul_classes.txt
  • the dataset with annotations: dataset.txt
  • the folder with the 1280 images: the train_image folder. The download link: https://we.tl/t-LqJOe7VMut

I have launched a training with only five classes and I will keep you posted on the results; maybe you are right that the high number of classes is the problem. However, why is the class_loss so low...

We'll see! Thank you again for your help!
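For reproducibility, the 5-class reduction I launched was roughly this sketch (the class picks and output file names are placeholders):

```python
# Rough sketch of the 5-class reduction: filter the annotation file to a
# chosen subset and remap class ids to 0..N-1. Picks below are placeholders.
old = [l.strip() for l in open('qmul_classes.txt') if l.strip()]
keep = ['starbucks', 'twitter', 'siemens', 'rittersport']  # put your classes here
remap = {old.index(n): i for i, n in enumerate(keep)}

with open('dataset_small.txt', 'w') as out:
    for line in open('dataset.txt'):
        parts = line.split()
        boxes = []
        for b in parts[1:]:
            *coords, cls = b.split(',')
            if int(cls) in remap:
                boxes.append(','.join(coords + [str(remap[int(cls)])]))
        if boxes:
            out.write(' '.join([parts[0]] + boxes) + '\n')

with open('qmul_classes_small.txt', 'w') as f:
    f.write('\n'.join(keep) + '\n')
```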

david8862 commented 3 years ago


Thanks a lot. I tried a quick training (5 epochs) with the data samples, and the checkpoint could predict some of the targets, although not stably enough:

starbucks (attached images: starbucks_right, starbucks_wrong)

rittersport, not clear :( (attached images: rittersport_right, rittersport_wrong)

So I guess the class name mapping should be ok.

I also checked the training data samples; one problem is that some of the samples don't annotate all the target objects, which may impact the detector's performance:

(attached images: gt_2_issue, gt_3_issue)
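A crude screen for this (just a heuristic sketch, not code from the repo) is to flag images whose box count falls far below the per-image average:

```python
# Heuristic screen for possibly under-annotated images: flag those with far
# fewer boxes than the per-image average. Threshold is an arbitrary guess.
lines = [l.split() for l in open('dataset.txt') if l.strip()]
avg = sum(len(p) - 1 for p in lines) / len(lines)
for parts in lines:
    n = len(parts) - 1
    if n < avg / 2:  # tune for your data; this only hints at missing boxes
        print(parts[0], n, 'boxes (avg %.1f)' % avg)
```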

binbin83 commented 3 years ago

Hello, thank you so much for your answer!

The way the dataset is annotated is indeed a problem, but I don't think it creates the mapping error.

The training on 5 classes ended and the model learns properly. Here are some examples (attached images: result29328017, result31115755, result67761053). It seems that when the class number is higher, the class_loss is not consistent with reality.

For now I don't have any explanation... If you have any ideas, that would be great! :)

I'll continue my investigations and keep you posted if I find something!

Thanks again,

Robin