AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Incorrect positions of bounding boxes while testing my Yolov4 custom model #8352

Open Samuel-P-Moronta opened 2 years ago

Samuel-P-Moronta commented 2 years ago

I'm trying to train a custom YOLOv4 model, but when I test the model the bounding boxes are in the wrong positions. First, some information:

Dataset: 12,000 images
Image dimensions: 1920x1080
Classes: 6
Training: local machine with a GTX 960 GPU, running Linux

My configuration (yolov4-custom.cfg):
batch=64
subdivisions=64
width=416
height=416
filters=33
classes=6

obj.data file:

image
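As a quick sanity check on the cfg values above: in Darknet's YOLO configs, the convolutional layer right before each [yolo] layer is expected to have filters = (classes + 5) * 3, where 3 is the number of anchor masks per [yolo] layer in the stock yolov4-custom.cfg. The helper name below is hypothetical, and the sketch assumes the default of 3 masks per layer.

```python
def expected_yolo_filters(num_classes: int, masks_per_layer: int = 3) -> int:
    """Filters for the conv layer that feeds each [yolo] layer.

    Each anchor predicts 4 box coordinates, 1 objectness score,
    and one score per class, hence (num_classes + 5) per anchor.
    masks_per_layer=3 is an assumption matching the stock cfg.
    """
    return (num_classes + 5) * masks_per_layer


# 6 classes, as in the configuration posted above.
print(expected_yolo_filters(6))  # 33
```

With 6 classes this gives 33, which matches the filters=33 in the posted configuration, so that part of the cfg looks consistent.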

obj.names contains all 6 classes that I'm using.

Results (chart.png): image

Testing Yolov4-custom_last.weights: I adjusted the threshold with values from 0.40 to 0.90 and got the same results: image

Do you know what might be the problem?

stephanecharette commented 2 years ago

My first guess is your annotations are not done correctly. Can you post a few sample images and the annotations (*.txt files) that go with those images?

The other thing I find suspicious is that by iteration 1000, the map is already at nearly 100%. Did you do something like crop your training images? See: https://www.ccoderun.ca/programming/darknet_faq/#crop_training_images

Samuel-P-Moronta commented 2 years ago

Thank you so much for the quick reply! The annotations follow the YOLO format and were made with the open-source tool labelimg.

Image: unripe_papaya_12
Labeled image: image

labeled result (.txt) 3 0.416927 0.641204 0.392188 0.467593
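For readers who want to check a label line by hand: a YOLO annotation line is "class_id x_center y_center width height", with the four coordinates normalized to the image size. The sketch below converts the line above back to pixel corners. The function name is hypothetical, and the 1920x1080 size is an assumption taken from the dataset description earlier in the thread, not from this particular image.

```python
def yolo_to_pixel_box(line: str, img_w: int, img_h: int):
    """Convert one YOLO label line to (class_id, left, top, right, bottom) in pixels."""
    class_id, cx, cy, w, h = line.split()
    # Scale the normalized centre and size back to pixel units.
    cx, w = float(cx) * img_w, float(w) * img_w
    cy, h = float(cy) * img_h, float(h) * img_h
    left, right = round(cx - w / 2), round(cx + w / 2)
    top, bottom = round(cy - h / 2), round(cy + h / 2)
    return int(class_id), left, top, right, bottom


# Assumes the sample image is 1920x1080 like the rest of the dataset.
print(yolo_to_pixel_box("3 0.416927 0.641204 0.392188 0.467593", 1920, 1080))
# (3, 424, 440, 1177, 945)
```

Under that assumption the box spans roughly x 424 to 1177 and y 440 to 945, a plausible region for an object held near the middle of the frame.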

stephanecharette commented 2 years ago

Do a search for labelimg on the darknet/yolo discord. Almost every time people use it, they have problems training. For some people, it records very incorrect coordinates. The one example you show above is correct, it looks like this when I load it up in DarkMark:

image

...but I'd really be curious to see the rest of the images and annotations to see if we can spot the problem.

The other thing I'd like to see is the command you are using to train. There are several variations. This is the one I recommend: https://www.ccoderun.ca/programming/darknet_faq/#training_command

Samuel-P-Moronta commented 2 years ago

This link contains the dataset with all the images and their label (.txt) files: https://drive.google.com/file/d/1GZffFFZFurr7wMUcipNQpZt-9hWpbpi4/view?usp=sharing

This repo contains all of my darknet config for training yolov4: https://github.com/Samuel-P-Moronta/Training_yolov4-SFK

The command that I used to start training is: ./darknet detector train data/obj.data cfg/yolov4-custom.cfg yolov4.conv.137 -map

I would appreciate it if you could check it and let me know if there is something wrong with these files.

awaisbajwaml commented 2 years ago

@Samuel-P-Moronta - as far as I can tell, all your code looks good to me, and the tool you are using is fine; I used it in the past and it is a great labeling tool. Can you please share your system configuration? I'm curious about your CUDA / cuDNN versions and GPU RAM. One question: did you hit a CUDA OOM error earlier in the same training?

One other comment: your mAP is too good to be true. Another possibility is a Darknet build issue. I had a similar issue, and rebuilding Darknet solved the problem. Give it a try.

Last but not least, I would definitely go back and review the labeling once again, one by one, to see if there is any hidden mess in the data set.

If none of this works, please post again.

stephanecharette commented 2 years ago

Here is an example annotation from your data set:

image

This seems to be a problem with your pineapple class. Here is another screenshot, and you can see more examples in the "review" window on the right side of this image:

image

Lots of duplicates as well. E.g., see this:

image

But this does not explain the results you were getting.

My suggestion would be that you compare the training files (txt and cfg) you have now with the ones that are generated by DarkMark to see if anything stands out.

Also note that with the images you have annotated, you'll be training a network to recognize that fruit only when it appears in the middle of the image and probably also being held by two "white" hands. That may or may not be what you are trying to do, but figured I'd point it out.
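Some of the mechanical problems mentioned in this thread (duplicate annotations, coordinates that may be recorded incorrectly) can be screened for with a short script before training. This is only a sketch: the function name and the problem messages are hypothetical, and it cannot detect a box that is well-formed but drawn around the wrong thing or given the wrong class, which still needs a visual review.

```python
def find_annotation_problems(lines, num_classes):
    """Return (line_number, message) pairs for suspicious YOLO label lines."""
    problems = []
    seen = set()
    eps = 1e-6  # tolerance for rounding in the 6-decimal label values
    for number, raw in enumerate(lines, start=1):
        parts = raw.split()
        if not parts:
            continue  # ignore blank lines
        if len(parts) != 5:
            problems.append((number, "expected 5 fields"))
            continue
        try:
            class_id = int(parts[0])
            cx, cy, w, h = (float(p) for p in parts[1:])
        except ValueError:
            problems.append((number, "non-numeric field"))
            continue
        if not 0 <= class_id < num_classes:
            problems.append((number, "class id out of range"))
        if w <= 0 or h <= 0:
            problems.append((number, "empty box"))
        elif (cx - w / 2 < -eps or cx + w / 2 > 1 + eps
              or cy - h / 2 < -eps or cy + h / 2 > 1 + eps):
            problems.append((number, "box extends beyond the image"))
        key = (class_id, cx, cy, w, h)
        if key in seen:
            problems.append((number, "duplicate annotation"))
        seen.add(key)
    return problems


sample = [
    "3 0.416927 0.641204 0.392188 0.467593",
    "3 0.416927 0.641204 0.392188 0.467593",  # exact duplicate of line 1
    "9 0.5 0.5 0.2 0.2",                      # only classes 0-5 exist
]
for problem in find_annotation_problems(sample, num_classes=6):
    print(problem)
```

To run it over a dataset, read each .txt file with `pathlib.Path(path).read_text().splitlines()` and pass the resulting list in; the sample lines here are made up for illustration, apart from the one label quoted earlier in the thread.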

Samuel-P-Moronta commented 2 years ago

@awaisbajwaml @stephanecharette GPU RAM: 4 GB (from the nvidia-smi command): image

One issue I had was a CUDA out-of-memory error. I solved it by increasing subdivisions to 16, then 32, and finally 64.

Samuel-P-Moronta commented 2 years ago

@stephanecharette @awaisbajwaml

After fixing the labels for the "overripe_pineapple" class and training in Google Colab, the problem was fixed. This makes me think that the problem was related to the GPU (GTX 960) used for local training.

Differences in the custom cfg between local training and Google Colab:
Local: batch=64 subdivisions=64
Google Colab: batch=64 subdivisions=16
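To make the difference between the two settings concrete: Darknet splits each batch into batch / subdivisions images per GPU pass, which is why raising subdivisions relieves out-of-memory errors. The sketch below only does that arithmetic for the two configurations above; the function name is hypothetical, and the numbers alone do not establish why the two training runs gave different results, since the labels were also fixed between them.

```python
def images_per_gpu_pass(batch: int, subdivisions: int) -> int:
    """Number of images Darknet loads onto the GPU at once (the mini-batch)."""
    if batch % subdivisions != 0:
        raise ValueError("batch should be divisible by subdivisions")
    return batch // subdivisions


print("local:", images_per_gpu_pass(64, 64))   # 1 image per pass
print("colab:", images_per_gpu_pass(64, 16))   # 4 images per pass
```

Both runs still accumulate 64 images per weight update; only the number held in GPU memory at once differs.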

So if someone else is facing the same problem, I recommend trying to train on Google Colab.

Finally, some results after training YOLOv4 in Google Colab:
unripe_pineapple: image
ripe_pineapple: image
overripe_papaya: image