Trained model gives zero detections

bonzoq commented 6 years ago

I compiled your version of darknet on Amazon Linux (using Tesla K80 GPU) with the following settings:

GPU=1
CUDNN=1
OPENCV=1
DEBUG=0
OPENMP=0
LIBSO=0

...

ARCH=  -gencode arch=compute_37,code=sm_37

I trained Yolo for 300 iterations using the steps described in https://pjreddie.com/darknet/yolo/ .

When I try to perform detection on a single image:

./darknet detector test cfg/voc.data cfg/yolo-voc.cfg backupVOC/yolo-voc_300.weights VOCdevkit/VOC2007/JPEGImages/009460.jpg

I get no detections. I think I should get some (though not correct detections) even though the training was limited.

On another occasion I trained a classifier by reproducing the steps from this tutorial: https://timebutt.github.io/static/how-to-train-yolov2-to-detect-custom-objects/ . Training ran for a few thousand iterations, with the average loss as low as 0.05 at the end, and yet I was still unable to detect anything with that model.

What might I be doing wrong ?

Chanki8658 commented 6 years ago

@AlexeyAB Do I need to change my configuration to match the one you mentioned in #243 (comment)?
I used MSVS 2015, CUDA 8.0, cuDNN v6 (for CUDA 8.0), OpenCV 3.3.0, Windows 7 x64. On GeForce GTX 970 (4 GB RAM) Maxwell - Compute capability (CC) 5.2

I used MSVS 2017, CUDA 9.0, cuDNN 7, OpenCV 3.0.0, Windows 10 x64 on a GeForce GTX 1070. Is it happening because of my MSVS and CUDA versions?

Can I not continue with CUDA 9?

scamianbas commented 6 years ago

@AlexeyAB this is just to inform you that I git cloned the repository this morning and it works as your zip does. Nevertheless, I noticed that the exact cfg file "data/yolo-obj.cfg" that we used is not present in the repository (content below). Maybe you should add it? Thanks!

[net]
batch=64
subdivisions=16
height=416
width=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.0001
max_batches = 45000
policy=steps
steps=100,25000,35000
scales=10,.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

#######

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[route]
layers=-9

[reorg]
stride=2

[route]
layers=-1,-3

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=35
activation=linear

[region]
anchors = 1.08,1.19,  3.42,4.41,  6.63,11.38,  9.42,5.11,  16.62,10.52
bias_match=1
classes=2
coords=4
num=5
softmax=1
jitter=.2
rescore=1

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh = .6
random=0
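
A side note on the cfg above: in YOLOv2 the filters value of the last [convolutional] layer (the one right before [region]) is expected to equal num * (classes + coords + 1), which is the rule behind the filters=(classes + 5)*5 advice in the AlexeyAB/darknet training instructions. A minimal sketch of that arithmetic, using the values from this cfg (the function name is hypothetical and not part of darknet):

```python
def expected_region_filters(classes: int, num: int = 5, coords: int = 4) -> int:
    """Filters needed in the conv layer that feeds a YOLOv2 [region] layer."""
    # Each of the `num` anchors predicts `coords` box values,
    # one objectness score, and one score per class.
    return num * (classes + coords + 1)


# Values taken from the [region] section above: classes=2, num=5, coords=4.
filters = expected_region_filters(classes=2, num=5, coords=4)
print(filters)  # 35
```

The result matches filters=35 in the cfg, so this particular file looks consistent on that point. If classes is changed, the filters value has to change with it.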

Chanki8658 commented 6 years ago

Hi @AlexeyAB, I followed the steps mentioned above, but it is still not detecting anything on Windows.

AlexeyAB commented 6 years ago

@Chanki8658

Can you see any detections using any of the default models? https://github.com/AlexeyAB/darknet#pre-trained-models-for-different-cfg-files-can-be-downloaded-from-smaller---faster--lower-quality
What command line do you use for detection?
Can you show a screenshot of your console output?

Chanki8658 commented 6 years ago

@AlexeyAB
No, I can't see any detections even with the pre-trained models.

for train: darknet.exe detector train cfg/obj.data cfg/yolo-obj.cfg yolo-obj_2000.weights
for test: darknet.exe detector test cfg/obj.data cfg/yolo-obj.cfg yolo-obj1000.weights data/testimage.jpg

Chanki8658 commented 6 years ago

@AlexeyAB I followed this link to train the model:

https://timebutt.github.io/static/how-to-train-yolov2-to-detect-custom-objects/

AlexeyAB commented 6 years ago

@Chanki8658

Download this yolo-voc.weights model: http://pjreddie.com/media/files/yolo-voc.weights
And try this command: darknet.exe detector test data/voc.data yolo-voc.cfg yolo-voc.weights data/dog.jpg
Show a screenshot of your console output.

Chanki8658 commented 6 years ago

@AlexeyAB It worked for me. Please see the attached output.

Chanki8658 commented 6 years ago

@AlexeyAB Is it a problem with my training, or with the weights I used for training?

AlexeyAB commented 6 years ago

@Chanki8658 Yes, the problem is in your training. Now try to train a model on this sign-stop-yield-dataset: https://drive.google.com/file/d/0Bw2dL4mXINuzaVllaklkMHNDYzg/view

As described here: https://github.com/AlexeyAB/darknet/issues/243#issuecomment-340512559

Train: darknet.exe detector train stopsign/obj.data stopsign/yolo-obj.cfg darknet19_448.conv.23
Detect: darknet.exe detector test stopsign/obj.data stopsign/yolo-obj.cfg backup/yolo-obj_400.weights stopsign/014.jpg -thresh 0.1

If you want to train your own model, use this manual (don't use any other): https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects

Chanki8658 commented 6 years ago

@AlexeyAB Thanks a lot, I will follow that.

Chanki8658 commented 6 years ago

@AlexeyAB I have trained the model on the signs data, but it is still not predicting anything. I am attaching my model weights and config at the link below. I am not sure why, but whenever I train a model it does not predict anything.

https://drive.google.com/open?id=0B6yBEDUqsu7TSHpWbnI4MGdOZlk

AlexeyAB commented 6 years ago

@Chanki8658

You should base your cfg-file on yolo-voc.2.0.cfg instead of yolo-voc.cfg as described here: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects

This is how it is done in the darknet-master_guanghan_info_release.rar archive, at the path \build\darknet\x64\yolo-obj.cfg: https://drive.google.com/file/d/0Bw2dL4mXINuzaVllaklkMHNDYzg/view
You should also update your code to the latest commit.

Chanki8658 commented 6 years ago

@AlexeyAB Thank you Alexey, it worked for me. Now I can detect the regions.

RJVisee44 commented 6 years ago

@AlexeyAB so I trained YOLOv2 the other day with this .cfg file and everything went fine during testing:

classes= 2
train = /home//darknet/own_data/trainAB.txt
valid = /home/darknet/own_data/GroupC.txt
names = data/hand.names
backup = GroupCValBackup

But then I tried training again, but this time using an empty validation text file (because I don't want the code to train using any GroupC data):

classes= 2
train = /home//darknet/own_data/trainAB.txt
valid = /home/darknet/own_data/emptyFile.txt
names = data/hand.names
backup = GroupCValBackup

The training went well (trained for 12000 iterations, see attached chart). But when I evaluate on the validation set using the "map" command, I get no detections. I changed the .cfg file to the original version during validation and ran this code:

./darknet detector map cfg/hand.data cfg/yolov2-hand.cfg GroupBValBackup/yolov2-hand_8000.weights

[attached image: chart]

Does this mean I have to include the GroupC validation data during training? I don't want the model to see any of this data until testing though. I believe everything in my .cfg file for the model is correct, as I didn't change anything from when it worked.

AlexeyAB commented 6 years ago

@RyanCodes44

Darknet doesn't read the valid= file during training.
To calculate mAP (./darknet detector map), Darknet uses the valid= file, so if that file is empty, the mAP will be 0.
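
Since an empty valid= list gives a mAP of 0, it can help to check what the .data file points to before running detector map. A minimal sketch, assuming the simple key = value layout of the hand.data files shown in this thread; both function names are hypothetical helpers, not part of darknet:

```python
from pathlib import Path


def read_data_file(path):
    """Parse a darknet .data file into a dict of key -> value strings."""
    options = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def validation_images(data_file):
    """Return the non-empty image paths listed in the file named by valid=."""
    valid_list = read_data_file(data_file).get("valid")
    if valid_list is None:
        return []
    lines = Path(valid_list).read_text().splitlines()
    return [entry.strip() for entry in lines if entry.strip()]
```

Calling validation_images("cfg/hand.data") and getting an empty list back would mean the map command has nothing to evaluate.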

RJVisee44 commented 6 years ago

@AlexeyAB when I ran the map command I changed "valid=" to the text file with all the paths to the images I want to test on. Still 0.

AlexeyAB commented 6 years ago

@RyanCodes44

when I ran the map command I changed "valid=" to the text file with all the paths to the images I want to test on. Still 0.

Can you show screenshot?
What mAP can you get when valid= /home//darknet/own_data/trainAB.txt ?

RJVisee44 commented 5 years ago

@AlexeyAB

Can you show screenshot?

Attached. I get about 5.52%. Do I need more training? The last one I trained only required 9000 iterations and worked really well. However, in that one, I also didn't change the learning rate at 90/95% of 12000 iterations. And based on my average loss chart above, it looks like it should work decently well.

[attached image: failed]

What mAP can you get when valid= /home//darknet/own_data/trainAB.txt ?

Will attach soon. However, even when I run the detector (via ./darknet detector test cfg/hand.data cfg/yolov2-hand.cfg GroupCValBackup/yolov2-hand_9000.weights image1.jpg) where image1 is a train image, it is unable to detect any hands.

AlexeyAB commented 5 years ago

@RyanCodes44

I change the .cfg file to the original version during validation and ran this code: ./darknet detector map cfg/hand.data cfg/yolov2-hand.cfg GroupBValBackup/yolov2-hand_8000.weights

What did you change?

So did you train from the beginning the 1st time, and detection worked well?
Then you trained from the beginning a 2nd time (only changing valid=emptyFile.txt), and now detection doesn't work, is that right?

I think you changed something else in the second case. In the 2nd case you did something wrong.

Also if you train to distinguish Left-something and Right-something, you should set flip=0 in the [net]-section in your cfg-file before training.
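
A note on the flip=0 advice: a horizontal flip mirrors the image, so a "left" object ends up looking like a "right" one while its label keeps the original class id. The sketch below shows what such a flip does to one YOLO-format label (class id, x_center, y_center, width, height, all normalized to 0..1). The function name and the class ids (0 = left, 1 = right) are assumptions for illustration; this is not darknet's own augmentation code.

```python
def hflip_label(label):
    """Mirror one YOLO-format label horizontally."""
    class_id, x_center, y_center, width, height = label
    # Only the x coordinate moves; the class id is left untouched,
    # which is the problem when classes encode left vs. right.
    return (class_id, 1.0 - x_center, y_center, width, height)


left_hand = (0, 0.25, 0.5, 0.2, 0.3)  # hypothetical "left" example
print(hflip_label(left_hand))  # (0, 0.75, 0.5, 0.2, 0.3)
```

After the flip the box still carries class 0 ("left"), although the mirrored object now has the appearance of the "right" class. Training on such samples would blur the two classes, which is presumably why flipping should be disabled for left/right datasets.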

RJVisee44 commented 5 years ago

@AlexeyAB

What did you change?

Sorry if this wasn't clear. So far I have trained 2 models. Model 1: trained on GroupAB (tested on GroupC after training). Model 2: trained on GroupAC (tested on GroupB after training). Training and testing Model 1 went really well. For Model 2, instead of listing the validation file in "hand.data" during training, I pointed valid = at an empty .txt file to ensure none of the GroupB data was being used during training. Training went well, as shown above. Then, when I ran detector map, I changed "hand.data" so that valid = points to the correct .txt file for GroupB.

So did you train from the beginning 1st time and Detection works well?

Yes I believe I trained from beginning (not entirely sure what you mean). I trained via: ./darknet detector train cfg/hand.data cfg/yolov2-hand.cfg darknet19_448.conv.23

Then you train from the beginning 2nd time (just change valid=emptyFile.txt), and now Detection doesn't work, isn't it?

No, I think I wasn't clear on my process but hopefully the above clears this up. The second model is an exact replica of the first model, except it is trained on different groups. The first model gives me over 90% mAP.

So Model 1 "hand.data":

classes= 2
train = /home/darknet/own_data/trainAB.txt
valid = /home/darknet/own_data/GroupC.txt
names = data/hand.names
backup = GroupCValBackup

Model 2 "hand.data":

classes= 2
train = /home/darknet/own_data/trainAC.txt
valid = /home/darknet/own_data/GroupB.txt
names = data/hand.names
backup = GroupCValBackup

However, while training Model 2, the valid = path was just an emptyFile.txt to ensure GroupB wasn't used during training. After training completed, I changed Model 2's "hand.data" to the version shown above, ran detector map, and got very poor results.
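
If the concern is keeping the held-out group out of training, one way to check it directly is to compare the train= and valid= lists rather than emptying the valid= file. A minimal sketch, assuming both lists are plain text files with one image path per line, as in the hand.data files above; the function names are hypothetical:

```python
from pathlib import Path


def read_image_list(list_file):
    """Return the set of non-empty lines (image paths) in a list file."""
    lines = Path(list_file).read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def shared_images(train_list, valid_list):
    """Return image paths that appear in both lists, sorted for stable output."""
    return sorted(read_image_list(train_list) & read_image_list(valid_list))
```

An empty result from shared_images for the trainAC.txt and GroupB.txt lists would confirm that no GroupB image is listed for training.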

Also if you train to distinguish Left-something and Right-something, you should set flip=0 in the [net]-section in your cfg-file before training.

Why do I need to do this? Thanks!

RJVisee44 commented 5 years ago

@AlexeyAB

What mAP can you get when valid= /home//darknet/own_data/trainAB.txt ?

./darknet detector map cfg/hand.data cfg/yolov2-hand.cfg GroupBValBackup2/yolov2-hand_10000.weights

[attached image: failedmore]

Could this be something wrong with the weights file? I'm gonna try to rebuild darknet.

RJVisee44 commented 5 years ago

@AlexeyAB seems to be a problem with the number of iterations trained. Unable to get decent results until at least 12,500 iterations now for some reason.

nagarajdesai commented 5 years ago

@AlexeyAB I am facing the same issue when training with COCO data set for only 9 classes. When I run "detector test" command, I do not see any predictions in the predictions.jpg. Please help me.

AlexeyAB / darknet

Trained model gives zero detections #243