gurmeetsidhu closed this issue 7 years ago
Thanks for the prompt reply,
darknet.exe detector train data/obj.data cfgs/yolo-obj.cfg weights/darknet19_448.conv.23
^^ Just reorged my cfgs and weights b/c it was getting cluttered. And finally my edits to the CFG
```
[convolutional]
size=1
stride=1
pad=1
filters=50
activation=linear

[region]
anchors = 1.08,1.19, 3.42,4.41, 6.63,11.38, 9.42,5.11, 16.62,10.52
bias_match=1
classes=5
coords=4
num=5
softmax=1
jitter=.2
rescore=1
object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1
absolute=1
thresh = .6
random=1
```
Note the change of random, classes, and filters. That's all I really changed.
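For reference, in a YOLOv2 cfg the `filters` value of the convolutional layer right before `[region]` has to match `(classes + coords + 1) * num`. A quick sanity check against the values above (this is just an illustration of the rule, not darknet code):

```python
def region_filters(classes: int, coords: int = 4, num: int = 5) -> int:
    # YOLOv2 rule: each of `num` anchors predicts `coords` box values,
    # 1 objectness score, and `classes` class probabilities
    return (classes + coords + 1) * num

print(region_filters(classes=5))   # 50, matching filters=50 in the cfg above
print(region_filters(classes=20))  # 125, the value in the stock VOC cfg
```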
Thanks
.cfg looks all right.
Okay, I will refork and relabel the images, and get back to you on this if it works.
Also, I tried running the imagenet1k dataset and it gave an "out of memory" error. How can I avoid this issue?
Try increasing subdivisions=64 in your cfg-file.
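`subdivisions` controls how many chunks each batch is split into: the GPU only holds `batch / subdivisions` images at a time, and gradients are accumulated until the full batch is processed, so raising it lowers memory use without changing the effective batch size. A rough illustration (the parameter names mirror the cfg fields):

```python
def images_per_gpu_pass(batch: int, subdivisions: int) -> int:
    # darknet runs batch/subdivisions images per forward/backward pass,
    # accumulating gradients until the whole batch has been seen
    return batch // subdivisions

print(images_per_gpu_pass(batch=64, subdivisions=8))   # 8 images in GPU memory at once
print(images_per_gpu_pass(batch=64, subdivisions=64))  # 1 image at a time: lowest memory
```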
Thank you so much, that fixed the issue of the loss heading to negative infinity. But how long should it take to reach an average loss of 0.06? Right now it just dips down to 0.12, goes back up to 0.16, and shuffles around there at 3000 iterations.
This need not necessarily be 0.06. avg_loss just has to stop noticeably decreasing; then you can stop training.
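One hedged way to decide when avg_loss has "stopped noticeably decreasing" is to compare the mean of the most recent losses against the window before it. This is just a heuristic sketch for reading the console log, not anything built into darknet:

```python
def loss_plateaued(losses, window=100, tol=0.01):
    # True when the mean of the last `window` avg_loss values improved
    # by less than `tol` (relative) over the window before it
    if len(losses) < 2 * window:
        return False
    prev = sum(losses[-2 * window:-window]) / window
    last = sum(losses[-window:]) / window
    return (prev - last) / prev < tol

print(loss_plateaued([1.0] * 200))               # flat loss: plateaued
print(loss_plateaued([2.0] * 100 + [1.0] * 100))  # still dropping fast: keep training
```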
Okay, so training settled at an average loss of approx 0.14, but when I run the detector, even at -thresh 0.01, it outputs nothing...
darknet.exe detector train data/obj.data tiny-yolo-obj.cfg darknet19_448.conv.23
tiny-yolo-obj.cfg
diff: -thresh 0.02
darknet.exe detector test data/obj.data tiny-yolo-obj.cfg tiny-yolo-obj_3000.weights -thresh 0.02
@gurmeetsidhu
Did you compile with the cuDNN-library?
Also, if I run it with a threshold of 0, I get results back, which makes me believe that there is something gravely wrong with how it was trained. I don't understand what's going on. Here is a sample input image:
Also here is how I have my train.txt and folder setup:
All images are returned with a confidence of 0%, compared to your camaro's 68%:
And here is the end of the training cycle:
Hopefully that gives you some clarification as to what's going on here ...
Okay I will attempt to retrain it with tiny-yolo and see if issue persists
And here is the end of the training cycle:
Something went wrong, so you got a CUDA Error at the end of training.
I have no idea why that error pops up. Is there any way I can look at a log and perhaps share the results? It doesn't seem to hinder training, and I have sometimes got it on a demo; after a restart it seems to fix itself. My guess is that some other usage of the graphics card is leading to a chain reaction.
Have you changed your yolo.c file by any chance to something like this before/after you train to help identify the camaro?
char *voc_names[] = {"stopsign", "yeildsign"}; image voc_labels[CLASSNUM];
here is the original line
char *voc_names[] = {"aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"};
@gurmeetsidhu
No, this was only required for the old version, Yolo v1: https://github.com/AlexeyAB/yolo-windows
Now in Yolo v2 all object names are described in obj.names, which is referenced by obj.data: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects
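For reference, the layout that README describes looks like this (the paths below are placeholders for your own setup):

```
classes = 5
train  = data/train.txt
valid  = data/test.txt
names = data/obj.names
backup = backup/
```

And obj.names simply lists one class name per line, in the same order as the class IDs in your label files (e.g. `stopsign` on the first line for class 0).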
Ughhh ... I don't know why these CUDA errors are occurring, and I can't seem to find a log to debug them. Have you managed to get Yolo-9000 to work? Perhaps I can figure out how to label only the items I want. At this point I just don't know how to train on my dataset effectively. Could you send me your camaro/lambo dataset so I can try training on that and see if the error persists?
@gurmeetsidhu
Unfortunately I can not give my dataset - it's private.
But you can use the standard Pascal Voc dataset: https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data
Let me give you my two cents.
There is definitely something wrong with your dataset or the file paths; try to use the folder names as Alexey mentioned in the description.
You should avoid too big and too small images in your dataset. If you are training for 416×416, then slightly bigger than that is good, something around 450×450 to 600×600.
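A throwaway filter for that advice, assuming both dimensions should fall in the suggested range (Pillow's `Image.open(path).size` would give you the width and height, but the check itself is just):

```python
def in_size_range(width: int, height: int, lo: int = 450, hi: int = 600) -> bool:
    # Keep only images whose dimensions both fall inside [lo, hi],
    # i.e. slightly larger than the 416x416 network input
    return lo <= width <= hi and lo <= height <= hi

print(in_size_range(500, 500))    # keep
print(in_size_range(1920, 1080))  # drop: 1080p frame is far too large
print(in_size_range(400, 500))    # drop: width below the suggested minimum
```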
Does your system become choppy and very slow during training? Then make sure you have at least 8 GB of RAM, or your images are quite big; try to close other software during training.
Also, do not install any GPU driver except the one which comes with CUDA. I mean, just install the latest CUDA and leave it at that.
Okay Vanitor,
In the back of my head I had a feeling it was the imagenet dataset. I've eliminated images that fall below 450×450 pixels and above 600×600. I had some images that were 1080p, so quite likely they led to some RAM issues.
I have closed other apps when I run it, and some crashes were associated with me opening Chrome to check a video, so perhaps that is again the issue. I do have 8 GB of RAM, and my folder structure is the same as Alexey's.
Thanks for your advice. Currently trying it with a cleaned dataset which is approx half the size: 100 images now per category.
All right, so it settled at 0.24 average loss. Quite high ... and when I ran detection it found nothing.
When I lower the threshold to 0.05, I get a few pickups, and they're all oranges, and it looks like this ...
I think it's due to eliminating all those images: I was working with a dataset of only 200 images to differentiate 5 classes. But it doesn't seem like I am going to get this to work without spending significant time finding another 900 or so images per class and labeling them.
@gurmeetsidhu
Yes, in my case I had one class and 200 images for training and 30 for validation. I told you the previous tips to solve your errors, otherwise if you want high accuracy, that's another topic.
Yes, thank you very much Hesam. I guess this issue is resolved, and if I have time for my project I will try to gather a larger dataset and see if that works.
I don't know why but the training doesn't seem to be working well.
The loss function is gradually descending, approaching a meager value of 1.4, but then it accelerates up to very large numbers (100,000-1,000,000). And mind you, I am only at 400 iterations, nowhere close to the suggested approx. 5000 for training 5 classes.
Thank you so much for any advice.
PS: I also tried random=1 with the same result.