Open damnko opened 5 years ago
@damnko I see that you always have count = 0. Perhaps check if all your training images are correctly labeled and configured (e.g. in same directory)?
Also, feel free to start training from weights/yolo-drone.weights instead of from darknet53.conv.74.
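The "same directory" check suggested above can be sketched as a few lines of Python. This is only a sketch under assumptions not stated in the thread: that each image has a label file with the same base name and a .txt extension next to it, and that images use the extensions listed in IMAGE_EXTENSIONS. The function name find_unlabeled_images is hypothetical.

```python
from pathlib import Path

# Assumed image extensions; adjust to whatever the dataset actually uses.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_unlabeled_images(image_dir):
    """Return the sorted names of images that have no same-named .txt file in image_dir."""
    missing = []
    for path in sorted(Path(image_dir).iterdir()):
        is_image = path.suffix.lower() in IMAGE_EXTENSIONS
        # Assumption: the label for foo.jpg is foo.txt in the same folder.
        if is_image and not path.with_suffix(".txt").is_file():
            missing.append(path.name)
    return missing
```

Calling find_unlabeled_images on the training image folder returns an empty list when every image has a label file beside it; any names it does return are candidates for the labeling problem mentioned above.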
Hi @chuanenlin thanks for your prompt reply. Yes, the images and labels are both in the same folder and I've downloaded everything from your repo, I didn't do the labeling myself, so I suppose they're correct.
That count should indicate the number of times the object is detected in a specific region for that image? What should I expect to see regarding that number so I can check on the full log?
I will try to start training from your weights, but I suppose it should work anyway, starting from darknet53.conv.74, correct?
@damnko Yes, therefore count = 0 should indicate either some kind of error (e.g. labeling) or that it needs more iterations of training. You should expect to see several lines of non-NaN values for IOU, Class, etc. and count > 0 after decent training iterations and correct configuration.
If you start from darknet53.conv.74, it would mean starting from "scratch" (take note that I trained for a couple of days on an NVIDIA Tesla GPU), while starting from weights/yolo-drone.weights would mean starting from where I left off.
Oh, a couple of days... But you were training the standard, non-tiny version, right? And then what does 1930: 0.000000 indicate? Shouldn't that be the loss you also refer to in the blog post?
@damnko That's correct. FYI, feel free to open an issue here and see whether anyone else has faced a similar issue.
Ok thanks, I will investigate a little bit more and then will open an issue on the darknet repo. Thanks for now, I will keep you posted and come back to close this issue.
Hi @chuanenlin, sorry to ask you again, but I've tried to follow another tutorial with a different dataset and it worked. Now I'm trying the same settings with your images/labels and I'm having the same problem.
The labels of the other tutorial (which works) are in the following form:
0 0.469603 0.48314199999999996 0.797766 0.795552
While your label has this format:
0 167.5 116.5 175.0 157.0
So, the first seems to have relative positions/dimensions, and yours absolute ones. Is there something wrong, or should both work? I wonder if the problem is there. Thanks again for your help.
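For reference, converting an absolute label line into the relative form could look like the sketch below. It rests on assumptions the thread does not confirm: that the four numbers after the class id are center x, center y, width and height in pixels, and that the image width and height are known. The function names to_relative and looks_relative are hypothetical.

```python
def to_relative(label_line, img_w, img_h):
    """Divide pixel values by the image size to get values relative to the image.

    Assumes the line is "<class> <x_center> <y_center> <width> <height>" in pixels.
    """
    cls, xc, yc, w, h = label_line.split()
    values = (float(xc) / img_w, float(yc) / img_h, float(w) / img_w, float(h) / img_h)
    return cls + " " + " ".join(f"{v:.6f}" for v in values)


def looks_relative(label_line):
    """Return True if all four box values lie in [0, 1], as relative labels do."""
    return all(0.0 <= float(v) <= 1.0 for v in label_line.split()[1:])
```

With this, looks_relative gives a quick way to tell the two formats above apart before training, and to_relative can be applied per line once the size of each image has been read (for example with an image library).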
@damnko May I ask which repository for YOLO you are using? Some variants such as AlexeyAB's version require different labeling formats, such as the one you mentioned. The labels in this repo should work with the original (pjreddie's) version.
This is the version I compiled: https://github.com/pjreddie/darknet
That's odd - I trained the weights with the labels in this repo so the formatting should be fine. 🤔
I don't know, even when using labelimg for image labeling with the YOLO format, the labels look like this...
0 0.534304 0.380769 0.640333 0.584615
Feel free to close the issue since, for now, I think my problem is solved by using this labeling, even though I have not tried converting your labels to this format.
Thanks for your feedback and thanks again for sharing your work :pray:
@damnko What OS are you on? I am having a similar problem with normal YOLO. Also, what was the other tutorial?
Hi @trigaten, I was using Linux Mint. It was some time ago and I can't remember the other tutorial I was looking at, but I guess it was this one: https://www.learnopencv.com/training-yolov3-deep-learning-based-custom-object-detector/
Hi, I've followed your tutorial in order to train Tiny YOLOv3 with drone images. After ~1900 iterations I'm still not getting any detection even on training images, which is strange. I've tested with your weights and everything works, so there must be some problem during my training procedure.
Darknet has been compiled with GPU and the training has been done on Google Colab. Here is the output during the training run with the command ./darknet detector train drone.data cfg/yolov3-tiny-drone.cfg darknet53.conv.74:
Do you notice anything strange that might be related to the problem I'm facing? Thank you so much for your help and for sharing the tutorial.
Edit: Here is a link to the avg loss plot of another training run, which gives the same problem: https://imgur.com/8IG3yO4