HanSeYeong opened this issue 5 years ago
YOLOv3 608 means width & height = 608, so the best way to improve your results should be to increase the width & height parameters. I think you could even use subdivisions=32 in order to fit the largest width & height possible. Also try to create a validation set and set it in the configuration file in order to get a better understanding of your results during training; loss is not enough most of the time.
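For example, the relevant places would look something like this (the numbers are just an illustration, pick what fits in your GPU memory; the obj.data keys are the usual ones from this repo):
# in yolo-obj.cfg, [net] section:
[net]
batch=64
subdivisions=32
width=608
height=608
# in obj.data, add a validation list so results can be measured on held-out images:
classes = 4
train = data/train.txt
valid = data/valid.txt
names = data/obj.names
backup = backup/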
try the label:Explanations in the issues section to get most of the YOLO bible directly from the mentor ;)
@HanSeYeong Hi,
The TITAN RTX with 24GB is a good choice!
I think an insufficient number of training images is the most important factor limiting accuracy (mAP) on the test images in your case.
My questions, from 'When should I stop training?':
- Is my dataset too small to get a good training result?
- Which is the better choice for reducing loss: decreasing subdivisions or increasing width & height? I want to do both, but I can't because of limited memory.
- Is my large image resolution a problem? I think the lack of memory may be caused by the HD images, but is that a fact?
- How many iterations did you train on the COCO dataset to get the amazing YOLOv3 608 weights? And what were your subdivisions and lowest loss?
- The documentation suggests that (2000 * classes) iterations is enough to get a good model, but mine was not good... Is my case abnormal?
The more images, the better. For a good result, you should have ~8000 images for 4 classes. It is also desirable to have another 8000 images with backgrounds only (without objects) and with empty txt-files. But it is not always possible to collect that many images, so just try to find more; the more, the better. Or just try to find about +3000 images for training (without objects, with backgrounds that will appear in your test images).
Width and height are more important than subdivisions for accuracy (mAP).
2.1. Accuracy (mAP) is more important than loss. Accuracy (mAP) on a validation dataset (images that aren't used for training) is even more important. The validation dataset should contain at least 800 images that are not in the training dataset (image-count ratio Validation / Training = from 5%/95% to 20%/80%).
2.2. You can achieve the lowest (fictive) loss by using 1 training image without labeled objects and with data augmentation disabled: random=0 jitter=0 hue=0 exposure=1 saturation=1
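Note where these parameters live in a yolov3-style cfg (a rough sketch):
# in the [net] section:
hue=0
saturation=1
exposure=1
# in each of the three [yolo] layers:
jitter=0
random=0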
...
If you can detect objects and distinguish the different classes of objects yourself, by eye, after each image has been resized to 544x544, then this network resolution (width=544 height=544 in cfg) is enough.
To understand what network resolution (width= height= in cfg) is required, try to run this command:
./darknet detector calc_anchors data/obj.data -num_of_clusters 9 -width 544 -height 544 -show
And show me a screenshot of the cloud of points (the sizes of the training objects after resizing each image to the network size 544x544).
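For reference, calc_anchors prints 9 anchors; if you want to use them, copy them into the anchors= line of each of the 3 [yolo] layers while keeping the mask= values, e.g. (the numbers below are just the yolov3 defaults, use your own output):
[yolo]
mask = 6,7,8
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326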
subdivisions=16, and train ~500,000 iterations using 4 GPUs: https://github.com/AlexeyAB/darknet/blob/97038fefa63b9b14c0b2fd0c4f1b382db7b43e8c/cfg/yolov3.cfg#L20 On hard and large datasets like MS COCO, the final loss can be ~1.0-4.0.
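For reference, multi-GPU training in this repository uses the -gpus flag, something like this (adjust the paths to your setup):
./darknet detector train cfg/coco.data cfg/yolov3.cfg darknet53.conv.74 -gpus 0,1,2,3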
It depends on the variability of your test images. If your training dataset includes images with all possible objects and backgrounds that can appear in your detection images/videos, then you have enough training images. For example, for detecting objects moving on a mechanical conveyor, only 200 images per class can suffice, if the training images were taken with the same camera that will be used for detection: from the same position, at the same distance, with the same illumination, at the same angle.
trained with CUDNN_HALF=0 and changed it to 1 after 1000 iterations
Use the latest version of this repository with CUDNN_HALF=1; it automatically disables FP16 Tensor Cores for the first 3000 iterations = (burn_in * 3), and automatically enables them after 3000 iterations.
It also disables FP16 Tensor Cores for the 1st layer, which is the most sensitive to FP32/FP16 precision.
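CUDNN_HALF is a build-time option; on Linux it is set via the Makefile, roughly like this:
make clean
make GPU=1 CUDNN=1 CUDNN_HALF=1 OPENCV=1 -j8
Alternatively, edit the GPU / CUDNN / CUDNN_HALF / OPENCV lines at the top of the Makefile and just run make.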
What I did to shrink the training loss:
- subdivisions = 16 -> 8
- width & height = 544
- calculated anchors
- more labeled images: 700 -> 2700
- more labeled boxes: 1528 -> 8214
- random = 1
- trained with CUDNN_HALF=0 and changed it to 1 after 1000 iterations
- bought a TITAN RTX because of the lack of memory for training
All of this will improve accuracy (mAP), which is more important than loss, but it will not necessarily reduce the loss.
Also, if you train the model on a computer with a connected monitor, try to train using the flag -map
command:
./darknet detector train data/obj.data yolo-obj.cfg darknet53.conv.74 -map
So you will see the mAP chart during training: https://github.com/AlexeyAB/darknet#when-should-i-stop-training
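You can also check mAP on the validation set for already saved weights at any time (the weights file name is just an example):
./darknet detector map data/obj.data yolo-obj.cfg backup/yolo-obj_last.weights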
Result: 31254 iterations, the lowest loss was 0.59, mean average precision (mAP) = 0.648612, or 64.86%
Also try to decrease learning_rate= by 10x and train more after 31254 iterations; the avg loss can decrease further (and mAP will increase further).
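A rough sketch of that: lower learning_rate in the cfg (and raise max_batches if it is below your new target), then continue training from the last saved weights:
# in yolo-obj.cfg, [net] section:
learning_rate=0.0001
# then continue from the last checkpoint (file name is an example):
./darknet detector train data/obj.data yolo-obj.cfg backup/yolo-obj_last.weights -map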
I really appreciate your answers, @HagegeR and @AlexeyAB !!!
Getting more training data is the best solution, but I can't do that anymore...
More details about my data:
I only need the highest accuracy for class_id=1. So I also labeled class_id=0, 2, 3, which can cause confusion when detecting class_id=1.
I followed your suggestions and changed the parameters below:
batch = 64
subdivisions = 16
width = 736
height = 736
learning_rate=0.001 (I haven't changed this to 0.0001 yet)
A maximum of 22GB of VRAM was used.
I had split my data into 2772 training images and 100 test images. I thought that ratio was the biggest problem in my training, so I changed the split to 2315 training images and 457 test images, which is a ratio of about 80%/20%.
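A simple way to make such a split, assuming all labeled image paths are listed one per line in data/all.txt (file names are just an example):
shuf data/all.txt > data/all_shuffled.txt
head -n 2315 data/all_shuffled.txt > data/train.txt
tail -n +2316 data/all_shuffled.txt > data/test.txt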
darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 736 -height 736 -show
I didn't change to random=0 jitter=0 hue=0 exposure=1 saturation=1.
Training results (iterations=12000): lowest loss = 0.78
I only need the highest accuracy for class_id=1, but the accuracy did not improve compared to the prior training.
I'll run more iterations to get higher accuracy for class_id=1.
My questions
Can you explain what that cloud of points means?
Are there any more suggestions about my training? The loss did not decrease after 10000 iterations. Should I wait for more iterations, or is there anything I can change in this situation?
When I trained YOLO on Ubuntu, I couldn't see the mAP chart even though I started with ./darknet detector train data/obj.data yolo-obj.cfg darknet53.conv.74 -map. So I changed to Windows 10 to watch the mAP chart. Is there a bug with showing the graph? Or should I install matplotlib to see the mAP graph?
I was surprised that you used 4 GPUs to train the YOLO model. Was the connection 4-way SLI or a parallel connection?
I installed Darknet on Windows 10 and built yolo_cpp_dll.dll. Can I reuse the yolo_cpp_dll.dll on another machine with the same RTX-series GPU, or should I rebuild it? I figured out that I can't reuse it on a 10-series GPU.
Thanks again for spending your precious time answering my questions.
First, I really appreciate your beautiful YOLOv3.
I think the lack of VRAM (11GB) on my 2080 Ti was the reason I couldn't decrease subdivisions and increase width & height. So I bought a TITAN RTX 24GB graphics card, which has enormous VRAM!!
My GPU is a TITAN RTX with 24GB of VRAM.
My configuration for training YOLOv3 is:
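Roughly, the key values (the full cfg has many more lines):
[net]
batch=64
subdivisions=8
width=544
height=544
learning_rate=0.001
# and in each [yolo] layer:
random=1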
The above configuration uses a maximum of 22GB of VRAM. (When I used subdivisions=4 and width=416, an out-of-memory error occurred... My TITAN RTX dream was gone...)
My data: 2770 images with 8214 labeled boxes. 4 classes -> each class has 5047, 2490, 615, and 62 labeled boxes respectively. Training image resolution: 1280x720. The objects to detect are not small.
Result: 31254 iterations, the lowest loss was 0.59.
I am sorry that I can't show the classes because of my secret project... I tested with images, but the result was the same as the accuracy above.
What I did to shrink the training loss:
My questions, from 'When should I stop training?':
I think my questions can be helpful to anyone who wants to train the best model on custom labeled images.
Sharing some experiences: all the results below were obtained training with 1280x720 resolution images.
Training speed (per 1000 epochs)
VRAM usage