mokoenator closed this issue 3 years ago
batch=64 subdivisions=20
batch/subdivisions should be an integer value, and 64/20 is not.
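As a quick illustration of the divisibility rule above, here is a minimal sketch that checks whether a batch/subdivisions pair gives a whole number of images per step. The function name `mini_batch_size` is hypothetical and is not part of Darknet; it only reproduces the arithmetic.

```python
def mini_batch_size(batch: int, subdivisions: int) -> int:
    """Return batch / subdivisions, or raise if the result is not an integer.

    Hypothetical helper for illustration only; not a Darknet function.
    """
    if subdivisions <= 0:
        raise ValueError("subdivisions must be positive")
    if batch % subdivisions != 0:
        raise ValueError(
            f"batch={batch} is not evenly divisible by subdivisions={subdivisions}"
        )
    return batch // subdivisions


if __name__ == "__main__":
    # Values mentioned in this thread: 20 fails, 16 and 32 are fine.
    for subdivisions in (16, 20, 32):
        try:
            size = mini_batch_size(64, subdivisions)
            print(f"subdivisions={subdivisions}: {size} images per step")
        except ValueError as err:
            print(f"subdivisions={subdivisions}: {err}")
```

With batch=64, subdivisions of 16 or 32 pass the check, while 20 does not.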
Try to set thread_wait_ms to 30 instead of 5 and recompile: https://github.com/AlexeyAB/darknet/blob/88f28f7fcc8fff88fff6dc90a7b4b5474e9a52ff/src/data.c#L1430
If it doesn't help, then try to set args.threads = 12 * ngpus; and recompile: https://github.com/AlexeyAB/darknet/blob/88f28f7fcc8fff88fff6dc90a7b4b5474e9a52ff/src/detector.c#L152-L153
Does it help?
Thank you for the reply!
Try to set 30 instead of 5 and recompile
Originally it was set to 5.
If it doesn't help, then try to set args.threads = 12 * ngpus; and recompile
Recompiled
What command do you use for training?
darknet.exe detector train D:\git_work\yolo-set\yolo_runner\my.data D:\git_work\yolo-set\yolo_runner\yolov3_5l.cfg #log\yolov3_5l_5100.weights -gpus 0 -map
Do you get this issue with yolov3.cfg instead of yolov3_5l.cfg?
I tried running it with an almost-default yolov3.cfg. My custom model has 15 classes.
[net]
# Testing
# Training
batch = 64
subdivisions = 32
width = 416
height = 416
channels = 3
momentum = 0.9
decay = 0.0005
angle = 0
saturation = 1.5
exposure = 1.5
hue = .1
...
filters = 60
activation = linear
[yolo]
mask = 6,7,8
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes = 15
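For reference, the filters = 60 value in the cfg above is consistent with the usual YOLOv3 rule that the convolutional layer before each [yolo] layer needs (classes + 5) filters per anchor in that layer's mask. The sketch below only reproduces that arithmetic; the helper name `yolo_conv_filters` is hypothetical and not part of Darknet.

```python
def yolo_conv_filters(classes: int, mask: str) -> int:
    """Filters for the conv layer preceding a [yolo] layer.

    Each anchor listed in the mask predicts 4 box coordinates,
    1 objectness score, and one score per class.
    Hypothetical helper for illustration only.
    """
    anchors_in_layer = len([m for m in mask.split(",") if m.strip()])
    return (classes + 5) * anchors_in_layer


if __name__ == "__main__":
    # Values from the cfg posted above: classes = 15, mask = 6,7,8
    print(yolo_conv_filters(15, "6,7,8"))
```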
GPU LOAD is now 60
About 7 minutes
This may not be the right question for this issue, but we are building the following service. Is yolov3.cfg better than larger models like yolov3_5l.cfg? I also want to detect small objects such as shoes and sandals (about the size shown in the image below).
Originally it was set to 5.
So did you try both 30 and 5? With which value is training faster?
GPU LOAD is now 60 About 7 minutes
Now try yolov3_5l.cfg with batch = 64 subdivisions = 32 width=608 height=608 and args.threads = 12 * ngpus;
What CPU/GPU load and training time do you get?
I also want to select small objects such as shoes and sandals. (About the size of the image below)
Calculate anchors for -width 608 -height 608. Don't use them in the cfg, just show me the output and I will tell you what sizes of objects are in your dataset.
./darknet detector calc_anchors data/obj.data -num_of_clusters 9 -width 608 -height 608
It may be better to use yolov3-spp.cfg with batch = 64 subdivisions = 16 or 32 width=608 height=608
I retried with the same yolov3.cfg as last time:
[net]
# Testing
# Training
batch = 64
subdivisions = 32
width = 416
height = 416
channels = 3
momentum = 0.9
decay = 0.0005
angle = 0
saturation = 1.5
exposure = 1.5
hue = .1
...
filters = 60
activation = linear
[yolo]
mask = 6,7,8
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes = 15
args.threads = 12 * ngpus; // Ryzen 7 2700X (16 logical cores)
and
static const int thread_wait_ms = 30;
About 7 minutes
args.threads = 12 * ngpus; // Ryzen 7 2700X (16 logical cores)
and
static const int thread_wait_ms = 5;
About 7 minutes
I also tried running it with the source I cloned a long time ago. Branch message ...
Revision: 3aca0b71666bac0dd5760833aea036e7bd897c8a
Author: AlexeyAB <alexeyab84@gmail.com>
Date: 2019/05/25 0:48:11
Message:
conv-LSTM training speedup
----
Modified: src/conv_lstm_layer.c
Modified: src/image_opencv.cpp
The modified code is in detector.c (both edits made at the same time):
#ifdef OPENCV
//args.threads = 3 * ngpus; // Amazon EC2 Tesla V100: p3.2xlarge (8 logical cores) - p3.16xlarge
args.threads = 28 * ngpus;
...
if (i % 100 == 0) {
//if (i >= (iter_save + 1000) || i % 1000 == 0) {
iter_save = i;
About 5 minutes
Is any of the above helpful?
I will try calculating the anchors and running yolov3-spp.cfg. Thank you.
So use the default args.threads = 28 * ngpus; and static const int thread_wait_ms = 5;
The 2 additional yolo layers in yolov3_5l.cfg are simply too slow.
Yes, train yolov3-spp.cfg
If you will use Darknet itself for detection rather than other frameworks (TRT, OpenCV-dnn, TF, ...), then you can try to train yolov4-custom.cfg:
https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4-custom.cfg
I didn't notice there was a v4. I will try it.
Hi @AlexeyAB .
CPU and GPU are not fully utilized.
I am training in the following environment.
yolov3_5l.cfg
CPU: AMD Ryzen Threadripper 2950X 16-Core Processor, GPU: TITAN RTX
With the version from around June of last year, many CPU cores were busy and 100 iterations took over 20 minutes (11,000 training images).
The current version is slower: 100 iterations take over 40 minutes (12,000 training images).
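To put the two timings above on a common scale, the following rough sketch converts "minutes per 100 iterations" into images per second. It assumes batch = 64, as in the cfg files posted earlier in this thread; the cfg used for these measurements was not posted, so treat that value, and the helper name `images_per_second`, as assumptions. Because the reported times are "over" 20 and 40 minutes, the results are upper bounds.

```python
def images_per_second(batch: int, iterations: int, minutes: float) -> float:
    """Approximate training throughput: one iteration processes `batch` images.

    Hypothetical helper for illustration only.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return batch * iterations / (minutes * 60.0)


if __name__ == "__main__":
    BATCH = 64  # assumption: taken from the cfgs shown earlier in the thread

    old = images_per_second(BATCH, 100, 20)  # version from around June of last year
    new = images_per_second(BATCH, 100, 40)  # current version
    print(f"older version:   at most {old:.2f} images/s")
    print(f"current version: at most {new:.2f} images/s")
    print(f"slowdown factor: {old / new:.1f}x")
```

Under that assumption, the current version processes images at about half the rate of the older one.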
The modified code is in detector.c, lines 152 and 357.
line152
line357
The message when running darknet.exe is as follows
Is this normal?