AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Can I train it with pre-trained weights and multi-GPU? #5012

Open MrCuiHao opened 4 years ago

MrCuiHao commented 4 years ago

@AlexeyAB, hello.

I read your updated README.md, which says:

1. Train it first on 1 GPU for about 1000 iterations:
darknet.exe detector train cfg/voc.data cfg/yolov3-voc.cfg darknet53.conv.74

2. Then stop and, using the partially trained model /backup/yolov3-voc_1000.weights, run training with multiple GPUs (up to 4 GPUs):
darknet.exe detector train cfg/voc.data cfg/yolov3-voc.cfg /backup/yolov3-voc_1000.weights -gpus 0,1,2,3

For small datasets only, it is sometimes better to decrease the learning rate: for 4 GPUs set learning_rate = 0.00025 (i.e. learning_rate = 0.001 / GPUs). In this case also increase burn_in = and max_batches = by 4x in your cfg-file, i.e. use burn_in = 4000 instead of 1000. The same goes for steps= if policy=steps is set.
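To make the arithmetic in that README note concrete, here is a minimal Python sketch. The function name scale_for_multi_gpu is hypothetical (it is not part of Darknet), and the max_batches and steps values in the example call are illustrative placeholders, not values taken from this thread. It only restates the rule of thumb quoted above: divide learning_rate by the number of GPUs and multiply burn_in, max_batches and steps by it.

```python
def scale_for_multi_gpu(gpus, learning_rate=0.001, burn_in=1000,
                        max_batches=None, steps=()):
    """Apply the README's small-dataset rule of thumb for `gpus` GPUs.

    Hypothetical helper, not a Darknet API. Defaults mirror the README
    example (learning_rate = 0.001, burn_in = 1000).
    """
    if gpus < 1:
        raise ValueError("gpus must be >= 1")
    scaled = {
        "learning_rate": learning_rate / gpus,   # 0.001 / GPUs
        "burn_in": burn_in * gpus,               # e.g. 1000 -> 4000 for 4 GPUs
        "steps": [s * gpus for s in steps],      # only relevant if policy=steps
    }
    if max_batches is not None:
        scaled["max_batches"] = max_batches * gpus
    return scaled


if __name__ == "__main__":
    # Illustrative cfg values only; substitute the ones from your own cfg-file.
    for key, value in scale_for_multi_gpu(
            4, max_batches=50200, steps=(40000, 45000)).items():
        print(f"{key} = {value}")
```

As the reply further down notes, this scaling is only a recommendation, so the unscaled defaults remain a valid choice.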

Can I train directly with pre-trained convolutional weights on multiple GPUs? The training command is as follows: .\darknet.exe detector train coco.data csresnet50-panet-spp.cfg Pretrained-Convolutional-Weights\csresnet50-panet-spp.conv.112 -gpus 0,1,2,3 -map

Another question: I have 119508 training images and 26766 validation images, so should I change the learning rate to learning_rate = 0.001 / GPUs, and so on?

AlexeyAB commented 4 years ago

Can I train directly with pre-trained convolutional weights on multiple GPUs? The training command is as follows:

Yes.

Another question: I have 119508 training images and 26766 validation images, so should I change the learning rate to learning_rate = 0.001 / GPUs, and so on?

This is just a recommendation, so you can keep the default learning rate.

USNA2014 commented 4 years ago

@AlexeyAB How do we use our own weights? I'm using the command below, but the command-line output only lists all of the layers and then immediately saves the final file without training at all. Here is the command I entered: darknet.exe detector train x64/data/obj.data x64/yolov3-openimages.cfg x64/yolov3-openimages.weights

I don't want to use darknet53.conv.74 since it's trained on ImageNet instead of COCO.

USNA2014 commented 4 years ago

@AlexeyAB disregard, I just figured it out. I shouldn't forget about first-principles troubleshooting XD. I noticed, when I run with yolov3-openimages.weights vs. darknet53.conv.74, that yolov3-openimages.weights already had 500,000 batches trained (500 kilo-batches), whereas darknet53.conv.74 had 0 kilo-batches. I know the darknet53.conv.74 weights are pretrained on ImageNet, but I believe you manipulated something to make them start at 0 kilo-batches (my assumption is that this lets users pick them up and proceed to shake & bake). Is there a way I can manipulate my weights file to show 0 as well? Not a big deal, but it would be interesting to know. In any case, if I just alter my cfg file to allow a max of 506,000 batches (my desired number is 6000), it allows me to train.
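For anyone who wants to check this without starting a training run, here is a rough Python sketch of the arithmetic described above. It rests on assumptions that are not stated in this thread: that the weights file begins with three little-endian 32-bit integers (major, minor, revision) followed by an images-seen counter (64-bit when major*10 + minor >= 2, otherwise 32-bit), and that one batch is 64 images. Both function names are hypothetical, and the demo reads a synthetic header written to a temporary file, not a real weights file.

```python
import os
import struct
import tempfile


def read_images_seen(path):
    """Return the images-seen counter from a weights-file header.

    Assumed layout: int32 major, int32 minor, int32 revision, then
    uint64 seen if major*10 + minor >= 2, else uint32 seen.
    """
    with open(path, "rb") as f:
        major, minor, _revision = struct.unpack("<3i", f.read(12))
        if major * 10 + minor >= 2:
            (seen,) = struct.unpack("<Q", f.read(8))
        else:
            (seen,) = struct.unpack("<I", f.read(4))
    return seen


def max_batches_to_continue(images_seen, extra_iterations, batch=64):
    """max_batches needed to train `extra_iterations` more batches.

    batch=64 is an assumption; use the batch= value from your cfg-file.
    """
    already_trained = images_seen // batch
    return already_trained + extra_iterations


if __name__ == "__main__":
    # Synthetic header: version 0.2.5, 500,000 batches * 64 images each.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".weights") as tmp:
        tmp.write(struct.pack("<3iQ", 0, 2, 5, 500_000 * 64))
    try:
        seen = read_images_seen(tmp.name)
        print(max_batches_to_continue(seen, 6000))  # prints 506000
    finally:
        os.remove(tmp.name)
```

With 500,000 batches already recorded and 6000 more desired, this gives the same 506,000 figure used as the workaround above.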

Your exe is awesome; it's just difficult to follow the reasoning behind the instructions at times. Thanks!