piaopiaodedudou commented 6 years ago

Hello, I don't have a GPU, so I chose the tiny network to learn deep learning. When I train the net following the instructions in your Git repository, the loss and avg loss become nan within 100 iterations or fewer, and training cannot continue. I think there must be something wrong in my cfg file or dataset. Here is my cfg:

[net]
# Testing
# batch=1
# subdivisions=32
# Training
batch=8
subdivisions=2
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear

[yolo]
mask = 3,4,5
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=1
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 8

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=1
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

There is 1 class. When I change the learning rate, the number of iterations before the failure increases, but never beyond 100. The dataset has 19 samples. I use the command: darknet_no_gpu.exe detector train data/obj.data yolov3-tiny-sobj.cfg yolov3-tiny.conv.15
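One thing that can be checked mechanically in a cfg like the one above is the usual YOLOv3 rule that the [convolutional] layer directly before each [yolo] layer has filters = (classes + 5) * (number of mask entries). The Python sketch below is only an illustration and is not part of darknet: the function name `check_yolo_filters` and the tiny parser are hypothetical, and it assumes one key=value per line with `#` starting a comment.

```python
def check_yolo_filters(cfg_text):
    """Return (expected, actual) filter counts, one pair per [yolo] layer."""
    # Parse the cfg into a list of (section_name, {key: value}) pairs.
    sections = []
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("["):
            sections.append((line.strip("[]"), {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()

    results = []
    for i, (name, opts) in enumerate(sections):
        if name != "yolo":
            continue
        n_masks = len(opts["mask"].split(","))
        expected = (int(opts["classes"]) + 5) * n_masks
        # The layer feeding a [yolo] layer is the section just before it.
        actual = int(sections[i - 1][1]["filters"])
        results.append((expected, actual))
    return results


# A shortened excerpt with the same values as the cfg in this post.
sample = """
[convolutional]
size=1
filters=18
activation=linear

[yolo]
mask = 3,4,5
classes=1
num=6
"""
print(check_yolo_filters(sample))  # [(18, 18)]
```

With classes=1 and three mask entries the expected value is (1 + 5) * 3 = 18, which matches filters=18 in both detection heads, so this particular rule does not explain the nan.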

AlexeyAB commented 6 years ago

Try to comment out burn_in=1000 and use

batch=64
subdivisions=2

Also check your dataset by using Yolo_mark: https://github.com/AlexeyAB/Yolo_mark
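As a rough complement to a visual check in Yolo_mark, the label files can also be scanned for values that are obviously out of range. The sketch below assumes the usual darknet label format, one object per line as `<class> <x_center> <y_center> <width> <height>` with coordinates normalized to [0, 1]. The function name `find_bad_labels` and the specific checks are hypothetical choices for illustration, and the thread does not confirm that bad labels were the cause of the nan.

```python
def find_bad_labels(label_text, num_classes):
    """Return (line_number, reason) pairs for suspicious lines in one label file."""
    problems = []
    for n, line in enumerate(label_text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        if len(parts) != 5:
            problems.append((n, "expected 5 fields"))
            continue
        try:
            cls = int(parts[0])
            x, y, w, h = (float(p) for p in parts[1:])
        except ValueError:
            problems.append((n, "non-numeric field"))
            continue
        if not 0 <= cls < num_classes:
            problems.append((n, "class id out of range"))
        elif not all(0.0 <= v <= 1.0 for v in (x, y, w, h)):
            problems.append((n, "coordinate outside [0, 1]"))
        elif w == 0.0 or h == 0.0:
            problems.append((n, "zero-sized box"))
    return problems


# Example: the second line uses class id 1 although only one class (id 0) exists.
print(find_bad_labels("0 0.5 0.5 0.2 0.3\n1 0.5 0.5 0.2 0.3\n", num_classes=1))
```

To scan a whole dataset, read each .txt label file and pass its contents to the function together with the classes value from the cfg.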

piaopiaodedudou commented 6 years ago

Hello, thank you for your reply! I followed your suggestion and re-labeled my data, but it doesn't work: the loss and avg loss still become -nan within 100 iterations. (Training without a GPU takes a while, so my answer is a little late.) @AlexeyAB

piaopiaodedudou commented 6 years ago

During training I saw that loss and avg loss don't decrease at the same rate: loss decreases faster and avg loss a little slower, and loss sometimes increases suddenly. Is this normal, or is something wrong with my cfg or dataset? @AlexeyAB
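On the question of why the two numbers move at different rates: as far as I can tell from darknet's detector training loop, avg loss is an exponential moving average of the per-iteration loss, so it is expected to lag behind and to smooth out sudden jumps. The Python sketch below mimics that update. The 0.9 / 0.1 weights and the "start from the first loss" rule are my reading of the source and should be treated as assumptions, and `running_avg_loss` is a hypothetical name.

```python
def running_avg_loss(losses):
    """Smooth a sequence of per-iteration losses the way darknet's avg loss appears to."""
    avg = -1.0  # sentinel: no average yet
    history = []
    for loss in losses:
        if avg < 0:
            avg = loss  # first iteration: start from the raw loss
        avg = avg * 0.9 + loss * 0.1  # assumed smoothing weights
        history.append(avg)
    return history


# A sudden spike in loss (10 -> 20) moves the average by only a tenth of the jump.
print(running_avg_loss([10.0, 10.0, 20.0]))
```

Under this update a single noisy iteration barely moves avg loss, which matches the behaviour described above. It does not explain a nan, though: once one loss value is nan, the average stays nan from then on.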

piaopiaodedudou commented 6 years ago

Even when I train with only one image as the dataset, this still happens. I think there may be something wrong with the code of darknet_no_gpu. @AlexeyAB

piaopiaodedudou commented 6 years ago

Hello, sorry to disturb you again. Because I have run into so many problems, I would like to study YOLO systematically. Could you send me related papers or program documentation? My Gmail: piaopiaodedudou@gmail.com. It would be great if I could get your reply.

AlexeyAB commented 6 years ago

@piaopiaodedudou Unfortunately, there is no detailed documentation about training Yolo or about the Darknet source code. I haven't tested training Yolo v3 Full or Tiny on CPU (without GPU), because it would take my whole life.

Just try to train Yolo v2 Tiny on CPU, or try to use GPU for Yolo v3 Tiny.

ashokrajagopal68 commented 5 years ago

@piaopiaodedudou If you are still trying to train without a GPU: I had the same problem while training, but by changing random=1 to random=0 in all [yolo] layers I was able to train for thousands of iterations and get a very low avg loss with yolov3-tiny.cfg.
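For anyone who prefers not to edit every [yolo] section by hand, the change can be scripted. This is a minimal Python sketch, not a darknet tool: `disable_random` is a hypothetical helper, and it assumes that random=1 sits on its own line, as in a normally formatted cfg.

```python
import re


def disable_random(cfg_text):
    """Return cfg_text with every 'random=1' line rewritten to 'random=0'."""
    # (?m) makes ^ and $ match at line boundaries; [ \t]* tolerates spaces
    # around '=' without swallowing the newline that ends the line.
    pattern = r"(?m)^([ \t]*random[ \t]*=[ \t]*)1[ \t]*$"
    return re.sub(pattern, r"\g<1>0", cfg_text)


before = "[yolo]\nmask = 0,1,2\nrandom=1\n\n[route]\nlayers = -4\n"
print(disable_random(before))
```

Read the cfg file into a string, pass it through the function, and write the result to a new file so the original stays available for comparison.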

piaopiaodedudou commented 5 years ago

Thank you for your reply. I will try it later.

AlexeyAB / darknet

I can't train yolov3-tiny #1068
