apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

problem in training #885

Closed achao2013 closed 8 years ago

achao2013 commented 8 years ago

When I train a network, the accuracy keeps falling from the beginning of each epoch. Why?

gujunli commented 8 years ago

Is it accuracy or loss? Loss goes down.

Thanks,
Junli



Junli Gu (谷俊丽), Coordinated Science Lab, University of Illinois at Urbana-Champaign


achao2013 commented 8 years ago

Its full name is "Train-accuracy". I don't know how to display the loss yet.

tqchen commented 8 years ago

Please be a bit more specific, e.g. what configuration you use; that would give others more context. Usually accuracy goes down when you have a bad initialization or too large a learning rate.
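To illustrate the "too large a learning rate" failure mode, here is a hedged, plain-Python sketch (not MXNet code) of gradient descent on a 1-D quadratic. With a small step size the iterate converges toward the minimum; past a critical step size each update overshoots and the iterate diverges, which shows up in training as loss blowing up or accuracy collapsing.

```python
# Minimizing f(w) = w^2 with plain gradient descent: w <- w - lr * f'(w).
# The function and step counts are illustrative, not from the MXNet demo.

def gradient_descent(lr, steps=20, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w  # gradient of w^2 is 2w
    return abs(w)

print(gradient_descent(0.1))  # small lr: |w| shrinks toward 0
print(gradient_descent(1.5))  # too-large lr: each step multiplies |w| by 2
```

For f(w) = w², any lr above 1.0 makes |1 - 2*lr| > 1, so the updates amplify rather than contract.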

tqchen commented 8 years ago

Closing due to inactivity; please feel free to reopen.

achao2013 commented 8 years ago

@tqchen I use the cifar-100.ipynb demo for a classification task with 146 classes. I don't modify any configuration except the input data (batch size = 32 due to a memory limitation). The accuracy keeps falling from 90% to 54%, then oscillates slightly around 55%. [screenshot attached]

Moreover, when I set up another net (the 34-layer ResNet from MSRA), the same problem happens. The data is ILSVRC2012, and the accuracy decreases from 19% to 1% and keeps decreasing. I have tried many configurations and the result is the same. The current params are as follows:

```python
model_args = {}
model_args['clip_gradient'] = 5
model_args['lr_scheduler'] = mx.lr_scheduler.FactorScheduler(step=50000, factor=0.5)
model = mx.model.FeedForward(
    ctx=mx.gpu(),
    symbol=softmax,
    num_epoch=num_epoch,
    learning_rate=0.01,
    momentum=0.9,
    wd=0.0001,
    initializer=mx.init.Xavier(rnd_type='gaussian', factor_type="in", magnitude=2.34),
    arg_params=model_args)
```

tqchen commented 8 years ago

If you use a smaller batch size, you will likely need to re-tune your parameters with a smaller learning rate.
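One common heuristic for this re-tuning is to scale the learning rate linearly with the batch size. A minimal sketch, assuming a hypothetical baseline of lr = 0.05 at batch size 128 (these numbers are illustrative, not the demo's actual defaults):

```python
# Linear-scaling heuristic: smaller batch -> proportionally smaller lr.
# base_lr / base_batch are an assumed baseline, not MXNet values.

def scaled_lr(base_lr, base_batch, new_batch):
    """Scale the learning rate proportionally to the batch size."""
    return base_lr * new_batch / base_batch

# Dropping from batch 128 to batch 32 quarters the learning rate.
print(scaled_lr(0.05, 128, 32))  # -> 0.0125
```

The scaled value would then be passed as `learning_rate=` to `mx.model.FeedForward`, with further manual tuning from there.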

achao2013 commented 8 years ago

@tqchen I tried several learning rates. The speed of the decrease slows slightly and the train accuracy shifts up, but the general trend is still downward. I'm a new user and I haven't found the code that calculates the train accuracy, but I conjecture that the train accuracy includes more and more samples as the batch number increases. I don't know whether this behavior is correct; I have encountered it in Caffe when the data labels were wrong (they are correct here).
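The conjecture can be sketched in plain Python (this is an illustration of a running-average metric, not MXNet's actual implementation): if "Train-accuracy" is an average over all batches seen so far in the epoch, then an easy first batch drags a high value down gradually even when every later batch scores the same lower accuracy, producing exactly the steady within-epoch decline described above.

```python
# Hypothetical running-average accuracy, reset at each epoch boundary.

class RunningAccuracy:
    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, correct, total):
        self.correct += correct
        self.total += total

    @property
    def value(self):
        return self.correct / self.total

metric = RunningAccuracy()
metric.update(90, 100)       # first batch of the epoch: 90% correct
for _ in range(9):
    metric.update(50, 100)   # every later batch: 50% correct
print(round(metric.value, 2))  # -> 0.54, even though per-batch accuracy is flat
```

So a printed accuracy that "keeps falling from the beginning of each epoch" can be an artifact of the averaging window, not of the model actually getting worse within the epoch.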