NervanaSystems / neon

Intel® Nervana™ reference deep learning framework committed to best performance on all hardware
http://neon.nervanasys.com/docs/latest
Apache License 2.0

Cannot create a dataset from numpy arrays #84

Closed: sensus-sextus closed this issue 9 years ago

sensus-sextus commented 9 years ago

Hello, I'm trying to use Neon to train deep networks on my own data. I'm loading it from CSV files into numpy arrays and then creating a subclass of the Dataset class.

But I didn't manage to get it working for MLP training. An implementation of get_batch is required, but I couldn't figure out how to write one that works.

Thank you in advance for your help. Dimitri Nowicki

nervanasys commented 9 years ago

This should be much easier in the latest version using DataIterator. Check out the MNIST example.
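(For reference, a minimal sketch of wrapping numpy arrays in a DataIterator, assuming the neon 1.x API used later in this thread; the shapes, nclass value, and backend settings below are placeholders:)

import numpy as np
from neon.backends import gen_backend
from neon.data import DataIterator

# the backend must be generated before constructing a DataIterator
be = gen_backend(backend='cpu', batch_size=128)

X = np.random.rand(1000, 784)        # one flattened sample per row
y = np.random.randint(0, 10, 1000)   # integer class labels
train_set = DataIterator(X, y, nclass=10)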

sensus-sextus commented 9 years ago

Thank you very much, it works! But now, when I try to modify the network from the MNIST example to make it "deeper" (more than one hidden layer), it stops learning: the cost function does not decrease over epochs.

Can you advise any remedy for this?

nervanasys commented 9 years ago

What cost function are you using? In general, you will have to tune the hyperparameters for this.

sensus-sextus commented 9 years ago

For some reason, it gets stuck with any cost function unless the optimizer is RMSProp. Can you explain this?

ursk commented 9 years ago

It's hard to guess what could be going on without the details of the network you are trying to run. Some general comments, assuming this is indeed a hyperparameter issue and not a bug you have stumbled across: making the network deeper can easily require retuning the learning rates and the random weight initialization, and might not work at all unless the right combination of convolutional, pooling, and fully connected layers is used. RMSProp is a very forgiving optimizer and can hide problems to some degree, but of course it's no magic bullet. We can't provide support for tuning custom network configurations, but please have a look at the other examples and try to go from there. If you find similar issues with the example networks, please let us know.

sensus-sextus commented 9 years ago

Thank you for the reply. I'm copy-pasting the code below. For some reason, something weird happens with any network that has more than one hidden layer.

I will be grateful for any advice regarding this network. If you wish, you can write to me directly at sensus.sextud@gmail.com. Thanks again, Dimitri

from numpy import genfromtxt
import logging
import os

from neon.backends import gen_backend
from neon.callbacks.callbacks import Callbacks
from neon.data import DataIterator, load_mnist
from neon.initializers import Gaussian
from neon.layers import GeneralizedCost, Affine, BatchNorm
from neon.models import Model
from neon.optimizers import GradientDescentMomentum, RMSProp
from neon.transforms import Rectlin, Logistic, Tanh, Softmax, CrossEntropyBinary, SumSquared, Misclassification
from neon.util.argparser import NeonArgparser

logger = logging.getLogger()

# parse the command line arguments
parser = NeonArgparser(__doc__)
parser.add_argument('--serialize', nargs='?', type=int,
                    default=0, const=1, metavar='N',
                    help='serialize model every N epochs')
parser.add_argument('--model_file', help='load model from pkl file')

args = parser.parse_args()

# hyperparameters
batch_size = 1000
numclass = 20
num_epochs = 200

# setup backend
be = gen_backend(backend=args.backend,
                 batch_size=batch_size,
                 rng_seed=args.rng_seed,
                 device_id=args.device_id,
                 default_dtype=args.datatype,
                 stochastic_round=False)

# load up the mnist data set
# split into train and tests sets
#(X_train, y_train), (X_test, y_test), nclass = load_mnist(path=args.data_dir)

comboinput = genfromtxt('cinput_vyb48.csv', delimiter=';')  # input vectors
retbin = genfromtxt('retbin_vyb48.csv', delimiter=';')      # targets
X_train = comboinput
X_test = comboinput
Y_train = retbin  # np_utils.to_categorical(y_train, nb_classes)
Y_test = retbin   # np_utils.to_categorical(y_test, nb_classes)

#X_train = X_train.astype("float32")
#X_test = X_test.astype("float32")
#Y_train = Y_train.astype("float32")
#Y_test = Y_test.astype("float32")
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# setup a training set iterator
train_set = DataIterator(X_train, Y_train, nclass=numclass)
# setup a validation data set iterator
valid_set = DataIterator(X_test, Y_test, nclass=numclass)

# setup weight initialization function
init_norm = Gaussian(loc=0.0, scale=0.01)

# setup model layers
layers = []
layers.append(Affine(nout=80, init=init_norm, activation=Tanh()))
layers.append(Affine(nout=64, init=init_norm, activation=Tanh()))
layers.append(Affine(nout=64, init=init_norm, activation=Tanh()))
layers.append(Affine(nout=64, init=init_norm, activation=Tanh()))
#layers.append(Affine(nout=64, init=init_norm, activation=Tanh()))
#layers.append(Affine(nout=64, init=init_norm, activation=Tanh()))
layers.append(Affine(nout=20, init=init_norm, activation=Softmax()))
#shortcut=True
# setup cost function (sum of squared errors)
cost = GeneralizedCost(costfunc=SumSquared())

# setup optimizer
llrate = 2e-4
#optimizer = GradientDescentMomentum(llrate, momentum_coef=0.7, stochastic_round=args.rounding)
optimizer = RMSProp(stochastic_round=False, decay_rate=0.95, learning_rate=llrate, epsilon=1e-06, clip_gradients=False, gradient_limit=5, name='rmsprop')
# initialize model object
mlp = Model(layers=layers)

if args.model_file:
    assert os.path.exists(args.model_file), '%s not found' % args.model_file
    logger.info('loading initial model state from %s' % args.model_file)
    mlp.load_weights(args.model_file)

# setup standard fit callbacks
callbacks = Callbacks(mlp, train_set, output_file=args.output_file,
                      progress_bar=args.progress_bar)

# add a callback to compute cost on the validation set every validation_freq epochs
if args.validation_freq:
    callbacks.add_validation_callback(valid_set, args.validation_freq)

if args.serialize > 0:
    # add callback for saving checkpoint file
    # every args.serialize epochs
    checkpoint_schedule = args.serialize
    checkpoint_model_path = args.save_path
    callbacks.add_serialize_callback(checkpoint_schedule, checkpoint_model_path)

# run fit
mlp.fit(train_set, optimizer=optimizer, num_epochs=num_epochs, cost=cost, callbacks=callbacks)

print('Misclassification error = %.6f%%' % (mlp.eval(valid_set, metric=Misclassification())*100))
ursk commented 9 years ago

Dimitri, we cannot run your code without access to 'cinput_vyb48.csv', but it's pretty likely that a network with six tanh layers and a fixed Gaussian(loc=0.0, scale=0.01) initialization is going to suffer from vanishing gradients and won't be able to learn. You may want to try ReLU nonlinearities (http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_NairH10.pdf), a more elaborate initialization scheme such as Xavier (http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization), or you can insert batch normalization layers between the tanh layers, like this:

layers = []
layers.append(Affine(nout=80, init=init_norm, activation=Tanh()))
layers.append(BatchNorm())
layers.append(Affine(nout=64, init=init_norm, activation=Tanh()))
layers.append(BatchNorm())
layers.append(Affine(nout=64, init=init_norm, activation=Tanh()))
layers.append(BatchNorm())
layers.append(Affine(nout=64, init=init_norm, activation=Tanh()))
layers.append(BatchNorm())
layers.append(Affine(nout=20, init=init_norm, activation=Softmax()))

Any of those should get the network unstuck.
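(A sketch of the first two suggestions combined; it assumes neon ships an Xavier initializer alongside Gaussian, and reuses the Rectlin and Softmax transforms already imported in the script above. Exact import paths may differ by version:)

from neon.initializers import Xavier

init_xav = Xavier()
layers = []
layers.append(Affine(nout=80, init=init_xav, activation=Rectlin()))
layers.append(Affine(nout=64, init=init_xav, activation=Rectlin()))
layers.append(Affine(nout=64, init=init_xav, activation=Rectlin()))
layers.append(Affine(nout=64, init=init_xav, activation=Rectlin()))
layers.append(Affine(nout=20, init=init_xav, activation=Softmax()))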

sensus-sextus commented 9 years ago

Thanks again, it works much better with Xavier now! Sorry, but now I'm stuck running fprop to get network predictions. This code:

X0 = X_test  # [0:999, :]
xx = be.array(X0)
prdc = mlp.fprop(xx)

produces the following error:

File "D:/Dima/Python/neonka/bloo_neon5.py", line 157, in prdc= mlp.fprop(xx)

File "D:\Python27\lib\site-packages\neon-1.0.0rc1-py2.7.egg\neon\models\model.py", line 173, in fprop x = l.fprop(x, inference)

File "D:\Python27\lib\site-packages\neon-1.0.0rc1-py2.7.egg\neon\layers\layer.py", line 422, in fprop self.be.compound_dot(A=self.W, B=inputs, C=self.outputs)

File "D:\Python27\lib\site-packages\neon-1.0.0rc1-py2.7.egg\neon\backends\nervanacpu.py", line 763, in compound_dot assert B.shape[1] == C.shape[1] I have no ideas how to fix it?