dnouri / nolearn

Combines the ease of use of scikit-learn with the power of Theano/Lasagne
http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/
MIT License

Can't run a classifier #178

Closed — ivallesp closed this issue 8 years ago

ivallesp commented 8 years ago

Hello

My name is Iván, and I have been stuck for several days on the problem I am going to describe. I am following Daniel Nouri's deep learning tutorial: http://danielnouri.org/notes/category/deep-learning/ and I tried to adapt his example to a classification dataset. My problem is that if I treat the dataset as a regression problem, everything works properly, but if I try to perform a classification, it fails. I have written two reproducible examples.

1) Regression (it works well)

import lasagne
from sklearn import datasets
import numpy as np
from lasagne import layers
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data[iris.target<2]  # we only keep the first two classes.
Y = iris.target[iris.target<2]
stdscaler = StandardScaler(copy=True, with_mean=True, with_std=True)
X = stdscaler.fit_transform(X).astype(np.float32)
y = np.asmatrix((Y-0.5)*2).T.astype(np.float32)

print X.shape, type(X)
print y.shape, type(y)

net1 = NeuralNet(
    layers=[  # three layers: one hidden layer
        ('input', layers.InputLayer),
        ('hidden', layers.DenseLayer),
        ('output', layers.DenseLayer),
        ],
    # layer parameters:
    input_shape=(None, 4),  # 4 input features per sample
    hidden_num_units=10,  # number of units in hidden layer
    output_nonlinearity=None,  # output layer uses identity function
    output_num_units=1,  # 1 target value

    # optimization method:
    update=nesterov_momentum,
    update_learning_rate=0.01,
    update_momentum=0.9,

    regression=True,  # flag to indicate we're dealing with regression problem
    max_epochs=400,  # we want to train this many epochs
    verbose=1,
    )

net1.fit(X, y)

2) Classification (it raises a dimensionality error; I paste it below)

import lasagne
from sklearn import datasets
import numpy as np
from lasagne import layers
from lasagne.nonlinearities import softmax
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data[iris.target<2]  # we only keep the first two classes.
Y = iris.target[iris.target<2]
stdscaler = StandardScaler(copy=True, with_mean=True, with_std=True)
X = stdscaler.fit_transform(X).astype(np.float32)
y = np.asmatrix((Y-0.5)*2).T.astype(np.int32)

print X.shape, type(X)
print y.shape, type(y)

net1 = NeuralNet(
    layers=[  # three layers: one hidden layer
        ('input', layers.InputLayer),
        ('hidden', layers.DenseLayer),
        ('output', layers.DenseLayer),
        ],
    # layer parameters:
    input_shape=(None, 4),  # 4 input features per sample
    hidden_num_units=10,  # number of units in hidden layer
    output_nonlinearity=softmax,  # output layer uses softmax
    output_num_units=1,  # 1 target value

    # optimization method:
    update=nesterov_momentum,
    update_learning_rate=0.01,
    update_momentum=0.9,

    regression=False,  # flag to indicate we're dealing with classification problem
    max_epochs=400,  # we want to train this many epochs
    verbose=1,
    )

net1.fit(X, y)

This is the failing output I get with code 2:


(100, 4) <type 'numpy.ndarray'>
(100, 1) <type 'numpy.ndarray'>
  input                 (None, 4)               produces       4 outputs
  hidden                (None, 10)              produces      10 outputs
  output                (None, 1)               produces       1 outputs
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-13-184a45e5abaa> in <module>()
     40     )
     41 
---> 42 net1.fit(X, y)

/Users/ivanvallesperez/anaconda/lib/python2.7/site-packages/nolearn/lasagne/base.pyc in fit(self, X, y)
    291 
    292         try:
--> 293             self.train_loop(X, y)
    294         except KeyboardInterrupt:
    295             pass

/Users/ivanvallesperez/anaconda/lib/python2.7/site-packages/nolearn/lasagne/base.pyc in train_loop(self, X, y)
    298     def train_loop(self, X, y):
    299         X_train, X_valid, y_train, y_valid = self.train_test_split(
--> 300             X, y, self.eval_size)
    301 
    302         on_epoch_finished = self.on_epoch_finished

/Users/ivanvallesperez/anaconda/lib/python2.7/site-packages/nolearn/lasagne/base.pyc in train_test_split(self, X, y, eval_size)
    399                 kf = KFold(y.shape[0], round(1. / eval_size))
    400             else:
--> 401                 kf = StratifiedKFold(y, round(1. / eval_size))
    402 
    403             train_indices, valid_indices = next(iter(kf))

/Users/ivanvallesperez/anaconda/lib/python2.7/site-packages/sklearn/cross_validation.pyc in __init__(self, y, n_folds, shuffle, random_state)
    531         for test_fold_idx, per_label_splits in enumerate(zip(*per_label_cvs)):
    532             for label, (_, test_split) in zip(unique_labels, per_label_splits):
--> 533                 label_test_folds = test_folds[y == label]
    534                 # the test split can be too big because we used
    535                 # KFold(max(c, self.n_folds), self.n_folds) instead of

IndexError: too many indices for array

What is going on here? Am I doing something wrong? I think I have tried everything, but I am not able to figure out what is happening.

Note that I updated my dependencies today using the command: pip install -r https://raw.githubusercontent.com/dnouri/kfkd-tutorial/master/requirements.txt

Thanks in advance

Edit

I managed to make it work after some changes, but I still have some doubts:

I also tried to change the cost function to ROC AUC. I know there is a parameter called objective_loss_function, which defaults to objective_loss_function=lasagne.objectives.categorical_crossentropy, but how can I use ROC AUC as the cost function instead of categorical cross-entropy?

Thanks

BenjaminBossan commented 8 years ago

Hi Iván,

Regarding your first question: since you apply a softmax to your output layer, the values in each row have to sum to 1. That is why you need two outputs. It is a little unintuitive, but you figured out what to do.
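
For reference, here is a minimal sketch of the kind of setup that explanation implies: two softmax output units and a flat int32 label vector (which also avoids the StratifiedKFold IndexError above). The exact changes Iván made are not shown in the thread, so treat this as an assumption rather than his code:

import numpy as np
from lasagne import layers
from lasagne.nonlinearities import softmax
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data[iris.target < 2]   # keep only the first two classes
Y = iris.target[iris.target < 2]
X = StandardScaler().fit_transform(X).astype(np.float32)
y = Y.astype(np.int32)           # 1-D vector of class labels {0, 1}

net1 = NeuralNet(
    layers=[
        ('input', layers.InputLayer),
        ('hidden', layers.DenseLayer),
        ('output', layers.DenseLayer),
        ],
    input_shape=(None, 4),           # 4 iris features
    hidden_num_units=10,
    output_nonlinearity=softmax,     # softmax over the two classes
    output_num_units=2,              # one unit per class
    update=nesterov_momentum,
    update_learning_rate=0.01,
    update_momentum=0.9,
    regression=False,
    max_epochs=400,
    verbose=1,
    )

net1.fit(X, y)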

Regarding ROC AUC, there is no straightforward way to use it as a cost function, since it is not differentiable, which backprop requires. There are proxy losses for ROC AUC, but from my experience they are not worth the trouble: they are much more unstable than cross-entropy and did not lead to different results in the end.
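
If the goal is only to track ROC AUC rather than optimize it directly, it can be computed after training with scikit-learn. A small sketch, assuming a fitted two-output net1 like the one sketched above:

from sklearn.metrics import roc_auc_score

# predict_proba returns one column per class; take the probability of class 1
probas = net1.predict_proba(X)[:, 1]
print(roc_auc_score(Y, probas))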

Hope that helps

ivallesp commented 8 years ago

Hi,

It helps a lot, thank you. So I need two outputs because I am using softmax? I tried setting output_nonlinearity to None and the same error is raised. Can you modify my example so that it works with only one output unit?

Thank you!! Iván

BenjaminBossan commented 8 years ago

I don't think it is possible to make it run with just one output unit, at least not with Theano's built-in cross-entropy function. But you could define your own cost function that accepts 1-D input, though I don't see why you absolutely need 1-D.
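
As a rough illustration of that idea, Lasagne ships an element-wise binary_crossentropy objective, so a single sigmoid unit could in principle be trained by passing it as objective_loss_function together with regression=True (so that nolearn does not attempt a stratified split or an argmax over two columns). This is an untested sketch under those assumptions, not a recommendation:

import numpy as np
from lasagne import layers
from lasagne.nonlinearities import sigmoid
from lasagne.objectives import binary_crossentropy
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = StandardScaler().fit_transform(iris.data[iris.target < 2]).astype(np.float32)
# targets as a float32 column of 0/1 so the element-wise loss applies directly
y = iris.target[iris.target < 2].reshape(-1, 1).astype(np.float32)

net_bin = NeuralNet(
    layers=[
        ('input', layers.InputLayer),
        ('hidden', layers.DenseLayer),
        ('output', layers.DenseLayer),
        ],
    input_shape=(None, 4),
    hidden_num_units=10,
    output_nonlinearity=sigmoid,                  # probability of class 1
    output_num_units=1,                           # single output unit
    objective_loss_function=binary_crossentropy,  # 1-D friendly loss
    regression=True,                              # treat targets as plain floats
    update=nesterov_momentum,
    update_learning_rate=0.01,
    update_momentum=0.9,
    max_epochs=400,
    verbose=1,
    )

net_bin.fit(X, y)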

ivallesp commented 8 years ago

Well, I don't absolutely need 1-D, but I think it is the most natural choice for a binary classification problem. Correct me if I'm wrong, but it would be better to have only one output: it could give the probability of class 1, and the probability of class 0 would simply be 1 minus that. Am I wrong?

BenjaminBossan commented 8 years ago

You are right that the second column is redundant, but I believe sklearn also returns two columns for binary classification tasks, so nolearn is in line with that.
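
For comparison, a quick check of scikit-learn's convention on the same data (LogisticRegression is used here purely as an example):

from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X = iris.data[iris.target < 2]
Y = iris.target[iris.target < 2]

clf = LogisticRegression().fit(X, Y)
print(clf.predict_proba(X).shape)  # (100, 2): one column per class, even for a binary target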

jmwoloso commented 8 years ago

@BenjaminBossan just to clarify: if we are doing binary classification, which class does the first probability in a row of softmax predictions represent, p(x=0) or p(x=1)?

jmwoloso commented 8 years ago

Never mind, I believe I have figured it out: the first probability in the output is p(x=0) and the second is p(x=1).
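
So, assuming a fitted two-output net like net1 in the earlier sketch, the probability of the positive class is simply the second column:

# column 0 is p(x=0), column 1 is p(x=1)
p_positive = net1.predict_proba(X)[:, 1]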