dnouri / nolearn

Combines the ease of use of scikit-learn with the power of Theano/Lasagne
http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/
MIT License

Cannot match validation loss from training when calculating after training #49

Closed: run2 closed this issue 8 years ago

run2 commented 9 years ago

Hi

I am facing a strange problem which I have been struggling with for ages now.

I am training a net using images and labels, so regression is False, use_label_encoder is True, and in the train/test split the StratifiedKFold gets invoked, etc.

Now, I am using EarlyStopping. When my net trains and exits due to early stopping, the best weights are loaded into the net (as in the example for face rectangles). I can see the validation loss on that best epoch.

After training is finished, I take the net and get the validation set again, using the same StratifiedKFold logic (I made sure that the indices and labels are exactly the same as those of the validation set used inside the training loop). I then use predict_proba (I have tried _output_layer.get_output too) and a plain numpy method to compute the loss (negative log likelihood), and that validation loss is different from the best validation loss during training. The difference is big (I understand there might be small decimal differences here and there from things being calculated on the GPU). The loss I am getting seems closer to the last loss I see during training, not the best loss.
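Concretely, the post-training check looks roughly like this (just a sketch; net, X_valid and y_valid stand for my trained net and the re-derived validation fold, with the labels already label-encoded to 0-based integers):

import numpy as np

probs = net.predict_proba(X_valid)       # shape (n_samples, n_classes)
# plain numpy negative log likelihood over the validation fold
valid_nll = -np.mean(np.log(probs)[np.arange(len(y_valid)), y_valid])
print(valid_nll)   # I expect this to match the best valid_loss from training, but it does not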

Now, just to add some points:

1) I have made sure there is no on-the-fly transformation or augmentation of images, both while training and while predicting.

2) I have made sure that the code I am using to calculate the loss is exactly the same as the negative_log_likelihood function.

3) I have made sure that the weights are being copied over only when the valid loss is decreasing (and, finally, at least once).

Can someone tell me what I am missing here, or is there some problem lurking somewhere?

Thanks and regards

dnouri commented 9 years ago

Hmm, maybe it'd help to see some code. Number 3 isn't entirely clear to me. What does your EarlyStopping implementation look like?

run2 commented 9 years ago

Here it is. There are a few things I do with more_params, but the rest is almost the same as your code.

import numpy as np

# Early-stopping callback: remembers the best weights seen so far and restores them when training stops.
class EarlyStopping(object):

    def __init__(self, patience=100):
        self.patience = patience
        self.best_valid = np.inf
        self.best_valid_epoch = 0
        self.best_weights = None

    def __call__(self, nn, train_history):
        if(bool(nn.more_params) and 'reset' in nn.more_params and nn.more_params['reset'] == 1):
            self.best_valid = np.inf
            self.best_valid_epoch = 0
            self.best_weights = None
            nn.more_params['reset'] = 0
            #print 'Patience is set at ' + str(self.patience)
            #print 'Max epochs is ' + str(nn.max_epochs)

        current_valid = train_history[-1]['valid_loss']
        current_epoch = train_history[-1]['epoch']
        #print str(current_epoch)
        #if(current_epoch%100==0):
        #    print("Saving state.")
        #    print("Best valid loss was {:.6f} at epoch {}.".format(
        #        self.best_valid, self.best_valid_epoch))
        #    nn.load_weights_from(self.best_weights)
        #    with open('models/' + current_epoch + '.model', 'wb') as f:
        #        pickle.dump(nn, f, -1)

        if current_valid < self.best_valid:
            print 'Ressing best'
            self.best_valid = current_valid
            self.best_valid_epoch = current_epoch
            self.best_weights = [w.get_value() for w in nn.get_all_params()]
            nn.more_params['best_valid_fold_' + str(nn.more_params['fold'])] = self.best_valid
        if (self.best_valid_epoch + self.patience < current_epoch):
            print("Early stopping.")
            print("Best valid loss was {:.6f} at epoch {}.".format(
                self.best_valid, self.best_valid_epoch))
            nn.load_weights_from(self.best_weights)
            nn.more_params['best_valid_fold_' + str(nn.more_params['fold'])] = self.best_valid
            raise StopIteration()
        elif (current_epoch == nn.max_epochs):
            print("Loading best weights")
            nn.load_weights_from(self.best_weights)
            nn.more_params['best_valid_fold_' + str(nn.more_params['fold'])] = self.best_valid

run2 commented 9 years ago

So I can see the "Loading best weights" message being printed - only when the valid loss decreases, and also at the max epoch (if there was a smooth decrease until the max epoch). You will also notice that I have put the resetting bit (sorry, that is a typo in the code - it says 'Ressing') in its own if, such that the weight loading is independent of that if (which I believe is right).

The point is, the best weights get loaded correctly, whether via the max epoch or the patience override. But when I use that net after coming out of the .fit() call, it gives me a different loss on the same validation set.

Here is how I am calculating the loss

import numpy as np

def get_log_loss(y_actual, y_pred):
    # y_actual: integer class labels (0-based), shape (nsamples,)
    # y_pred: predicted probabilities from predict_proba, shape (nsamples, nclasses)
    y_actual = y_actual.reshape(y_actual.shape[0])

    vec_actual = np.zeros(y_pred.shape)
    sizeOfSet = vec_actual.shape[0]
    vec_actual[np.arange(sizeOfSet), y_actual.astype(int)] = 1

    loss_sum = np.sum(vec_actual * np.log(y_pred))
    loss = -1.0 / sizeOfSet * loss_sum
    return loss

dnouri commented 9 years ago

If you're not sure that you're calculating the loss right, maybe you should try and call your numpy version and the Theano version used by the net with the same values, and verify that they produce the same output.

Here's an implementation that I have lying around:

import scipy as sp

def logloss(y_true, y_pred):
    epsilon = 1e-18
    y_pred = sp.maximum(epsilon, y_pred)
    y_pred = sp.minimum(1 - epsilon, y_pred)
    ll = (sum(y_true * sp.log(y_pred) +
              sp.subtract(1, y_true) *
              sp.log(sp.subtract(1, y_pred)))
          )
    ll = ll * -1.0 / len(y_true)
    return ll
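For what it's worth, a minimal version of that cross-check might look like this (a sketch with made-up toy arrays; it assumes get_log_loss from the comment above and the Theano-side negative_log_likelihood that nolearn.lasagne exposed at the time):

import numpy as np
from nolearn.lasagne import negative_log_likelihood

# toy data: 4 samples, 3 classes, 0-based integer labels
y_actual = np.array([0, 2, 1, 2], dtype=np.int32)   # the index array must have an integer dtype
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6],
                   [0.2, 0.5, 0.3],
                   [0.1, 0.1, 0.8]], dtype=np.float32)

print(get_log_loss(y_actual, y_pred))                    # numpy version
print(negative_log_likelihood(y_pred, y_actual).eval())  # Theano version, evaluated to a number

(Note that the logloss above is a different quantity: fed a one-hot y_true it also includes the (1 - y) * log(1 - p) terms, and because the builtin sum only reduces over the first axis it returns a per-class array rather than a scalar, so it will not line up with the categorical negative log likelihood.)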

run2 commented 9 years ago

Daniel - there is a problem somewhere. It would be great if you could validate the best validation loss from training against the same loss computed after training, on any non-regression net you have. If you get the same value, then I have definitely messed something up. If not, then something is not quite right somewhere. I am working on it too (a few days now :()

dnouri commented 9 years ago

@run2: Yes, I'm doing this on a classification net and it's giving me consistent results. Have you checked that your get_log_loss function is right?

run2 commented 9 years ago

Daniel - the default log loss for a non-regression problem is: return -T.mean(T.log(output)[T.arange(prediction.shape[0]), prediction]). Does that not equate to the get_log_loss code I have pasted above?

y_pred is an [nsamples, nclasses] 2D array from predict_proba; y_actual is an [nsamples,] 1D array of the actual class labels.
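As a quick numpy-only sanity check on my side (a sketch with made-up toy arrays; get_log_loss as defined above), the indexing form used by that Theano expression and the one-hot form in get_log_loss give the same number for 0-based integer labels:

import numpy as np

y_actual = np.array([0, 2, 1, 2])              # 0-based class labels, shape (nsamples,)
y_pred = np.array([[0.7, 0.2, 0.1],            # predict_proba output, shape (nsamples, nclasses)
                   [0.1, 0.3, 0.6],
                   [0.2, 0.5, 0.3],
                   [0.1, 0.1, 0.8]])

# numpy equivalent of -T.mean(T.log(output)[T.arange(prediction.shape[0]), prediction])
nll_indexing = -np.mean(np.log(y_pred)[np.arange(len(y_actual)), y_actual])

print(nll_indexing)                    # identical to ...
print(get_log_loss(y_actual, y_pred))  # ... the one-hot formulation

So the two formulations should be identical, provided the labels handed to the numpy version have been through the same 0-based label encoding that the net saw.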

dnouri commented 9 years ago

I just tried to reproduce this issue. There's a test called test_lasagne_functional_mnist, and I added this bit of code right after the line assert accuracy_score...:

    # assert accuracy_score(y_pred, y_test) > 0.85 ...

    from nolearn.lasagne import negative_log_likelihood
    X_train, X_valid, y_train, y_valid = nn.train_test_split(
        X_train, y_train, nn.eval_size)
    y_pred = nn.predict_proba(X_valid)
    loss = negative_log_likelihood(y_pred, y_valid).eval()
    assert abs(nn.train_history_[-1]['valid_loss'] - loss) < 0.01

So it looks like it's matching up for this small example. Any more ideas?

run2 commented 9 years ago

Let me try that.

run2 commented 9 years ago

So this is getting trickier:

1) I printed the size of my valid set from within the train_test_split method, and it came out as 5509.

2) I printed the size of y_pred after calling predict_proba (as in your code) and got shape[0] as 5500, so it had skipped 9 examples. Note my batch size is 20. That by itself can be a cause of the difference, but I am sure it is not the only reason.

3) I could not get any further than that, as your method failed with raise TypeError('index must be integers') in File "/home/debanjan/pythonrepos/Theano/theano/tensor/subtensor.py", line 1980, in as_index_variable. I checked the y_pred and y_valid variables and they seemed to be fine, and it is not because the sizes do not match - I checked that.

4) Maybe a label encoding problem is lurking somewhere. I printed y_valid from within train_test_split when running your code above, and the labels were one more than what the same print statement showed during training. Remember, I am using the label encoder, and my labels are integers starting from 1 (there is no 0). That is the reason they are one more when calling predict_proba directly.

5) It is possible that not all labels are present in both the train and valid sets. Hmm. I am just thinking out loud..

Does that make any sense?

run2 commented 9 years ago

Ah!! Finally it matches, Daniel! God, I spent days on this. So I think there are two reasons (I am still running some more tests):

1) The validation loss during training is skipping examples to fit the batch size. I would have thought it should pad the set (to n + p examples) and then, while calculating the loss, take the first n results and dismiss the last p. If I have a large batch size, this might cause quite a difference (see the short sketch after this list).

2) When using the label encoder, you need to make sure that the labels of the test or validation set are encoded too before computing the loss on it. This is something I had already done, so that was not the problem.

3) The method I have written does not produce the same result as the negative_log_likelihood result from Theano. The problem mentioned above was sorted by type casting y_pred to np.int32, so I got a result from your code, but it was 0.006 off from the result of my code. I am not too happy to see that difference. I am running further tests to see how bad the difference can be. For some reason, when I execute your code for log loss, it gives me back an array, not a value. I will check again tomorrow after I catch some sleep.
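To illustrate point 1) with the numbers from above (a tiny sketch; 5509 validation examples, batch size 20, assuming the iterator only yields complete batches):

n_valid, batch_size = 5509, 20

n_full_batches = n_valid // batch_size   # 275 complete batches
n_seen = n_full_batches * batch_size     # 5500 examples actually scored
n_skipped = n_valid - n_seen             # 9 examples silently dropped
print(n_seen, n_skipped)

With a larger batch size relative to the validation set, the dropped tail, and hence the difference in the averaged loss, gets correspondingly bigger.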

run2 commented 9 years ago

OK, I was wrong. It matches, but with the last validation loss, not the best validation loss. I am still clueless as to why it is not matching the best validation loss even though the right weights are being copied over (from the epoch with the best validation loss).

run2 commented 9 years ago

OK Daniel, I have just solved this issue, and it is a bug.

You need to check whether _output_layer is None before you initialize the layers in load_weights_from. I am still figuring out why, but if you have the code as below, it thinks it has loaded the weights - but it has not.

def load_weights_from(self, source):
    self._output_layer = self.initialize_layers()

If I change it to

if self._output_layer is None:
    self._output_layer = self.initialize_layers()

Then it works fine, and I get the same validation loss outside training as I get inside - for the best validation loss.

Please try it out on your side and confirm. Point 1) from my post before last (the batch-size truncation) is also another reason for the difference.

dnouri commented 9 years ago

@run2 Could you maybe help with reproducing this issue? I've added a test, but I'm not able to make it fail: a0769e072233424a957462103ec5e73815468545

run2 commented 9 years ago

Daniel, you need two things to reproduce this issue:

1) Have a batch size which does not divide evenly into your validation set size.

2) Train a net (with EarlyStopping) such that it improves for a while (storing the weights at every improvement), then the validation loss does not improve for n epochs, where n is your patience, and the net exits by loading the stored weights. Make sure the last epoch is NOT an improvement epoch. Then assert on the validation loss.

Let me know if you cannot reproduce it; then I will have to create some dummy data, which will take some time.
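Roughly, I would expect something along these lines to reproduce it (only a sketch, untested; it assumes the NeuralNet and BatchIterator classes and the on_epoch_finished / batch_iterator_test keyword arguments of the nolearn of that time, uses made-up random data, and uses a stripped-down early stopper without my more_params bookkeeping):

import numpy as np
from lasagne.layers import InputLayer, DenseLayer
from lasagne.nonlinearities import softmax
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet, BatchIterator

class SimpleEarlyStopping(object):
    # stripped-down version of the EarlyStopping class earlier in this thread
    def __init__(self, patience=10):
        self.patience = patience
        self.best_valid = np.inf
        self.best_valid_epoch = 0
        self.best_weights = None

    def __call__(self, nn, train_history):
        current_valid = train_history[-1]['valid_loss']
        current_epoch = train_history[-1]['epoch']
        if current_valid < self.best_valid:
            self.best_valid = current_valid
            self.best_valid_epoch = current_epoch
            self.best_weights = [w.get_value() for w in nn.get_all_params()]
        elif self.best_valid_epoch + self.patience < current_epoch:
            print("Early stopping; best valid loss was {:.6f} at epoch {}.".format(
                self.best_valid, self.best_valid_epoch))
            nn.load_weights_from(self.best_weights)
            raise StopIteration()

# made-up data: 1000 samples, 20 features, 3 classes with 0-based labels
rng = np.random.RandomState(0)
X = rng.randn(1000, 20).astype(np.float32)
y = rng.randint(0, 3, size=1000).astype(np.int32)

net = NeuralNet(
    layers=[
        ('input', InputLayer),
        ('hidden', DenseLayer),
        ('output', DenseLayer),
    ],
    input_shape=(None, 20),
    hidden_num_units=50,
    output_num_units=3,
    output_nonlinearity=softmax,
    update=nesterov_momentum,
    update_learning_rate=0.01,
    update_momentum=0.9,
    regression=False,
    eval_size=0.2,                                      # roughly 200 validation examples
    batch_iterator_test=BatchIterator(batch_size=23),   # 23 does not divide 200 evenly
    on_epoch_finished=[SimpleEarlyStopping(patience=10)],
    max_epochs=500,
    verbose=1,
)
net.fit(X, y)

# recompute the loss on the same validation split, as in the mnist test above
X_train, X_valid, y_train, y_valid = net.train_test_split(X, y, net.eval_size)
y_pred = net.predict_proba(X_valid)
loss = -np.mean(np.log(y_pred)[np.arange(len(y_valid)), y_valid])
best = min(h['valid_loss'] for h in net.train_history_)
print(best, loss)   # these should match; per the report above they will not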

dnouri commented 9 years ago

I think the bug in load_weights_from that you describe in this comment might have been fixed since your report.

If I understand your report right, that means only point 1) remains. If that's so, would you kindly distill a description of bug 1), put it into a separate issue, and then we'll have a look.

dnouri commented 8 years ago

Closing due to lack of feedback.