Always get the same mean_per_class_accuracy

wanghsinwei commented 5 years ago

I always get the same result, "mean_per_class_accuracy_8: 0.1250". The metrics were set as follows: metrics=[mean_per_class_accuracy(8), 'accuracy']

Python: 3.6.8 Keras: 2.2.4 TensorFlow: 1.13.2 extra-keras-metrics: 1.1.2

LucaCappelletti94 commented 5 years ago

Hi! Could you please include a minimal example to reproduce the issue?

Also a note: remember that this library is just an easy to use interface to the metrics implemented in tensorflow and it does not contain the actual implementation.

I'll check whether anything is off with the function you've had issues with as soon as you can help me reproduce the issue.

Thanks!

wanghsinwei commented 5 years ago

Hi @LucaCappelletti94, mean_per_class_accuracy_10 is always 0.1000 in the below example.

import keras
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
import os
from extra_keras_metrics import mean_per_class_accuracy

batch_size = 32
num_classes = 10
epochs = 100

# The data, split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same',
                 input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))

# initiate RMSprop optimizer
opt = keras.optimizers.rmsprop(lr=0.0001, decay=1e-6)

# Let's train the model using RMSprop
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=[mean_per_class_accuracy(num_classes), 'accuracy'])

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(x_test, y_test),
          shuffle=True)

# Compute quantities required for feature-wise normalization
# (std, mean, and principal components if ZCA whitening is applied).
datagen.fit(x_train)

# Fit the model on the batches generated by datagen.flow().
model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                    epochs=epochs,
                    validation_data=(x_test, y_test),
                    workers=2)

LucaCappelletti94 commented 5 years ago

Hi! I was only able to run the model.fit, as the datagen variable is not defined in the code snippet you have provided. I have further simplified the code, for example I used as model just:

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", padding='same', input_shape=x_train.shape[1:]),
    Flatten(),
    Dense(num_classes, activation="relu")
])
model.compile(
    loss='categorical_crossentropy',
    optimizer="nadam",
    metrics=[mean_per_class_accuracy(num_classes), 'accuracy']
)

I trained it for only 10 epochs with a batch size of 1000 so that it completes quickly.

Within this test, the metrics have not remained constant:

epochs	val_loss	val_mean_per_class_accuracy_10	val_acc	loss	mean_per_class_accuracy_10	acc
0	3.4297206163406373	0.016892356611788274	0.28660000264644625	3.6904227542877197	0.021318036876618863	0.19620000049471856
1	3.3964727401733397	0.015113359596580268	0.2548000022768974	3.5051417255401613	0.015769251938909292	0.15367999985814096
2	3.411152648925781	0.015844555757939815	0.15630000084638596	3.448981132507324	0.015492016263306142	0.2065800003707409
3	3.30597505569458	0.017691679298877716	0.3000000029802322	3.3679157304763794	0.01661189716309309	0.263259998857975
4	3.316573429107666	0.02034323364496231	0.3486000031232834	3.3215300941467287	0.01912804111838341	0.2894400006532669
5	3.6106775760650636	0.020836354978382588	0.11240000054240226	3.4575716257095337	0.021215040907263755	0.23116000026464462
6	3.513409209251404	0.019271432422101498	0.19979999959468842	3.5511790561676024	0.020006597079336642	0.14944000005722047
7	3.425991916656494	0.018136677145957947	0.28460000157356263	3.478358426094055	0.018661453574895858	0.24908000051975251
8	3.3601335525512694	0.017498228512704373	0.3123999983072281	3.4357294845581055	0.017808210849761964	0.254539999216795
9	3.2930378913879395	0.01763003468513489	0.3372000068426132	3.3702560329437254	0.017501365207135678	0.26709999933838846

If you have encountered the issue within the fit_generator it might be related to how Keras handles its metric variables in the generator context, but until you provide a runnable example (and please, make it minimal, I have a very old laptop to run tests on) I cannot help you further.

Have a nice day!

wanghsinwei commented 5 years ago

Hi Luca, thanks for the quick reply! Sorry, I forgot to remove the datagen-related code when simplifying the original Keras example. I found that the issue can be reproduced by changing the last activation layer from 'relu' to 'softmax'. Here is a complete example. I highly recommend using the free GPUs on Google Colab for a quick test.

import keras
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
import os
from extra_keras_metrics import mean_per_class_accuracy

batch_size = 1000
num_classes = 10
epochs = 10

# The data, split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", padding='same', input_shape=x_train.shape[1:]),
    Flatten(),
    Dense(num_classes, activation="softmax"),
])

model.compile(
    loss='categorical_crossentropy',
    optimizer="nadam",
    metrics=[mean_per_class_accuracy(num_classes), 'accuracy']
)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(x_test, y_test),
          shuffle=True)

Output:

epoch	val_loss	val_mean_per_class_accuracy_10	val_acc	loss	mean_per_class_accuracy_10	acc
0	1.5577	0.1000	0.4666	1.9247	0.1000	0.3420
1	1.4207	0.1000	0.5116	1.5305	0.1000	0.4698
2	1.4017	0.1000	0.5061	1.3903	0.1000	0.5225
3	1.3286	0.1000	0.5314	1.3055	0.1000	0.5491
4	1.3003	0.1000	0.5515	1.2478	0.1000	0.5704
5	1.2166	0.1000	0.5777	1.1852	0.1000	0.5930
6	1.2721	0.1000	0.5533	1.1429	0.1000	0.6082
7	1.2648	0.1000	0.5545	1.1185	0.1000	0.6165
8	1.2121	0.1000	0.5781	1.0829	0.1000	0.6303
9	1.2227	0.1000	0.5728	1.0398	0.1000	0.6421

Run this to install extra_keras_metrics on Colab.

!pip3 install extra_keras_metrics

LucaCappelletti94 commented 5 years ago

I've tried running it with the output activation set to relu, selu, sigmoid and softmax: for the first two the metric varies, but for the other two it remains constant.

In particular, I see that when it remains constant the value is always 1/n, where n is the number of classes passed as the parameter.
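As a side note on the 1/n observation, here is a minimal pure-Python sketch (not the library's code, and not a diagnosis of this bug) of one way a mean per-class accuracy can land on exactly 1/n: when every prediction collapses to a single class, that class has recall 1 and all others have recall 0. The helper name `mean_per_class_accuracy` below is a hypothetical reference implementation written for this illustration only.

```python
from collections import defaultdict


def mean_per_class_accuracy(y_true, y_pred, num_classes):
    """Hypothetical reference helper: mean recall over classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    seen = [c for c in range(num_classes) if total[c] > 0]
    return sum(correct[c] / total[c] for c in seen) / len(seen)


num_classes = 8
# Five samples for each of the 8 classes.
y_true = [c for c in range(num_classes) for _ in range(5)]
# Assumed degenerate case: every prediction is class 0.
y_const = [0] * len(y_true)

constant_score = mean_per_class_accuracy(y_true, y_const, num_classes)
print(constant_score)  # 0.125, i.e. 1 / num_classes
```

Whether the metric in this issue was actually receiving collapsed predictions is an assumption; the sketch only shows that a constant 1/n is what such a collapse would produce.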

I'm reading further into the TensorFlow documentation and I believe it has something to do with how the update operations change the Keras variables used to store the metrics.

I'm experimenting to see if tensorflow fails to change the variable name when it is masked.

wanghsinwei commented 5 years ago

I saw some people mentioned tf.local_variables_initializer() and tf.control_dependencies on stack overflow. Not sure if this is a correct solution.

LucaCappelletti94 commented 5 years ago

What was missing was tf.control_dependencies, I'm just fixing that and compactifying the code a bit more.

with tf.control_dependencies([up_opt]):
    score = tf.identity(score)

I'll let you know when the new version is out! Thank you very much for helping me out fixing the package :)

LucaCappelletti94 commented 5 years ago

Just published the new fixed version!

wanghsinwei commented 5 years ago

Thanks for the notification!

LucaCappelletti94 commented 5 years ago

If everything is working fine again we can close the issue, do let me know! Also, I've linked the library from the Stack Overflow thread you pointed me to.

wanghsinwei commented 5 years ago

I just reran the example above with extra-keras-metrics 1.2.0 on Colab, but mean_per_class_accuracy was still incorrect. The correct value should be the same as accuracy because the samples are distributed equally across classes. I also tried the function balanced_accuracy below, but the results were still not correct.

epoch	loss	mean_per_class_accuracy10	acc	balanced_accuracy	val_loss	val_mean_per_class_accuracy10	val_acc	val_balanced_accuracy
0	1.9443	0.0972	0.3433	0.2551	1.5698	0.1000	0.4683	0.3536
1	1.5296	0.0999	0.4755	0.3915	1.4880	0.0998	0.4697	0.4167
2	1.3769	0.1000	0.5249	0.4351	1.3306	0.1000	0.5462	0.4529
3	1.2796	0.1000	0.5610	0.4672	1.3362	0.1000	0.5429	0.4799
4	1.2287	0.1000	0.5784	0.4896	1.3016	0.1000	0.5462	0.4988
5	1.1789	0.1000	0.5976	0.5067	1.2745	0.1000	0.5597	0.5143
6	1.1421	0.1000	0.6064	0.5206	1.3431	0.1000	0.5322	0.5259
7	1.1076	0.1000	0.6184	0.5311	1.2406	0.1000	0.5621	0.5362
8	1.0656	0.1000	0.6366	0.5412	1.2677	0.1000	0.5513	0.5461
9	1.0525	0.1000	0.6399	0.5500	1.1809	0.1000	0.5874	0.5543

import keras.backend as K
import tensorflow as tf

def balanced_accuracy(y_true, y_pred):
    y_true_argmax = K.argmax(y_true, axis=1)
    y_pred_argmax = K.argmax(y_pred, axis=1)
    mean_accuracy, update_op = tf.metrics.mean_per_class_accuracy(y_true_argmax, y_pred_argmax, 10)
    K.get_session().run(tf.local_variables_initializer())
    with tf.control_dependencies([update_op]):
        mean_accuracy = tf.identity(mean_accuracy)

    return mean_accuracy

You could use balanced_accuracy_score or recall_score to verify the results.

from sklearn.metrics import balanced_accuracy_score, recall_score
import numpy as np

y_predict = model.predict(x_test) # trained model

print(balanced_accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_predict, axis=1)))
print(recall_score(np.argmax(y_test, axis=1), np.argmax(y_predict, axis=1), average='macro'))

Outputs: 0.5873999999999999 0.5873999999999999
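The claim that mean per-class accuracy should equal plain accuracy on a class-balanced set can be checked on a toy example. The following is a minimal sketch in pure Python; the helper names `accuracy` and `mean_per_class_accuracy` and the toy labels are made up for this illustration and are not taken from the library.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_per_class_accuracy(y_true, y_pred, num_classes):
    """Hypothetical reference helper: mean recall over classes present in y_true."""
    recalls = []
    for c in range(num_classes):
        preds_for_c = [p for t, p in zip(y_true, y_pred) if t == c]
        if preds_for_c:
            recalls.append(sum(int(p == c) for p in preds_for_c) / len(preds_for_c))
    return sum(recalls) / len(recalls)


# Balanced toy data: 4 samples per class, per-class recall 3/4, 2/4 and 1/4.
y_true = [0] * 4 + [1] * 4 + [2] * 4
y_pred = [0, 0, 0, 1,  1, 1, 0, 2,  2, 0, 1, 1]

plain = accuracy(y_true, y_pred)
balanced = mean_per_class_accuracy(y_true, y_pred, 3)
print(plain, balanced)  # 0.5 0.5
```

With equal class counts each class contributes the same weight to both quantities, which is why the two numbers coincide; with unequal counts they generally differ.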

LucaCappelletti94 commented 5 years ago

This seems like a really ugly one to debug, I'm starting by finding a way to get the value of a Tensor without having to train a model. Have you found a way to do that? So that I'll be able to compare the functions from tensorflow to ones in sklearn.

wanghsinwei commented 5 years ago

Do you mean something like this? By the way, just saw a great post.

from sklearn.metrics import balanced_accuracy_score
import tensorflow as tf

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

print(balanced_accuracy_score(y_true, y_pred))

mean_accuracy, update_op = tf.metrics.mean_per_class_accuracy(tf.convert_to_tensor(y_true),
                                                              tf.convert_to_tensor(y_pred),
                                                              3)
with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    sess.run(update_op)
    print(sess.run(mean_accuracy))

Outputs: 0.3333333333333333 0.33333334

LucaCappelletti94 commented 5 years ago

Perfect, I'm prepping batch testing for the various metrics.

LucaCappelletti94 commented 5 years ago

So, I've managed to get a damned negative value for accuracy from the TensorFlow original function, I'll manage to fix it but this is demonic.

wanghsinwei commented 5 years ago

I implemented a new function based on 1 and 2. The result is very close to sklearn.metrics.balanced_accuracy_score, especially when batch_size is large.

def balanced_accuracy(num_classes):
    def fn(y_true, y_pred):
        class_id_true = K.argmax(y_true, axis=-1)
        class_id_pred = K.argmax(y_pred, axis=-1)
        class_acc_total = 0
        seen_classes = 0

        for c in range(num_classes):
            accuracy_mask = K.cast(K.equal(class_id_true, c), 'int32')
            class_acc_tensor = K.cast(K.equal(class_id_true, class_id_pred), 'int32') * accuracy_mask
            accuracy_mask_sum = K.sum(accuracy_mask)
            class_acc = K.cast(K.sum(class_acc_tensor) / K.maximum(accuracy_mask_sum, 1), K.floatx())
            class_acc_total += class_acc

            condition = K.equal(accuracy_mask_sum, 0)
            seen_classes = K.switch(condition, seen_classes, seen_classes+1)

        return class_acc_total / K.cast(seen_classes, K.floatx())
    fn.__name__ = "balanced_accuracy_{}".format(num_classes)
    return fn

model.compile(
    loss='categorical_crossentropy',
    optimizer="nadam",
    metrics=[balanced_accuracy(num_classes)]
)
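A note on why a per-batch function like the one above is only close to the sklearn value: as far as I understand, Keras 2.2.x reports function metrics as a running average of per-batch values, and an average of per-batch balanced accuracies is not the same as the balanced accuracy of the whole set. The pure-Python sketch below uses made-up labels and hypothetical helper names to show the gap, and that accumulating per-class counts across batches recovers the global value.

```python
def per_class_counts(y_true, y_pred, num_classes):
    """Return (correct, total) counts per true class."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return correct, total


def balanced_from_counts(correct, total):
    """Mean recall over the classes that appear at least once."""
    recalls = [c / n for c, n in zip(correct, total) if n > 0]
    return sum(recalls) / len(recalls)


num_classes = 2
batch_size = 4
y_true = [0, 0, 0, 1,  0, 1, 1, 1]
y_pred = [0, 0, 1, 1,  0, 1, 0, 0]

# Global value, computed once over all samples.
global_score = balanced_from_counts(*per_class_counts(y_true, y_pred, num_classes))

# Per-batch values averaged, versus counts accumulated across batches.
batch_scores = []
acc_correct = [0] * num_classes
acc_total = [0] * num_classes
for start in range(0, len(y_true), batch_size):
    correct, total = per_class_counts(
        y_true[start:start + batch_size], y_pred[start:start + batch_size], num_classes
    )
    batch_scores.append(balanced_from_counts(correct, total))
    acc_correct = [a + c for a, c in zip(acc_correct, correct)]
    acc_total = [a + n for a, n in zip(acc_total, total)]

batch_mean = sum(batch_scores) / len(batch_scores)
accumulated_score = balanced_from_counts(acc_correct, acc_total)
print(global_score, batch_mean, accumulated_score)
```

Here the batch average comes out near 0.75 while the global and accumulated values are 0.625. Larger batches make each batch's class mix closer to the full set, which is consistent with the observation that the match improves as batch_size grows.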

LucaCappelletti94 commented 5 years ago

Ok, some good news! I've completed the testing of all non-parametric metrics and they all match the sklearn metrics within a reasonable Pearson coefficient (>0.99).

An additional note though: I see that mean_per_class_accuracy, as it comes out of TensorFlow, isn't a single-value metric but returns a vector with the mean for each class. It therefore isn't feasible to implement it as-is in Keras.

LucaCappelletti94 commented 5 years ago

I have completed the refactoring, removing all metrics that are not usable "as-is" in Keras and would require custom code built around them because they return a vector, such as mean_per_class_accuracy.

LucaCappelletti94 / extra_keras_metrics

Always get the same mean_per_class_accuracy #2