Closed wanghsinwei closed 5 years ago
Hi! Could you please include a minimal example to reproduce the issue?
Also a note: remember that this library is just an easy-to-use interface to the metrics implemented in TensorFlow; it does not contain the actual implementation.
As soon as you can help me reproduce the issue, I'll check whether anything is off with the function you had trouble with.
Thanks!
Hi @LucaCappelletti94, mean_per_class_accuracy_10 is always 0.1000 in the below example.
import keras
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
import os
from extra_keras_metrics import mean_per_class_accuracy
batch_size = 32
num_classes = 10
epochs = 100
# The data, split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same',
                 input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
# initiate RMSprop optimizer
opt = keras.optimizers.rmsprop(lr=0.0001, decay=1e-6)
# Let's train the model using RMSprop
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=[mean_per_class_accuracy(num_classes), 'accuracy'])
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(x_test, y_test),
          shuffle=True)
# Compute quantities required for feature-wise normalization
# (std, mean, and principal components if ZCA whitening is applied).
datagen.fit(x_train)
# Fit the model on the batches generated by datagen.flow().
model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                    epochs=epochs,
                    validation_data=(x_test, y_test),
                    workers=2)
Hi! I was only able to run the model.fit call, as the datagen variable is not defined in the code snippet you provided. I have also simplified the code further; for example, I used just this as the model:
model = Sequential([
    Conv2D(32, (3, 3), activation="relu", padding='same', input_shape=x_train.shape[1:]),
    Flatten(),
    Dense(num_classes, activation="relu")
])
model.compile(
    loss='categorical_crossentropy',
    optimizer="nadam",
    metrics=[mean_per_class_accuracy(num_classes), 'accuracy']
)
I trained it for only 10 epochs with a batch size of 1000 so that it completes quickly.
Within this test, the metrics have not remained constant:
epochs | val_loss | val_mean_per_class_accuracy_10 | val_acc | loss | mean_per_class_accuracy_10 | acc |
---|---|---|---|---|---|---|
0 | 3.4297206163406373 | 0.016892356611788274 | 0.28660000264644625 | 3.6904227542877197 | 0.021318036876618863 | 0.19620000049471856 |
1 | 3.3964727401733397 | 0.015113359596580268 | 0.2548000022768974 | 3.5051417255401613 | 0.015769251938909292 | 0.15367999985814096 |
2 | 3.411152648925781 | 0.015844555757939815 | 0.15630000084638596 | 3.448981132507324 | 0.015492016263306142 | 0.2065800003707409 |
3 | 3.30597505569458 | 0.017691679298877716 | 0.3000000029802322 | 3.3679157304763794 | 0.01661189716309309 | 0.263259998857975 |
4 | 3.316573429107666 | 0.02034323364496231 | 0.3486000031232834 | 3.3215300941467287 | 0.01912804111838341 | 0.2894400006532669 |
5 | 3.6106775760650636 | 0.020836354978382588 | 0.11240000054240226 | 3.4575716257095337 | 0.021215040907263755 | 0.23116000026464462 |
6 | 3.513409209251404 | 0.019271432422101498 | 0.19979999959468842 | 3.5511790561676024 | 0.020006597079336642 | 0.14944000005722047 |
7 | 3.425991916656494 | 0.018136677145957947 | 0.28460000157356263 | 3.478358426094055 | 0.018661453574895858 | 0.24908000051975251 |
8 | 3.3601335525512694 | 0.017498228512704373 | 0.3123999983072281 | 3.4357294845581055 | 0.017808210849761964 | 0.254539999216795 |
9 | 3.2930378913879395 | 0.01763003468513489 | 0.3372000068426132 | 3.3702560329437254 | 0.017501365207135678 | 0.26709999933838846 |
If you have encountered the issue within the fit_generator it might be related to how Keras handles its metric variables in the generator context, but until you provide a runnable example (and please, make it minimal, I have a very old laptop to run tests on) I cannot help you further.
Have a nice day!
Hi Luca, thanks for the quick reply! Sorry, I forgot to remove the datagen-related code when simplifying the original Keras example. I found that the issue can be reproduced by changing the last activation layer from 'relu' to 'softmax'. Here is a complete example. I highly recommend using the free GPUs on Google Colab for a quick test.
import keras
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
import os
from extra_keras_metrics import mean_per_class_accuracy
batch_size = 1000
num_classes = 10
epochs = 10
# The data, split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
model = Sequential([
    Conv2D(32, (3, 3), activation="relu", padding='same', input_shape=x_train.shape[1:]),
    Flatten(),
    Dense(num_classes, activation="softmax"),
])
model.compile(
    loss='categorical_crossentropy',
    optimizer="nadam",
    metrics=[mean_per_class_accuracy(num_classes), 'accuracy']
)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(x_test, y_test),
          shuffle=True)
Output:
epoch | val_loss | val_mean_per_class_accuracy_10 | val_acc | loss | mean_per_class_accuracy_10 | acc |
---|---|---|---|---|---|---|
0 | 1.5577 | 0.1000 | 0.4666 | 1.9247 | 0.1000 | 0.3420 |
1 | 1.4207 | 0.1000 | 0.5116 | 1.5305 | 0.1000 | 0.4698 |
2 | 1.4017 | 0.1000 | 0.5061 | 1.3903 | 0.1000 | 0.5225 |
3 | 1.3286 | 0.1000 | 0.5314 | 1.3055 | 0.1000 | 0.5491 |
4 | 1.3003 | 0.1000 | 0.5515 | 1.2478 | 0.1000 | 0.5704 |
5 | 1.2166 | 0.1000 | 0.5777 | 1.1852 | 0.1000 | 0.5930 |
6 | 1.2721 | 0.1000 | 0.5533 | 1.1429 | 0.1000 | 0.6082 |
7 | 1.2648 | 0.1000 | 0.5545 | 1.1185 | 0.1000 | 0.6165 |
8 | 1.2121 | 0.1000 | 0.5781 | 1.0829 | 0.1000 | 0.6303 |
9 | 1.2227 | 0.1000 | 0.5728 | 1.0398 | 0.1000 | 0.6421 |
Run this to install extra_keras_metrics on Colab.
!pip3 install extra_keras_metrics
I've tried running it with the output activation changed to relu, selu, sigmoid and softmax: for the first two the metric varies, but for the other two it remains constant.
In particular, I see that when it remains constant the value is always 1/n, where n is the number of classes passed as the parameter.
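One way a value of exactly 1/n can arise, offered here as a hypothesis and not as a diagnosis of this bug: if every prediction ends up in the same class, that class scores 1.0 and all the others score 0.0, so the mean over n classes is 1/n. A minimal NumPy sketch of that arithmetic follows; the helper `mean_per_class_accuracy_np` is hypothetical and is not part of extra_keras_metrics or TensorFlow.

```python
import numpy as np

def mean_per_class_accuracy_np(y_true, y_pred, num_classes):
    """Mean of the per-class accuracies, skipping classes absent from y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_class = []
    for c in range(num_classes):
        mask = y_true == c
        if mask.any():
            # Fraction of class-c samples that were predicted as class c.
            per_class.append(np.mean(y_pred[mask] == c))
    return float(np.mean(per_class))

num_classes = 10
y_true = np.repeat(np.arange(num_classes), 100)  # 100 samples per class
y_pred = np.zeros_like(y_true)                   # assumption: all predictions collapse to class 0
value = mean_per_class_accuracy_np(y_true, y_pred, num_classes)
print(value)
```

Whether something like this actually happens inside the metric (for example through a cast of probabilities in [0, 1]) is not established by the thread; the sketch only shows that a constant 1/n is consistent with collapsed predictions.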
I'm further reading the tensorflow documentation and I believe it has something to do with how the update operations change the Keras variables used to memorize the metrics.
I'm experimenting to see if tensorflow fails to change the variable name when it is masked.
I saw some people mention tf.local_variables_initializer() and tf.control_dependencies on Stack Overflow. Not sure if this is a correct solution.
What was missing was tf.control_dependencies. I'm just fixing that and making the code a bit more compact.
with tf.control_dependencies([up_opt]):
    score = tf.identity(score)
I'll let you know when the new version is out! Thank you very much for helping me out fixing the package :)
Just published the new fixed version!
Thanks for the notification!
If everything is back working fine and dandy we can close the issue, do let me know! Also, I've linked the library from the Stack Overflow thread you pointed me to.
I just reran the example above with extra-keras-metrics 1.2.0 on Colab, but mean_per_class_accuracy was still incorrect. The correct value should be the same as accuracy because the samples are distributed equally across classes. I also tried the function balanced_accuracy below, but the results were still not correct.
epoch | loss | mean_per_class_accuracy10 | acc | balanced_accuracy | val_loss | val_mean_per_class_accuracy10 | val_acc | val_balanced_accuracy |
---|---|---|---|---|---|---|---|---|
0 | 1.9443 | 0.0972 | 0.3433 | 0.2551 | 1.5698 | 0.1000 | 0.4683 | 0.3536 |
1 | 1.5296 | 0.0999 | 0.4755 | 0.3915 | 1.4880 | 0.0998 | 0.4697 | 0.4167 |
2 | 1.3769 | 0.1000 | 0.5249 | 0.4351 | 1.3306 | 0.1000 | 0.5462 | 0.4529 |
3 | 1.2796 | 0.1000 | 0.5610 | 0.4672 | 1.3362 | 0.1000 | 0.5429 | 0.4799 |
4 | 1.2287 | 0.1000 | 0.5784 | 0.4896 | 1.3016 | 0.1000 | 0.5462 | 0.4988 |
5 | 1.1789 | 0.1000 | 0.5976 | 0.5067 | 1.2745 | 0.1000 | 0.5597 | 0.5143 |
6 | 1.1421 | 0.1000 | 0.6064 | 0.5206 | 1.3431 | 0.1000 | 0.5322 | 0.5259 |
7 | 1.1076 | 0.1000 | 0.6184 | 0.5311 | 1.2406 | 0.1000 | 0.5621 | 0.5362 |
8 | 1.0656 | 0.1000 | 0.6366 | 0.5412 | 1.2677 | 0.1000 | 0.5513 | 0.5461 |
9 | 1.0525 | 0.1000 | 0.6399 | 0.5500 | 1.1809 | 0.1000 | 0.5874 | 0.5543 |
import keras.backend as K
import tensorflow as tf
def balanced_accuracy(y_true, y_pred):
    y_true_argmax = K.argmax(y_true, axis=1)
    y_pred_argmax = K.argmax(y_pred, axis=1)
    mean_accuracy, update_op = tf.metrics.mean_per_class_accuracy(y_true_argmax, y_pred_argmax, 10)
    K.get_session().run(tf.local_variables_initializer())
    with tf.control_dependencies([update_op]):
        mean_accuracy = tf.identity(mean_accuracy)
    return mean_accuracy
You could use balanced_accuracy_score or recall_score to verify the results.
from sklearn.metrics import balanced_accuracy_score, recall_score
import numpy as np
y_predict = model.predict(x_test) # trained model
print(balanced_accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_predict, axis=1)))
print(recall_score(np.argmax(y_test, axis=1), np.argmax(y_predict, axis=1), average='macro'))
Outputs: 0.5873999999999999 0.5873999999999999
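The claim made earlier in the thread, that balanced accuracy should equal plain accuracy when every class has the same number of samples, can be checked on synthetic data. The sketch below uses only NumPy; the 60% hit rate and the random labels are made up for illustration and have nothing to do with the CIFAR-10 model.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, per_class = 10, 1000

# Balanced labels: every class appears exactly `per_class` times.
y_true = np.repeat(np.arange(num_classes), per_class)
# Illustrative predictions: keep the true label about 60% of the time,
# otherwise draw a random class.
noise = rng.integers(0, num_classes, size=y_true.size)
y_pred = np.where(rng.random(y_true.size) < 0.6, y_true, noise)

accuracy = np.mean(y_true == y_pred)
# Macro-averaged recall, i.e. the mean of the per-class accuracies.
recalls = [np.mean(y_pred[y_true == c] == c) for c in range(num_classes)]
balanced = np.mean(recalls)
print(accuracy, balanced)
```

With equal class counts the mean of the per-class hit fractions is the total number of hits divided by the total number of samples, so the two printed numbers agree up to floating-point rounding.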
This seems like a really ugly one to debug. I'm starting by looking for a way to get the value of a Tensor without having to train a model, so that I can compare the TensorFlow functions to the ones in sklearn. Have you found a way to do that?
Do you mean something like this? By the way, just saw a great post.
from sklearn.metrics import balanced_accuracy_score
import tensorflow as tf
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
print(balanced_accuracy_score(y_true, y_pred))
mean_accuracy, update_op = tf.metrics.mean_per_class_accuracy(tf.convert_to_tensor(y_true),
                                                              tf.convert_to_tensor(y_pred),
                                                              3)
with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    sess.run(update_op)
    print(sess.run(mean_accuracy))
Outputs: 0.3333333333333333 0.33333334
Perfect, I'm prepping batch testing for the various metrics.
So, I've managed to get a damned negative value for accuracy from the TensorFlow original function, I'll manage to fix it but this is demonic.
I implemented a new function based on 1 and 2. The result is very close to sklearn.metrics.balanced_accuracy_score, especially when batch_size is large.
def balanced_accuracy(num_classes):
    def fn(y_true, y_pred):
        class_id_true = K.argmax(y_true, axis=-1)
        class_id_pred = K.argmax(y_pred, axis=-1)
        class_acc_total = 0
        seen_classes = 0
        for c in range(num_classes):
            accuracy_mask = K.cast(K.equal(class_id_true, c), 'int32')
            class_acc_tensor = K.cast(K.equal(class_id_true, class_id_pred), 'int32') * accuracy_mask
            accuracy_mask_sum = K.sum(accuracy_mask)
            class_acc = K.cast(K.sum(class_acc_tensor) / K.maximum(accuracy_mask_sum, 1), K.floatx())
            class_acc_total += class_acc
            condition = K.equal(accuracy_mask_sum, 0)
            seen_classes = K.switch(condition, seen_classes, seen_classes+1)
        return class_acc_total / K.cast(seen_classes, K.floatx())
    fn.__name__ = "balanced_accuracy_{}".format(num_classes)
    return fn
model.compile(
    loss='categorical_crossentropy',
    optimizer="nadam",
    metrics=[balanced_accuracy(num_classes)]
)
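Because a function like the one above is evaluated batch by batch and the batch values are then averaged, its result need not coincide with a score computed once over the whole dataset. The NumPy sketch below is one way to inspect that gap for different batch sizes; `balanced_accuracy_np`, the per-class hit rates and the batch sizes are all illustrative assumptions, and the plain mean over batches only approximates how Keras aggregates a metric.

```python
import numpy as np

def balanced_accuracy_np(y_true, y_pred, num_classes):
    """Mean accuracy over the classes that actually occur in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accs = [np.mean(y_pred[y_true == c] == c)
            for c in range(num_classes) if np.any(y_true == c)]
    return float(np.mean(accs))

rng = np.random.default_rng(0)
num_classes, n = 10, 10000
y_true = rng.integers(0, num_classes, size=n)
# Made-up per-class hit rates, so that the class mix of a batch matters.
hit_rate = np.linspace(0.3, 0.9, num_classes)[y_true]
y_pred = np.where(rng.random(n) < hit_rate, y_true, (y_true + 1) % num_classes)

full = balanced_accuracy_np(y_true, y_pred, num_classes)
for batch_size in (20, 1000):
    per_batch = [
        balanced_accuracy_np(y_true[i:i + batch_size], y_pred[i:i + batch_size], num_classes)
        for i in range(0, n, batch_size)
    ]
    # Absolute gap between the batch-averaged score and the whole-dataset score.
    print(batch_size, abs(np.mean(per_batch) - full))
```

The printed gaps depend on the random seed and on the data, so they are best read as a way to explore the effect, not as a benchmark.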
Ok, some good news! I've completed the testing of all non-parametric metrics, and they all match the sklearn metrics with a Pearson correlation coefficient above 0.99.
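The thread does not show how that comparison was run. As a rough sketch of the idea, one can score many random batches with two implementations of the same metric and correlate the two series of scores. Both `precision_a` and `precision_b` below are hypothetical stand-ins written in NumPy; in the real test one side would be the Keras/TensorFlow metric and the other the sklearn function.

```python
import numpy as np

def precision_a(y_true, y_pred):
    """Precision from explicit true-positive and false-positive counts."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / max(tp + fp, 1)

def precision_b(y_true, y_pred):
    """Precision as the mean label among the samples predicted positive."""
    predicted_pos = y_pred == 1
    return float(np.mean(y_true[predicted_pos])) if predicted_pos.any() else 0.0

rng = np.random.default_rng(0)
scores_a, scores_b = [], []
for _ in range(200):
    # Random binary labels and predictions for one batch of 64 samples.
    y_true = rng.integers(0, 2, size=64)
    y_pred = rng.integers(0, 2, size=64)
    scores_a.append(precision_a(y_true, y_pred))
    scores_b.append(precision_b(y_true, y_pred))

# Pearson correlation between the two series of per-batch scores.
pearson = np.corrcoef(scores_a, scores_b)[0, 1]
print(pearson)
```

A high correlation only shows that the two implementations move together; comparing the values directly (for example with np.allclose) is a stricter check.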
An additional question though: I see that mean_per_class_accuracy, as it comes out of TensorFlow, isn't a single-value metric but returns a vector with the mean for each class. It therefore isn't feasible to implement it as-is in Keras.
I have completed the refactoring, removing all metrics that are not usable "as-is" in Keras and would need customization built around them because they return a vector, such as mean_per_class_accuracy.
I always got the same result, "mean_per_class_accuracy_8: 0.1250". The metrics were set as below:
metrics=[mean_per_class_accuracy(8), 'accuracy']
Python: 3.6.8 Keras: 2.2.4 TensorFlow: 1.13.2 extra-keras-metrics: 1.1.2