Open kalpa61 opened 2 months ago
I am having the same issue. I am running 5-fold CV on FER+ and the results vary drastically for the exact same hyperparameters and training setup. The only thing that was fully randomized was the seed for shuffling the training dataset. So I experimented with different seed values for the train-set shuffle and got different results on the same (first) fold. Then I ran the full 5-fold CV with a constant seed = 123, and again the results varied between folds: for the 1st, 2nd and 5th folds I got around 97%, 95% and 95% (very nice), but for the 3rd and 4th I got only 30-40% accuracy. With the seed set to 42, I got only 53% accuracy on the 1st fold. I also noticed that the test accuracy is either much higher or much lower than the training accuracy, and sometimes it differs a lot from the validation accuracy as well. I will be discussing this issue with my supervisor, because I have no idea why this is happening.
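One thing that may be worth ruling out first is whether fold membership itself changes between runs. The following is only a minimal sketch, assuming NumPy is available; `kfold_indices` is a hypothetical helper and not part of the PAtt-Lite code. It draws one seeded permutation and cuts it into folds, so the same seed always yields the same folds.

```python
import numpy as np


def kfold_indices(n_samples, n_folds=5, seed=123):
    """Yield (train_idx, val_idx) pairs cut from one seeded permutation.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, n_folds)
    for k in range(n_folds):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train_idx, val_idx


# Same seed twice, then a different seed, on a toy dataset of 100 samples.
splits_a = list(kfold_indices(100, n_folds=5, seed=123))
splits_b = list(kfold_indices(100, n_folds=5, seed=123))
splits_c = list(kfold_indices(100, n_folds=5, seed=42))

same_seed_identical = all(
    np.array_equal(a[1], b[1]) for a, b in zip(splits_a, splits_b)
)
other_seed_identical = all(
    np.array_equal(a[1], c[1]) for a, c in zip(splits_a, splits_c)
)
print("seed 123 twice gives identical folds:", same_seed_identical)
print("seed 123 vs 42 gives identical folds:", other_seed_identical)
```

If the folds are identical across reruns and the accuracy still swings, the variation has to come from somewhere else (weight initialization, augmentation, dropout, or the training dynamics). scikit-learn's `KFold(shuffle=True, random_state=...)` can be used for the same purpose.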
Yeah, I experimented further with the parameters and with the seeds for dataset shuffling, augmentation and dropout, and I keep getting mixed results even for the same folds with the same parameters. It looks like introducing the attention layer (both the custom function and the one from the maximal library) makes the model unstable: sometimes I get 95% and the model learns pretty well, but then I rerun the training and for some reason it is stuck at 23% or 30% or even 1% and doesn't learn at all. The model also seems very sensitive to different learning rates and batch sizes. No idea how to fix that.
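To separate "the run is not reproducible" from "the model is unstable", it can help to seed every random number generator in one place before building the model. This is a rough sketch, not a guaranteed fix: `seed_everything` is a hypothetical helper, and it assumes a TensorFlow 2.x install when TensorFlow is present. It falls back to seeding only Python and NumPy when TensorFlow cannot be imported.

```python
import random

import numpy as np


def seed_everything(seed):
    """Seed Python, NumPy and (if installed) TensorFlow RNGs.

    Hypothetical helper. Returns True when TensorFlow was seeded as well.
    """
    random.seed(seed)
    np.random.seed(seed)
    try:
        import tensorflow as tf
    except ImportError:
        return False  # only Python and NumPy were seeded
    if hasattr(tf.keras.utils, "set_random_seed"):
        # Seeds Python, NumPy and TensorFlow together in recent TF 2.x releases.
        tf.keras.utils.set_random_seed(seed)
    else:
        tf.random.set_seed(seed)
    return True


# Two draws after seeding with the same value should match.
seed_everything(123)
first = (random.random(), float(np.random.rand()))
seed_everything(123)
second = (random.random(), float(np.random.rand()))
print("repeatable draws:", first == second)
```

Recent TensorFlow releases also provide `tf.config.experimental.enable_op_determinism()`, which may be needed on GPU for runs to match exactly. Fixed seeds only make a run repeatable; if one seed reaches 95% and another gets stuck at 23%, the sensitivity itself is still there.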
@KrystianZielinski, do you now know the reason for the fluctuating accuracy?
I trained on FERPlus four times: twice the accuracy was 93% and twice it was 89%. What is the reason for this? When I trained on RAF-DB, the result was very stable at 95%. My training code:
import h5py
import numpy as np
import tensorflow as tf
from tensorflow import keras
from sklearn.utils import shuffle
from sklearn.utils.class_weight import compute_class_weight
# Model Building
NUM_CLASSES = 8
IMG_SHAPE = (120, 120, 3)
BATCH_SIZE = 8
with h5py.File(r'.\ferplus.h5', 'r') as hdf5_file:
    # Load your data here, PAtt-Lite was trained with h5py for shorter loading time
X_train, y_train = shuffle(X_train, y_train)  # no random_state is passed, so the order depends on NumPy's global RNG state
print("Shape of train_sample: {}".format(X_train.shape))
print("Shape of train_label: {}".format(y_train.shape))
print("Shape of valid_sample: {}".format(X_valid.shape))
print("Shape of valid_label: {}".format(y_valid.shape))
print("Shape of test_sample: {}".format(X_test.shape))
print("Shape of test_label: {}".format(y_test.shape))

class_weights = compute_class_weight('balanced', classes=np.unique(y_train), y=y_train)
class_weights = dict(enumerate(class_weights))
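For anyone wondering what the `'balanced'` class weights above work out to: scikit-learn documents the heuristic as `n_samples / (n_classes * np.bincount(y))`. Here is a small sketch that reproduces that formula with NumPy on toy labels. `balanced_class_weights` and `y_toy` are made up for illustration and are not FER+ data.

```python
import numpy as np


def balanced_class_weights(y):
    """Reproduce the 'balanced' heuristic: n_samples / (n_classes * count).

    Hypothetical helper; returns a {label: weight} dict like the one
    passed to Keras as class_weight.
    """
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}


# Toy labels: class 0 appears 6 times, class 1 three times, class 2 once.
y_toy = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2])
toy_weights = balanced_class_weights(y_toy)
for label, weight in toy_weights.items():
    print(f"class {label}: weight {weight:.3f}")
```

Rare classes get large weights, so a handful of samples can dominate the loss in a batch. With `BATCH_SIZE = 8` that might add noise to training, although this is only a guess and not something confirmed in this thread.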
# Model Building
input_layer = tf.keras.Input(shape=IMG_SHAPE, name='universal_input')
sample_resizing = tf.keras.layers.experimental.preprocessing.Resizing(224, 224, name="resize")
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip(mode='horizontal'),
    tf.keras.layers.RandomContrast(factor=0.3),
], name="augmentation")
preprocess_input = tf.keras.applications.mobilenet.preprocess_input

backbone = tf.keras.applications.mobilenet.MobileNet(input_shape=(224, 224, 3), include_top=False, weights='imagenet')
backbone.trainable = False
base_model = tf.keras.Model(backbone.input, backbone.layers[-29].output, name='base_model')

self_attention = tf.keras.layers.Attention(use_scale=True, name='attention')
patch_extraction = tf.keras.Sequential([
    tf.keras.layers.SeparableConv2D(256, kernel_size=4, strides=4, padding='same', activation='relu'),
    tf.keras.layers.SeparableConv2D(256, kernel_size=2, strides=2, padding='valid', activation='relu'),
    tf.keras.layers.Conv2D(256, kernel_size=1, strides=1, padding='valid', activation='relu'),
], name='patch_extraction')
global_average_layer = tf.keras.layers.GlobalAveragePooling2D(name='gap')
pre_classification = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.BatchNormalization(),
], name='pre_classification')
prediction_layer = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax", name='classification_head')

inputs = input_layer
x = sample_resizing(inputs)
x = data_augmentation(x)
x = preprocess_input(x)
x = base_model(x, training=False)
x = patch_extraction(x)
x = global_average_layer(x)
x = tf.keras.layers.Dropout(TRAIN_DROPOUT)(x)
x = pre_classification(x)
x = self_attention([x, x])
outputs = prediction_layer(x)
model = tf.keras.Model(inputs, outputs, name='train-head')
model.compile(optimizer=keras.optimizers.Adam(learning_rate=TRAIN_LR, global_clipnorm=3.0),
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Training Procedure
early_stopping_callback = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=TRAIN_ES_PATIENCE,
                                                           min_delta=ES_LR_MIN_DELTA, restore_best_weights=True)
learning_rate_callback = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_accuracy', patience=TRAIN_LR_PATIENCE, verbose=0,
                                                              min_delta=ES_LR_MIN_DELTA, min_lr=TRAIN_MIN_LR)
history = model.fit(X_train, y_train, epochs=TRAIN_EPOCH, batch_size=BATCH_SIZE,
                    validation_data=(X_valid, y_valid), verbose=1, class_weight=class_weights,
                    callbacks=[early_stopping_callback, learning_rate_callback])
test_loss, test_acc = model.evaluate(X_test, y_test)
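A note on how the patience setting interacts with `history.epoch`, which the fine-tuning call further down indexes with `-TRAIN_ES_PATIENCE`. The sketch below is a simplified pure-Python model of a max-mode early-stopping rule as I understand it, not the Keras source. `simulate_early_stopping` and the accuracy lists are invented for illustration.

```python
def simulate_early_stopping(val_acc, patience, min_delta=0.0):
    """Simplified max-mode early stopping.

    Hypothetical helper. Returns (best_epoch, epochs_run), where
    epochs_run plays the role of Keras' history.epoch.
    """
    best = float("-inf")
    best_epoch = -1
    wait = 0
    epochs_run = []
    for epoch, acc in enumerate(val_acc):
        epochs_run.append(epoch)
        if acc - min_delta > best:
            best, best_epoch, wait = acc, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break  # stop after `patience` epochs without improvement
    return best_epoch, epochs_run


# Case 1: accuracy peaks at epoch 2, so the rule fires three epochs later.
best_a, run_a = simulate_early_stopping(
    [0.50, 0.60, 0.70, 0.69, 0.70, 0.68, 0.69, 0.70], patience=3
)
print(best_a, run_a, run_a[-3])

# Case 2: accuracy keeps improving, so training runs to the last epoch.
best_b, run_b = simulate_early_stopping([0.1, 0.2, 0.3, 0.4, 0.5], patience=3)
print(best_b, run_b, run_b[-3])
```

In the first case `run_a[-3]` is the epoch right after the best one, which is presumably what the negative index is meant to pick up. In the second case early stopping never fires, and the same index lands on an epoch that has nothing to do with the best weights, so the fine-tuning schedule could start from a different point on different runs.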
# Model Finetuning
print("\nFinetuning ...")
unfreeze = 59
base_model.trainable = True
fine_tune_from = len(base_model.layers) - unfreeze
for layer in base_model.layers[:fine_tune_from]:
    layer.trainable = False
for layer in base_model.layers[fine_tune_from:]:
    if isinstance(layer, tf.keras.layers.BatchNormalization):
        layer.trainable = False

inputs = input_layer
x = sample_resizing(inputs)
x = data_augmentation(x)
x = preprocess_input(x)
x = base_model(x, training=False)
x = patch_extraction(x)
x = tf.keras.layers.SpatialDropout2D(FT_DROPOUT)(x)
x = global_average_layer(x)
x = tf.keras.layers.Dropout(FT_DROPOUT)(x)
x = pre_classification(x)
x = self_attention([x, x])
x = tf.keras.layers.Dropout(FT_DROPOUT)(x)
outputs = prediction_layer(x)
model = tf.keras.Model(inputs, outputs, name='finetune-backbone')
model.compile(optimizer=keras.optimizers.Adam(learning_rate=FT_LR, global_clipnorm=3.0),
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Training Procedure
import datetime  # needed for the timestamped TensorBoard log directory

log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
early_stopping_callback = tf.keras.callbacks.EarlyStopping(monitor='accuracy', min_delta=ES_LR_MIN_DELTA,
                                                           patience=FT_ES_PATIENCE, restore_best_weights=True)
scheduler = keras.optimizers.schedules.InverseTimeDecay(initial_learning_rate=FT_LR, decay_steps=FT_LR_DECAY_STEP,
                                                        decay_rate=FT_LR_DECAY_RATE)
scheduler_callback = tf.keras.callbacks.LearningRateScheduler(schedule=scheduler)

history_finetune = model.fit(X_train, y_train, epochs=FT_EPOCH, batch_size=BATCH_SIZE,
                             validation_data=(X_valid, y_valid), verbose=1,
                             initial_epoch=history.epoch[-TRAIN_ES_PATIENCE],
                             callbacks=[early_stopping_callback, scheduler_callback, tensorboard_callback])
test_loss, test_acc = model.evaluate(X_test, y_test)
model.save('model.h5')
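When comparing repeated runs like the four FERPlus results mentioned before the code (93%, 93%, 89%, 89%), it may be clearer to report a mean and spread than a single number. A minimal sketch using only the standard library; `summarize_runs` is a hypothetical helper, and the list simply restates those four accuracies.

```python
import statistics


def summarize_runs(accuracies):
    """Return (mean, sample standard deviation) for repeated runs.

    Hypothetical helper for illustration.
    """
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# The four FERPlus test accuracies reported in this comment, in percent.
ferplus_runs = [93.0, 93.0, 89.0, 89.0]
mean_acc, std_acc = summarize_runs(ferplus_runs)
print(f"FERPlus over {len(ferplus_runs)} runs: {mean_acc:.1f} +/- {std_acc:.1f}")
# prints: FERPlus over 4 runs: 91.0 +/- 2.3
```

Four runs is a small sample, so the spread is only a rough indication, but reporting it makes it easier to judge whether a change in learning rate, batch size or seed actually moved the result.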