NEGU93 / cvnn

Library to help implement a complex-valued neural network (cvnn) using tensorflow as back-end
https://complex-valued-neural-networks.readthedocs.io/
MIT License

cifar40_new issue #17

Closed: annabelleYan closed this issue 2 years ago

annabelleYan commented 2 years ago

Hi, I have a new question related to this work. I'm trying to use complex inputs by casting to dtype=np.complex64: train_images, test_images = train_images.astype(dtype=np.complex64) / 255.0, test_images.astype(dtype=np.complex64) / 255.0. I also modified activation='crelu', kernel_initializer='ComplexGlorotUniform', init_technique='mirror', as well as model.compile(optimizer='sgd', loss=losses.ComplexAverageCrossEntropy(), metrics=metrics.ComplexAccuracy()). But it raises this error: [error screenshot]

The complete modified code is as follows:

import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import cvnn.layers as complex_layers
import numpy as np
from cvnn import losses
from cvnn import metrics
from pdb import set_trace
from importlib import reload
import os
import tensorflow
from matplotlib import pyplot as plt

def own_complex_fit(epochs=10):
    tf.random.set_seed(1)
    init = 'ComplexGlorotUniform'
    acti = 'crelu'
    init_tech = 'mirror'
    model = models.Sequential()
    model.add(complex_layers.ComplexConv2D(32, (3, 3), activation=acti, input_shape=(32, 32, 3),
                                           kernel_initializer=init, use_bias=False, init_technique=init_tech))
    model.add(complex_layers.ComplexMaxPooling2D((2, 2)))
    model.add(complex_layers.ComplexConv2D(64, (3, 3), activation=acti, kernel_initializer=init,
                                           use_bias=False, init_technique=init_tech))
    model.add(complex_layers.ComplexMaxPooling2D((2, 2)))
    model.add(complex_layers.ComplexConv2D(64, (3, 3), activation=acti, kernel_initializer=init,
                                           use_bias=False, init_technique=init_tech))
    model.add(complex_layers.ComplexFlatten())
    model.add(complex_layers.ComplexDense(64, activation=acti, kernel_initializer=init,
                                          use_bias=False, init_technique=init_tech))
    model.add(complex_layers.ComplexDense(10, activation=acti, kernel_initializer=init,
                                          use_bias=False, init_technique=init_tech))
    print(model.summary())

    model.compile(optimizer='sgd', 
                  loss=losses.ComplexAverageCrossEntropy(),
                  metrics=metrics.ComplexAccuracy())

    weights = model.get_weights()
    with tf.GradientTape() as tape:
        loss = model.compiled_loss(y_true=tf.convert_to_tensor(test_labels), y_pred=model(test_images))
        gradients = tape.gradient(loss, model.trainable_weights)  # back-propagation
    history = model.fit(train_images, train_labels, epochs=epochs, validation_data=(test_images, test_labels))
    test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)
    logs = {
        'weights_at_init': weights,
        'loss': loss,
        'gradients': gradients,
        'weights_at_end': model.get_weights()
    }
    return history, logs

(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images.astype(dtype=np.complex64) / 255.0, test_images.astype(dtype=np.complex64) / 255.0

reload(tensorflow)
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
own, own_logs = own_complex_fit(epochs=5)
history = own
print(history.history.keys())
#  "Accuracy"
plt.plot(history.history['complex_accuracy'])
plt.plot(history.history['val_complex_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
# "Loss"
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

If I change the code model.add(complex_layers.ComplexDense(10, activation=acti, kernel_initializer=init, use_bias=False, init_technique=init_tech)) to model.add(complex_layers.ComplexDense(1, activation=acti, kernel_initializer=init, use_bias=False, init_technique=init_tech)), the error disappears, but the train and test accuracy stay at 0 without learning. It actually does not make any sense to simply change 10 to 1, since the number of classes for this example is 10. I'm wondering how I should debug this issue properly. Many thanks for your help.

Originally posted by @annabelleYan in https://github.com/NEGU93/cvnn/issues/16#issuecomment-987927633

NEGU93 commented 2 years ago

Well, apparently the shape of train_labels is (None, ..., None, 1). So it depends on what you want to do. There are basically two ways to encode, for example, class 2: either as 2 (sparse) or as [0, 0, 1, 0, ..., 0, 0] (categorical, or one-hot encoded). Depending on the encoding you use, you must set the loss and metric accordingly.

I believe you are mixing both applications here.

In particular, ComplexAverageCrossEntropy is made for categorical encoding (so the last dimension should be 10), but your train labels have shape 1 in the last dimension, not 10. You can probably use tf.keras.utils.to_categorical on the labels to solve the issue.
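For illustration, a minimal sketch of the two encodings using the standard tf.keras.utils.to_categorical helper (the shapes assume CIFAR-10's 10 classes):

import numpy as np
import tensorflow as tf

# Sparse encoding: one integer class index per sample, shape (N, 1).
sparse_labels = np.array([[2], [0], [9]])

# Categorical (one-hot) encoding: shape (N, 10) for 10 classes.
# to_categorical squeezes a trailing dimension of size 1 automatically.
one_hot_labels = tf.keras.utils.to_categorical(sparse_labels, num_classes=10)
print(one_hot_labels.shape)  # (3, 10)
print(one_hot_labels[0])     # [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]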

annabelleYan commented 2 years ago

Thanks for your reply. The issue was solved after I used tf.keras.utils.to_categorical on the labels. However, the network still does not seem to train correctly, since the complex accuracy is always 0 whatever my epoch value is. Could you run this modified code again to see what's going on here? Also, for this particular example, which activation function for the last layer and which loss function should be used? In my view, 'crelu' and ComplexAverageCrossEntropy should be fine, but I'm not sure whether that is correct. Many thanks for your time.

import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import cvnn.layers as complex_layers
import numpy as np
from cvnn import losses
from cvnn import metrics
from pdb import set_trace
from importlib import reload
import os
import tensorflow
from matplotlib import pyplot as plt
from tensorflow.keras.utils import to_categorical

def own_complex_fit(epochs=10):
    tf.random.set_seed(1)
    init = 'ComplexGlorotUniform'
    acti = 'crelu'
    init_tech = 'mirror'
    model = models.Sequential()
    model.add(complex_layers.ComplexConv2D(32, (3, 3), activation=acti, input_shape=(32, 32, 3),
                                           kernel_initializer=init, use_bias=False, init_technique=init_tech))
    model.add(complex_layers.ComplexMaxPooling2D((2, 2)))
    model.add(complex_layers.ComplexConv2D(64, (3, 3), activation=acti, kernel_initializer=init,
                                           use_bias=False, init_technique=init_tech))
    model.add(complex_layers.ComplexMaxPooling2D((2, 2)))
    model.add(complex_layers.ComplexConv2D(64, (3, 3), activation=acti, kernel_initializer=init,
                                           use_bias=False, init_technique=init_tech))
    model.add(complex_layers.ComplexFlatten())
    model.add(complex_layers.ComplexDense(64, activation=acti, kernel_initializer=init,
                                          use_bias=False, init_technique=init_tech))
    # model.add(complex_layers.ComplexDense(10, activation='cast_to_real', kernel_initializer=init,
    #                                       use_bias=False, init_technique='zero_imag'))
    model.add(complex_layers.ComplexDense(10, activation='cast_to_real', kernel_initializer=init,
                                          use_bias=False, init_technique=init_tech))
    print(model.summary())

    model.compile(optimizer='sgd', 
                  loss=losses.ComplexAverageCrossEntropy(),
                  metrics=metrics.ComplexAccuracy())

    weights = model.get_weights()
    with tf.GradientTape() as tape:
        loss = model.compiled_loss(y_true=tf.convert_to_tensor(test_labels), y_pred=model(test_images))
        gradients = tape.gradient(loss, model.trainable_weights)  # back-propagation
    history = model.fit(train_images, train_labels, epochs=epochs, validation_data=(test_images, test_labels), batch_size=32)
    test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)
    logs = {
        'weights_at_init': weights,
        'loss': loss,
        'gradients': gradients,
        'weights_at_end': model.get_weights()
    }
    return history, logs

(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
train_images, test_images = train_images.astype(dtype=np.complex64) / 255.0, test_images.astype(dtype=np.complex64) / 255.0
train_labels = to_categorical(train_labels,10)
test_labels = to_categorical(test_labels,10)

reload(tensorflow)
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
own, own_logs = own_complex_fit(epochs=5)
history = own
print(history.history.keys())
#  "Accuracy"
plt.plot(history.history['complex_accuracy'])
plt.plot(history.history['val_complex_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
# "Loss"
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

NEGU93 commented 2 years ago

Unfortunately, I don't have access to a computer today, so I won't be able to test the code. crelu and your loss function should be correct.

I can see some strange things; however, your main issue with accuracy is probably that you are not using bias. Also, I don't think using a constant seed makes much sense unless you specifically want to set it to reproduce results.

The example code was just meant for comparison against TensorFlow and therefore has many settings that should normally not be used.

annabelleYan commented 2 years ago

Hi, I found that the model trains fine if I use the loss function tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), but it does not train correctly once losses.ComplexAverageCrossEntropy is applied (0 complex accuracy). Could you give me a working example of how to use ComplexAverageCrossEntropy? Sorry for continuously bothering you. Many thanks for your help.

NEGU93 commented 2 years ago

So I tested the code this time and uploaded a working example here. There are several problems I found in your code (in addition to the ones I mentioned previously, like removing use_bias=False):

  1. You NEED to add the layer model.add(complex_layers.ComplexInput(input_shape=(32, 32, 3))). When you ran the code, you should have seen a warning message saying this should be done. TensorFlow casts the input to real by default if you don't do this.
  2. If using a categorical application (and therefore a categorical loss), you need to add an output activation function such as softmax (in this case, one that supports complex numbers).
  3. Then, there are two accuracy metrics: ComplexAccuracy and ComplexCategoricalAccuracy. You need to use the correct one in each case. TensorFlow also has two versions (Accuracy and CategoricalAccuracy). Most of the time we don't notice, because passing the metric as a string like "accuracy" selects the correct one automatically (though not for my library).

The same applies to the loss: TensorFlow has both SparseCategoricalCrossentropy and CategoricalCrossentropy, and the difference must be understood.

Again, all the bugs in your code came from mixing the two label representations. A sketch combining the three fixes above is shown below.

If you want working examples in the complex domain, I believe the best option is to go directly to the main code here and just copy-paste the README code. The example you used applies my library CVNN to implement a real-valued convolutional neural network (RV-CNN), which you would normally do with TensorFlow alone and not with my library. It is just code I used to make sure my library had no bugs (if I achieved exactly the same results as TF, then it should be OK).
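As a reference (not the exact uploaded example), here is a minimal sketch combining the three fixes above. The activation name 'softmax_real_with_abs' and the metric ComplexCategoricalAccuracy are assumptions taken from the cvnn documentation; adjust the names if your version differs:

import numpy as np
import tensorflow as tf
from tensorflow.keras import datasets, models
import cvnn.layers as complex_layers
from cvnn import losses, metrics

(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
train_images = train_images.astype(np.complex64) / 255.0
test_images = test_images.astype(np.complex64) / 255.0
train_labels = tf.keras.utils.to_categorical(train_labels, 10)  # one-hot labels
test_labels = tf.keras.utils.to_categorical(test_labels, 10)

model = models.Sequential([
    complex_layers.ComplexInput(input_shape=(32, 32, 3)),  # fix 1: keep the input complex
    complex_layers.ComplexConv2D(32, (3, 3), activation='crelu'),
    complex_layers.ComplexMaxPooling2D((2, 2)),
    complex_layers.ComplexConv2D(64, (3, 3), activation='crelu'),
    complex_layers.ComplexMaxPooling2D((2, 2)),
    complex_layers.ComplexFlatten(),
    complex_layers.ComplexDense(64, activation='crelu'),
    # fix 2: a softmax-like output activation that maps complex values to real class scores
    complex_layers.ComplexDense(10, activation='softmax_real_with_abs'),
])
model.compile(optimizer='sgd',
              loss=losses.ComplexAverageCrossEntropy(),
              metrics=[metrics.ComplexCategoricalAccuracy()])  # fix 3: categorical metric
model.fit(train_images, train_labels, epochs=5,
          validation_data=(test_images, test_labels))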

NEGU93 commented 2 years ago

One more comment on how you compute your loss. You have at least two options when computing the error/loss of a complex network:

  1. What you have done so far: keep a complex output and make the loss real.
  2. Make the output real and then use a TensorFlow loss. If you feel more comfortable with TensorFlow losses, you can use this technique with one of the activation functions named here to obtain a real output, as if it were a real network.

You are welcome to use method 2 if you prefer it.
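A minimal sketch of method 2, reusing the 'cast_to_real' activation that already appears in this thread (assumed here to drop the imaginary part so the output behaves like real-valued logits):

import tensorflow as tf
from tensorflow.keras import models
import cvnn.layers as complex_layers

# Method 2: complex layers throughout, but a real-valued output, so
# standard TensorFlow losses and metrics can be used unchanged.
model = models.Sequential([
    complex_layers.ComplexInput(input_shape=(32, 32, 3)),
    complex_layers.ComplexConv2D(32, (3, 3), activation='crelu'),
    complex_layers.ComplexFlatten(),
    # 'cast_to_real' yields real logits (no softmax), hence from_logits=True below.
    complex_layers.ComplexDense(10, activation='cast_to_real'),
])
model.compile(optimizer='sgd',
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=[tf.keras.metrics.CategoricalAccuracy()])

This expects one-hot labels, as produced by to_categorical above; with sparse integer labels you would use SparseCategoricalCrossentropy and SparseCategoricalAccuracy instead.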

annabelleYan commented 2 years ago

I really appreciate your patience and detailed code. Thanks again for your kind response. Best wishes.

NEGU93 commented 2 years ago

My pleasure