Closed annabelleYan closed 2 years ago
Well, apparently the shape of train_labels is (None, ..., None, 1). So it depends on what you want to do.
There are basically two ways to encode, for example, class 2: either as 2 (sparse) or as [0, 0, 1, 0, ..., 0, 0] (categorical or one-hot encoded).
According to the encoding you use, you should set correctly:
I believe you are mixing both applications here.
In particular: ComplexAverageCrossEntropy is made for categorical encoding (so the last dimension should be 10), but your train labels have shape 1 at the end and not 10. You can probably use tf.keras.utils.to_categorical on the labels to solve the issue.
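To make the shape difference concrete, here is a minimal sketch in plain NumPy of the kind of conversion that to_categorical performs. The helper name one_hot and the three sample labels are made up for illustration; this is not the Keras implementation, only an equivalent for integer labels with a trailing dimension of 1.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer labels of shape (N,) or (N, 1) into one-hot rows of shape (N, num_classes).

    Hypothetical helper for illustration; it mimics what to_categorical does for this case.
    """
    flat = np.asarray(labels, dtype=np.int64).reshape(-1)  # drop the trailing axis of size 1
    return np.eye(num_classes, dtype=np.float32)[flat]     # pick one identity row per label

# CIFAR-10 style labels: one integer per sample, trailing dimension of 1 (sparse encoding)
sparse_labels = np.array([[2], [0], [9]])
categorical_labels = one_hot(sparse_labels, num_classes=10)

print(sparse_labels.shape)       # (3, 1)
print(categorical_labels.shape)  # (3, 10)
print(categorical_labels[0])     # a single 1.0 at index 2, zeros elsewhere
```

A loss written for categorical encoding expects the second shape, which is why labels ending in a dimension of 1 do not fit it.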
Thanks for your reply. The issue was solved after I used tf.keras.utils.to_categorical on the labels. However, the network still does not seem to train correctly, since the complex accuracy is always 0 regardless of the number of epochs. Could you run this modified code again to see what's going on? Also, for this particular example, which activation function for the last layer and which loss function should be used? In my view, 'crelu' and 'ComplexAverageCrossEntropy' should be fine, but I'm not sure whether that is correct. Many thanks for your time.
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import cvnn.layers as complex_layers
import numpy as np
from cvnn import losses
from cvnn import metrics
from pdb import set_trace
from importlib import reload
import os
import tensorflow
from matplotlib import pyplot as plt
from tensorflow.keras.utils import to_categorical
def own_complex_fit(epochs=10):
    tf.random.set_seed(1)
    init = 'ComplexGlorotUniform'
    acti = 'crelu'
    init_tech = 'mirror'
    model = models.Sequential()
    model.add(complex_layers.ComplexConv2D(32, (3, 3), activation=acti, input_shape=(32, 32, 3),
                                           kernel_initializer=init, use_bias=False, init_technique=init_tech))
    model.add(complex_layers.ComplexMaxPooling2D((2, 2)))
    model.add(complex_layers.ComplexConv2D(64, (3, 3), activation=acti, kernel_initializer=init,
                                           use_bias=False, init_technique=init_tech))
    model.add(complex_layers.ComplexMaxPooling2D((2, 2)))
    model.add(complex_layers.ComplexConv2D(64, (3, 3), activation=acti, kernel_initializer=init,
                                           use_bias=False, init_technique=init_tech))
    model.add(complex_layers.ComplexFlatten())
    model.add(complex_layers.ComplexDense(64, activation=acti, kernel_initializer=init,
                                          use_bias=False, init_technique=init_tech))
    # model.add(complex_layers.ComplexDense(10, activation='cast_to_real', kernel_initializer=init,
    #                                       use_bias=False, init_technique='zero_imag'))
    model.add(complex_layers.ComplexDense(10, activation='cast_to_real', kernel_initializer=init,
                                          use_bias=False, init_technique=init_tech))
    print(model.summary())
    model.compile(optimizer='sgd',
                  loss=losses.ComplexAverageCrossEntropy(),
                  metrics=metrics.ComplexAccuracy())
    weigths = model.get_weights()
    with tf.GradientTape() as tape:
        loss = model.compiled_loss(y_true=tf.convert_to_tensor(test_labels), y_pred=model(test_images))
        gradients = tape.gradient(loss, model.trainable_weights)  # back-propagation
    history = model.fit(train_images, train_labels, epochs=epochs,
                        validation_data=(test_images, test_labels), batch_size=32)
    test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
    logs = {
        'weights_at_init': weigths,
        'loss': loss,
        'gradients': gradients,
        'weights_at_end': model.get_weights()
    }
    return history, logs
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
train_images, test_images = train_images.astype(dtype=np.complex64) / 255.0, test_images.astype(dtype=np.complex64) / 255.0
train_labels = to_categorical(train_labels,10)
test_labels = to_categorical(test_labels,10)
reload(tensorflow)
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
own, own_logs = own_complex_fit(epochs=5)
history = own
print(history.history.keys())
# "Accuracy"
plt.plot(history.history['complex_accuracy'])
plt.plot(history.history['val_complex_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
# "Loss"
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
Unfortunately, I don't have access to a computer today so I won't be able to test the code. Crelu and your loss function should be correct.
I can see some strange things; however, your main issue with accuracy is probably that you are not using a bias. Also, I don't think using a constant seed makes much sense unless you specifically want to set it to reproduce results.
The example code was just meant for comparison against TensorFlow and therefore has many settings that should not normally be used.
Hi, I found that the model works fine if I use the loss function tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), but it does not train correctly once losses.ComplexAverageCrossEntropy is applied (the complex accuracy stays at 0). Could you give me a working example of how to use ComplexAverageCrossEntropy? Sorry for continuously bothering you. Many thanks for your help.
So I tested the code this time and uploaded a working example here. I found several problems in your code (in addition to the ones I mentioned previously, like removing use_bias=False):
1. The model needs an explicit complex input layer: model.add(complex_layers.ComplexInput(input_shape=(32, 32, 3))). When you ran the code you should have seen the warning message saying that this should be done. TensorFlow casts the input to real by default if you don't do this.
2. The accuracy metric has to match the label encoding (TensorFlow distinguishes, for example, Accuracy and CategoricalAccuracy). Most of the time we don't notice because passing the metric as a string like "accuracy" selects the correct one automatically (not in my library, though).
3. The same happens with the loss. TensorFlow has SparseCategoricalCrossentropy and CategoricalCrossentropy, and the difference must be understood.
Again, all the bugs in your code were a combination of mixing both labels representations.
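As a rough illustration of why the two label representations must not be mixed, the sketch below writes both cross-entropy variants in plain NumPy. These two functions are simplified stand-ins that I am assuming for illustration; they are not the TensorFlow or cvnn implementations, and the probabilities are made-up numbers.

```python
import numpy as np

def categorical_crossentropy(one_hot_labels, probs):
    """Expects one-hot labels of shape (N, C)."""
    return float(-np.mean(np.sum(one_hot_labels * np.log(probs), axis=-1)))

def sparse_categorical_crossentropy(int_labels, probs):
    """Expects integer class indices of shape (N,) or (N, 1)."""
    idx = np.asarray(int_labels).reshape(-1)
    picked = probs[np.arange(len(idx)), idx]  # probability assigned to the true class
    return float(-np.mean(np.log(picked)))

# Predicted class probabilities for 2 samples and 3 classes (illustrative values)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])

sparse = np.array([[0], [1]])                   # sparse encoding, shape (2, 1)
one_hot_labels = np.eye(3)[sparse.reshape(-1)]  # categorical encoding, shape (2, 3)

loss_categorical = categorical_crossentropy(one_hot_labels, probs)
loss_sparse = sparse_categorical_crossentropy(sparse, probs)
print(loss_categorical, loss_sparse)  # both variants agree when fed the matching encoding
```

Each variant gives the same number only when it receives the encoding it was written for; feeding sparse labels to the categorical version (or the reverse) is the kind of mix-up described above.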
If you want working examples on the complex domain, I believe the best is to go directly to the main code here and just copy-paste the readme code. The example you used is using my library CVNN to implement a real-valued convolutional neural network (RV-CNN) which you will normally just do with tensorflow and not my library. It's just a code I used to make sure my library had no bugs (if I achieved the same results exactly as TF then it should be Ok).
One other comment on how you compute your loss. You have at least two options when computing the error/loss on a complex network:
You are encouraged to use method 2 if you prefer.
I really appreciate your patience and detailed codes. Thanks again for your kind response. Best wishes.
My pleasure
Hi, I have a new question related to this work. I'm trying to use complex inputs by changing the dtype to np.complex64: train_images, test_images = train_images.astype(dtype=np.complex64) / 255.0, test_images.astype(dtype=np.complex64) / 255.0. I also set activation='crelu', kernel_initializer='ComplexGlorotUniform', init_technique='mirror', as well as model.compile(optimizer='sgd', loss=losses.ComplexAverageCrossEntropy(), metrics=metrics.ComplexAccuracy()). But it encounters this error:
The complete modified code is copied below:
If I change the code
model.add(complex_layers.ComplexDense(10, activation=acti, kernel_initializer=init, use_bias=False, init_technique=init_tech))
to
model.add(complex_layers.ComplexDense(1, activation=acti, kernel_initializer=init, use_bias=False, init_technique=init_tech))
the error disappears, but the train and test accuracy stay at 0 without learning. Actually, it does not make any sense to simply change 10 to 1, since the number of classes for this example is 10. I'm wondering what I should do to debug this issue properly. Many thanks for your help.
Originally posted by @annabelleYan in https://github.com/NEGU93/cvnn/issues/16#issuecomment-987927633