Closed Afef00 closed 2 years ago
Hi there,
You should be using categorical_crossentropy instead of sparse_categorical_crossentropy if your labels are one-hot encoded; that mismatch should be throwing an error. The kernel might be dying because you're running out of memory trying to process a massive batch: with the steps_per_epoch parameter of your fit function set to 1, your batch size ends up equal to your entire training set. I'd change it to 60000//batch_size, where batch_size=32 or some other smaller value.
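To make the label-format point concrete, here is a minimal plain-Python sketch of the math behind the two losses. The helper names below only mirror the Keras loss names for readability; they are hypothetical illustrations, not the Keras implementations, and they assume every predicted probability is strictly positive.

```python
import math

def to_one_hot(class_index, num_classes):
    """Turn an integer label into a one-hot list, e.g. 1 -> [0.0, 1.0, 0.0]."""
    return [1.0 if i == class_index else 0.0 for i in range(num_classes)]

def categorical_crossentropy(one_hot, probs):
    """Cross-entropy for a ONE-HOT label (illustrative helper, not Keras)."""
    # Assumes all probabilities are > 0 so that log() is defined.
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def sparse_categorical_crossentropy(class_index, probs):
    """Cross-entropy for an INTEGER label (illustrative helper, not Keras)."""
    return -math.log(probs[class_index])

probs = [0.1, 0.7, 0.2]   # predicted class probabilities for one sample
label = 1                 # true class as an integer

loss_sparse = sparse_categorical_crossentropy(label, probs)
loss_one_hot = categorical_crossentropy(to_one_hot(label, 3), probs)

# Both forms compute the same quantity; only the expected label format differs.
print(loss_sparse, loss_one_hot)
```

The two functions agree when each receives the label format it expects, which is why the loss name has to match how the labels were prepared.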
Thanks Shawn
Hello Shawn,
Thank you for the prompt reply.
Actually, the problem persists even with a small subset of the dataset:
xtrain = x_train[0:5000]
ytrain = y_train[0:5000]
batch_size = 32
func_model.fit(xtrain, ytrain, batch_size=batch_size, epochs=5,
               steps_per_epoch=5000//batch_size, verbose=2)
As for steps_per_epoch, I used it because when fitting the model I got the following error message:
ValueError: When using data tensors as input to a model, you should specify the steps_per_epoch argument.
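For reference, the steps_per_epoch arithmetic used in the fit call can be sketched in plain Python. The function name and the drop_remainder flag below are hypothetical, introduced only for this illustration; they are not part of the Keras API.

```python
import math

def steps_per_epoch(num_samples, batch_size, drop_remainder=True):
    """Number of batches needed to cover the dataset once (illustrative helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    if drop_remainder:
        # Floor division: a final partial batch is skipped.
        return num_samples // batch_size
    # Ceiling division: a final partial batch counts as one extra step.
    return math.ceil(num_samples / batch_size)

print(steps_per_epoch(60000, 32))                       # 1875
print(steps_per_epoch(5000, 32))                        # 156
print(steps_per_epoch(5000, 32, drop_remainder=False))  # 157
```

With 5000 samples and a batch size of 32, floor division leaves the last 8 samples unused in each epoch, which is usually harmless for training.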
Thanks
So the kernel keeps dying? Is there any output on your terminal where you launched the jupyter notebook? The code snippet you provided works for me on a fresh docker image (vitis-ai-cpu:1.4.916) with the vitis-ai-tensorflow2 conda environment sourced. I just changed the loss function and the steps_per_epoch parameter as mentioned earlier. You also don't need to install or import keras as that is built into tensorflow2 now.
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)
x_train = np.repeat(x_train, 3, axis=-1)
x_test = np.repeat(x_test, 3, axis=-1)
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
x_train = tf.image.resize(x_train, [32,32])
x_test = tf.image.resize(x_test, [32,32])
y_train = tf.keras.utils.to_categorical(y_train , num_classes=10)
y_test = tf.keras.utils.to_categorical(y_test , num_classes=10)
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
input = tf.keras.Input(shape=(32,32,3))
efnet = tf.keras.applications.ResNet50(weights='imagenet',
                                       include_top = False,
                                       input_tensor = input)
gap = tf.keras.layers.GlobalMaxPooling2D()(efnet.output)
output = tf.keras.layers.Dense(10, activation='softmax', use_bias=True)(gap)
func_model = tf.keras.Model(efnet.input, output)
func_model.compile(optimizer='adam',
                   loss="categorical_crossentropy",
                   metrics=['accuracy'])
func_model.fit(x_train, y_train, epochs=5, validation_data=(x_test,y_test),
               steps_per_epoch = 60000//32)
If you're still having issues with training on the docker I'd recommend going to the Vitis AI issue tracker.
Thanks Shawn
Hello Shawn,
Thank you for your help, it works!
Best regards,
Afef00
I have been trying to retrain ResNet50 for MNIST classification using the code below, following the provided example "Build Machine Learning Models for DPU". However, I got the following message: "The kernel appears to have died. It will restart automatically."
Any suggestions on how to solve this problem, please? Thanks in advance.