Open konegen opened 2 years ago
Hi @konegen,
is it possible that you have not adjusted the output tensor and the target tensor? Just increasing the number of neurons in the output layer is not enough.
Here is an example with two outputs: https://github.com/Fraunhofer-IMS/AIfES_for_Arduino/blob/main/examples/0_Universal/0_XOR/2_XOR_training_2_outputs/2_XOR_training_2_outputs.ino
Check out our new tutorial, it explains the tensors in detail. A tutorial on training will also be coming soon. https://create.arduino.cc/projecthub/aifes_team/aifes-inference-tutorial-f44d96?ref=user&ref_id=1924948&offset=0
I guess you want to use a softmax in the output layer, right? There is also an example here: https://github.com/Fraunhofer-IMS/AIfES_for_Arduino/blob/main/examples/1_Nano_BLE_Sense/0_Color_detection/creation_and_training.ino
With softmax, please use cross-entropy as the loss.
Here is an example of how it could look. If you have more than two inputs, you have to adjust the input tensor as well.
float target_data[4][10] = {
{1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f ,1.0f},
{0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f ,0.0f},
{1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f ,1.0f},
{1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f ,1.0f}
};
uint16_t target_shape[] = {4, 10};
aitensor_t target_tensor;
target_tensor.dtype = aif32;
target_tensor.dim = 2;
target_tensor.shape = target_shape;
target_tensor.data = target_data;
float output_data[4][10];
uint16_t output_shape[] = {4, 10};
aitensor_t output_tensor;
output_tensor.dtype = aif32;
output_tensor.dim = 2;
output_tensor.shape = output_shape;
output_tensor.data = output_data;
// And here is a code snippet showing how the softmax should look. It is from the PC tutorial.
// Output dense layer
ailayer_dense_t dense_layer_2;
dense_layer_2.neurons = OUTPUTS;
ailayer_softmax_t output_layer_activation_softmax;
ailoss_crossentropy_t crossentropy_loss;
// -----Define the structure of the model ---------
aimodel_t model;
ailayer_t *x;
// Passing the layers to the AIfES model
model.input_layer = ailayer_input_f32_default(&input_layer);
x = ailayer_dense_f32_default(&dense_layer_1, model.input_layer);
x = ailayer_sigmoid_f32_default(&sigmoid_layer_1, x);
x = ailayer_dense_f32_default(&dense_layer_2, x);
x = ailayer_softmax_f32_default(&output_layer_activation_softmax, x);
model.output_layer = x;
// Add the loss to the AIfES model
model.loss = ailoss_crossentropy_f32_default(&crossentropy_loss, model.output_layer);
aialgo_compile_model(&model); // Compile the AIfES model
Many greetings Pierre
Hi @AIfES-Pierre, thank you for your quick response. Your suggestion helped me find the problem: I had not defined the output tensor correctly. I had sized it only for the number of output neurons and not for the number of training samples.
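The fix described here comes down to sizing the target and output buffers by training samples times output neurons. The following is a minimal sketch in plain C with no AIfES calls; `tensor_elements`, `buffer_fits`, and the two constants are hypothetical names chosen for illustration, with values matching the `{4, 10}` shape used earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DATASETS 4   /* number of training samples (assumed, as in the example above) */
#define OUTPUTS  10  /* number of output neurons (assumed) */

/* Hypothetical helper: number of elements described by a 2D shape {rows, cols}. */
static size_t tensor_elements(const uint16_t shape[2])
{
    assert(shape != NULL);
    return (size_t)shape[0] * (size_t)shape[1];
}

/* Hypothetical check: can a buffer holding `capacity` floats store the whole tensor? */
static int buffer_fits(const uint16_t shape[2], size_t capacity)
{
    return capacity >= tensor_elements(shape);
}
```

With the shape `{DATASETS, OUTPUTS}`, a buffer of only `OUTPUTS` floats fails the check, while one of `DATASETS * OUTPUTS` floats passes. Writing past an undersized buffer would plausibly explain an access violation such as the 0xC0000005 reported in the original question.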
Hi @konegen ,
please let me know if it worked.
It would also be great if you would share the results with us.
Hi @AIfES-Pierre, I am trying to train an MNIST model (10,000 training samples) with AIfES on the PC. The training now runs through. However, the loss is over 10 million. What could be the reason that the loss is so high?
Hi @konegen,
there must be a mistake somewhere. Can you please post your AIfES network configuration?
Do you use softmax as activation function?
I just took the code of the XOR example and adapted it. The structure of the model is as follows (if that's what you meant):
uint16_t input_layer_shape[] = {1, 784}; // MNIST images of shape 28x28, flattened
ailayer_input_t input_layer;
input_layer.input_dim = 2;
input_layer.input_shape = input_layer_shape;
// Dense layer (hidden layer)
ailayer_dense_t dense_layer_1;
dense_layer_1.neurons = 10;
ailayer_sigmoid_t sigmoid_layer_1;
// Output dense layer
ailayer_dense_t dense_layer_2;
dense_layer_2.neurons = 10;
ailayer_softmax_t softmax_layer_2;
ailoss_crossentropy_t crossentropy_loss;
// --------------------------- Define the structure of the model ----------------------------
aimodel_t model;
ailayer_t *x;
// Passing the layers to the AIfES model
model.input_layer = ailayer_input_f32_default(&input_layer);
x = ailayer_dense_f32_default(&dense_layer_1, model.input_layer);
x = ailayer_sigmoid_f32_default(&sigmoid_layer_1, x);
x = ailayer_dense_f32_default(&dense_layer_2, x);
x = ailayer_softmax_f32_default(&softmax_layer_2, x);
model.output_layer = x;
// Add the loss to the AIfES model
model.loss = ailoss_crossentropy_f32_default(&crossentropy_loss, model.output_layer);
aialgo_compile_model(&model); // Compile the AIfES model
Thanks, Daniel
Hi @konegen ,
your code looks good, that should work. And the loss is also correct.
AIfES does not apply any reduction / normalization to the loss. It is so high because you have so much training data.
In AIfES, the loss is not processed further, neither for the MSE nor for the cross-entropy. We will probably change this in the next update, because most people know the loss with reduction from e.g. Keras.
Here is a link to the Keras documentation where the reduction is explained. It is automatically activated, but can also be deactivated: https://www.tensorflow.org/api_docs/python/tf/keras/losses/Reduction
In AIfES you can easily add it. In the for loop before the loss is printed, you add the following line:
//DATASETS is your number of training sets and OUTPUTS is the number of output neurons
loss = loss / (OUTPUTS * DATASETS);
printf("%f\n",loss);
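To illustrate why the unreduced loss grows with the amount of training data, here is a small sketch in plain C. It is not the AIfES implementation; `summed_loss` and `normalize_loss` are hypothetical helpers, and the per-sample loss values are made-up inputs for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: add up per-sample losses without any reduction,
   which mirrors the "no normalization" behavior described above. */
static float summed_loss(const float *per_sample_loss, size_t datasets)
{
    float loss = 0.0f;
    for (size_t i = 0; i < datasets; i++) {
        loss += per_sample_loss[i];
    }
    return loss;
}

/* Apply the reduction suggested above: divide by OUTPUTS * DATASETS. */
static float normalize_loss(float loss, size_t datasets, size_t outputs)
{
    assert(datasets > 0 && outputs > 0);
    return loss / (float)(datasets * outputs);
}
```

Because the sum runs over every sample, doubling the number of samples roughly doubles the unreduced value even if the model quality is unchanged. Dividing by `OUTPUTS * DATASETS` makes the printed number comparable across dataset sizes.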
So how do you read in the MNIST database? I tested it today and found this: https://github.com/takafumihoriuchi/MNIST_for_C
It works great, but you have to convert the image data to float because it is stored as double. And the labels still need to be adjusted. If I manage to do it, I will upload an example in the next few days.
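The two conversion steps mentioned here could look roughly like the sketch below. It assumes the image data is available as a flat array of `double` and the labels as plain integers from 0 to 9. It also assumes that "adjusting" the labels means one-hot encoding them to match a target tensor with one column per output neuron, which the thread does not state explicitly. The function names are hypothetical and not part of AIfES or MNIST_for_C.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CLASSES 10  /* digits 0..9 (assumed to match the 10 output neurons) */

/* Copy pixel values from a double array into a float array of the same length. */
static void images_to_float(const double *src, float *dst, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        dst[i] = (float)src[i];
    }
}

/* One-hot encode integer labels into a flattened [num_labels][NUM_CLASSES] array. */
static void labels_to_one_hot(const int *labels, float *one_hot, size_t num_labels)
{
    for (size_t i = 0; i < num_labels; i++) {
        assert(labels[i] >= 0 && labels[i] < NUM_CLASSES);
        for (int c = 0; c < NUM_CLASSES; c++) {
            one_hot[i * NUM_CLASSES + (size_t)c] = (c == labels[i]) ? 1.0f : 0.0f;
        }
    }
}
```

The resulting float arrays can then serve as the data buffers for the input and target tensors, with shapes of samples by 784 and samples by 10 respectively.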
Pierre
Hi @AIfES-Pierre,
Thanks for the hint. I read in the MNIST dataset in a Python script using tensorflow.keras.datasets.mnist. Then I flatten the data into a 1D array and write it to a .h file, which I include in the C program.
Thanks, Daniel
Hi @konegen,
here is another short explanation of why we do not process the loss further. A division would have to be carried out each time, which is computationally expensive on microcontrollers. Each framework has its own technique.
There are often discussions about this topic. Here for example about the MSE: https://stats.stackexchange.com/questions/313070/mse-formula-in-neural-network-applications
Hi @AIfES-Pierre,
I have created three example scenarios for the MNIST dataset for AIfES on the PC:
I created a new branch in my forked repository, uploaded these three examples and made a pull request. If you want you can look at the examples and include them.
Many greetings Daniel
Hi @konegen ,
thanks for the great example.
We have decided internally that this repository will remain exclusive to Arduino, because it will be installed via the Arduino library manager. Compatibility checks are done for this and code for PC can be problematic here.
But we will create an official AIfES example repository, where examples for different platforms can be collected. We would be happy if you uploaded your example there.
Hi, I want to train a NN for the MNIST dataset on the PC. How can I implement training of a NN with multiple output neurons? When I edit the example (link) and change the output layer to 10 neurons, the inference of the model stops with an error (Process returned -1073741819 (0xC0000005) execution time : 0.485 s). Are there any changes I need to be especially aware of?