Open GreNait opened 4 years ago
TF workflow guide for image classification with cats and dogs:
https://www.tensorflow.org/tutorials/customization/custom_training_walkthrough -> looks like a better walkthrough, because I do not have my data organized yet.
-> first part was similar to the first walkthrough and not helpful, because I couldn't use tf.keras.utils.get_file <- that downloads data, which I am not doing
Using the youtube videos from sentdex looks promising
Just found out that putting "# %%" on a line in a Python file in VS Code defines parts of the program as "cells", like in Jupyter notebooks. This is great for coding and testing: bigger amounts of data do not need to be reloaded with every start of the program.
Following along with the sentdex tutorial, now adding convolutions etc.
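A minimal sketch of how such cells can be laid out, assuming the VS Code Python extension and its "# %%" cell marker. The load_dataset function is a hypothetical stand-in for the real (slow) image loading, not something from the tutorial.

```python
# %%
# Cell 1: slow setup. Run it once; the result stays alive in the interactive session.
def load_dataset(n_items=1000):
    """Hypothetical stand-in for an expensive load; replace with real image loading."""
    # Each item is (sample_id, label), with labels alternating between 0 and 1.
    return [(i, i % 2) for i in range(n_items)]


dataset = load_dataset()

# %%
# Cell 2: quick experiments. Re-run this cell as often as needed without reloading.
labels = [label for _, label in dataset]
zero_fraction = labels.count(0) / len(labels)
print(f"{len(dataset)} samples, fraction labelled 0: {zero_fraction:.2f}")
```

Run as a normal script, the markers are just comments, so the file still works outside VS Code.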
https://www.youtube.com/watch?v=WvoLTXIjBYU&list=PLQVvvaa0QuDfhTox0AjmQ6tvTgMBZBEXN&index=4&t=0s
First model trained, accuracy is really bad with 34% BUT it worked !!
The nvidia-smi command somehow worked today and the model was a lot faster to train
Still, the accuracy of the model with 100 epochs is miserable
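With only two classes (cats and dogs), always guessing the more common class already gives at least 50%, so a result around 34% is worth comparing against that baseline. A minimal sketch of the check, assuming the labels are available as a plain list of 0/1 integers (the encoding and the example counts are assumptions for illustration):

```python
from collections import Counter


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    # most_common(1) returns [(label, count)] for the most frequent label.
    most_frequent_count = counts.most_common(1)[0][1]
    return most_frequent_count / len(labels)


# Hypothetical label list: 60 samples of class 0 and 40 of class 1.
example_labels = [0] * 60 + [1] * 40
print(f"majority baseline: {majority_baseline(example_labels):.2f}")
```

If the trained model scores below this number, the problem may lie in the data or label pipeline rather than in the number of epochs, though that is only a guess at this point.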
Using TensorBoard to show the training of the model
https://www.youtube.com/watch?v=BqgTU7_cBnk&list=PLQVvvaa0QuDfhTox0AjmQ6tvTgMBZBEXN&index=5&t=24s
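A rough sketch of hooking TensorBoard into Keras training, assuming TensorFlow 2.x is installed. The helper names run_log_dir and make_tensorboard_callback, the "logs" directory, and the run name are made up for illustration. TensorFlow is imported inside the function so the naming helper can be tried without it.

```python
import time


def run_log_dir(model_name, base_dir="logs", timestamp=None):
    """Build a unique log directory per run so runs show up separately in TensorBoard."""
    stamp = int(time.time()) if timestamp is None else int(timestamp)
    return f"{base_dir}/{model_name}-{stamp}"


def make_tensorboard_callback(model_name):
    """Create a Keras TensorBoard callback that writes to a fresh run directory."""
    import tensorflow as tf  # lazy import: only needed when actually training

    return tf.keras.callbacks.TensorBoard(log_dir=run_log_dir(model_name))


print(run_log_dir("cats-vs-dogs-cnn", timestamp=0))
```

The callback would then be passed to training, e.g. model.fit(..., callbacks=[make_tensorboard_callback("cats-vs-dogs-cnn")]), and the curves viewed with "tensorboard --logdir logs".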
I just had a "blackout" of the system during the training of the model. The GPU fan started to spin brutally and the display went off. I had to hard restart the PC again ... Some googling hasn't helped yet.
After updating the GPU driver via the Ubuntu system driver settings and rebooting the system, the new model was trained with a lot more neurons and 100 epochs. Still, the accuracy is terrible at 35%, but at least it looks like it is working.
The blackout of the display happened again. I assume that this might be caused by TensorFlow's memory allocation on the GPU. I searched for a way to limit the possible allocation. There was a GitHub issue here: https://github.com/tensorflow/tensorflow/issues/25138
It recommends the following two lines:
tf.config.gpu.set_per_process_memory_fraction(0.95)
tf.config.gpu.set_per_process_memory_growth(True)
I decided to start with 95% allocation.
Didn't work. Had another blackout ...
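The tf.config.gpu.* calls from that issue appear to belong to an early TensorFlow 2.0 pre-release and, as far as I can tell, are not present in later versions. Below is a sketch of the equivalent settings, assuming TensorFlow 2.4 or newer. The helper names and the 8192 MB card size are made up for illustration, and there is no guarantee this cures the blackout. It has to run before the GPU is first used, and TensorFlow accepts either memory growth or a hard limit for a device, not both.

```python
def memory_limit_mb(total_mb, fraction):
    """Convert a fraction of total GPU memory into a whole-MB limit."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_mb * fraction)


def configure_gpus(limit_mb=None):
    """Enable memory growth, or cap memory per GPU if limit_mb is given.

    Returns the number of GPUs TensorFlow can see.
    """
    import tensorflow as tf  # lazy import: assumes TensorFlow 2.4+

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        if limit_mb is None:
            # Allocate GPU memory on demand instead of grabbing it all at start.
            tf.config.experimental.set_memory_growth(gpu, True)
        else:
            # Hard cap: expose one logical device with a fixed memory budget.
            tf.config.set_logical_device_configuration(
                gpu,
                [tf.config.LogicalDeviceConfiguration(memory_limit=limit_mb)],
            )
    return len(gpus)


# Hypothetical card with 8192 MB: 95% of it, in MB.
print(memory_limit_mb(8192, 0.95))
```

Calling configure_gpus(memory_limit_mb(8192, 0.95)) at the very top of the training script would roughly correspond to the 95% setting tried above.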