I tried training without the normalization layers. Training stalled, and the output during training was always the same letter. Does this mean the normalization layers are necessary, or does the network simply need more time to train?