Hi @Presburger, good question.
You don't have to normalise, but it's very hard to say in advance what will produce the best result without trying a few different options (on the training set, or with a validation set). For some datasets it won't make any difference; for some it might be better to normalise each channel (for each example); and for others it might be better to do something else.
One case where normalising each channel might make sense is when the channels are on very different scales, i.e. have very different magnitudes.
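
In case it's useful, here is a rough sketch of what "normalise each channel (for each example)" could look like with NumPy. The array layout `(n_examples, n_channels, n_timepoints)`, the function name `normalise_per_channel`, and the `eps` guard against constant channels are all assumptions for illustration, so adjust them to however your data is actually stored.

```python
import numpy as np


def normalise_per_channel(X, eps=1e-8):
    """Z-normalise each channel of each example independently.

    Assumes X has shape (n_examples, n_channels, n_timepoints).
    eps is a hypothetical guard against division by zero for constant channels.
    """
    mean = X.mean(axis=-1, keepdims=True)  # one mean per example per channel
    std = X.std(axis=-1, keepdims=True)    # one std per example per channel
    return (X - mean) / (std + eps)


# Synthetic data: two channels on very different scales (1 vs 1000).
rng = np.random.default_rng(0)
scales = np.array([1.0, 1000.0])[None, :, None]
X = rng.normal(size=(4, 2, 100)) * scales

X_norm = normalise_per_channel(X)
```

After this, both channels have roughly zero mean and unit standard deviation within each example. Whether that actually helps is still something to check by comparing accuracy with and without it on a validation set.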