Victarry opened this issue 3 years ago
same question
Good question. Just as VGG's input images must be preprocessed when training on ImageNet, which includes subtracting the mean value of the 3 channels, this is essentially a standardization step. The assumption is that natural images follow a roughly stable data distribution (that is, the statistics of each dimension of the data follow the same distribution), so subtracting the dataset's statistical mean from each sample removes the common component and highlights individual differences. My original intention with this operation was to remove the common color tones in the anime dataset, so that they interfere less with the colors of the generated images. In practice, however, it does not have a large effect: since the discriminator judges real color photos as true, the model still learns the color style.
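For readers who want to see what this looks like in practice, here is a minimal sketch of per-channel mean computation and tone equalization over a style dataset. The directory layout, the use of OpenCV for loading, and the way the shift is applied are illustrative assumptions, not the repository's exact code.

```python
# Sketch only: compute per-channel means over the style dataset and shift each
# channel toward the overall gray level, so no single tone dominates.
# File layout and BGR handling are assumptions for illustration.
import glob
import cv2
import numpy as np

def compute_channel_means(image_dir):
    """Average each BGR channel over every image in the style dataset."""
    sums = np.zeros(3, dtype=np.float64)
    count = 0
    for path in glob.glob(f"{image_dir}/*.jpg"):
        img = cv2.imread(path).astype(np.float64)    # H x W x 3, BGR order
        sums += img.reshape(-1, 3).mean(axis=0)
        count += 1
    return sums / count                               # per-channel dataset mean

def remove_common_tone(img, channel_means):
    """Shift each channel so the dataset means of B, G, R become equal."""
    gray_mean = channel_means.mean()
    shifted = img.astype(np.float32) + (gray_mean - channel_means)
    return np.clip(shifted, 0, 255).astype(np.uint8)
```

After this shift, the three channels share the same mean over the whole style dataset, which matches the observation in the question below.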
I can tell that after data_mean, the RGB channels of the anime images have the same mean value over the whole style dataset, but I'm confused about its purpose. Could you please explain what difference it makes?