KIMGEONUNG / BigColor

[ECCV 2022] Official PyTorch implementation of "BigColor"
MIT License

preprocess for vgg loss #8

Closed Athenanna closed 1 year ago

Athenanna commented 1 year ago

Hi, your work is great, and I am trying some of its tricks in my own program. However, I am confused about the preprocessing for the VGG loss: is there a special reason for the center crop, rather than resizing to 224x224? The input shape to G is 256x256, so what happens to the outer part (256 - 224)?

```python
preprocess.append(transforms.Resize(256))
preprocess.append(transforms.CenterCrop(224))
```

Also, did you run any 'without-classifier-model' experiments? The class label seems to limit the model's ability on real, complicated images. And I was wondering: with a pretrained VGG for the perceptual loss, wouldn't the G model already have some 'classification sight'?

thanks

KIMGEONUNG commented 1 year ago

Thank you for your interest in our work.

First of all, there is no special reason to use the center crop rather than a resize, and I'm sure the difference would not meaningfully affect the final performance of the colorization model.
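For reference, a minimal sketch of the two preprocessing options side by side (my own illustration, not code from the repo); the ImageNet normalization uses the standard VGG input statistics and is shown only for completeness:

```python
import torch
from torchvision import transforms

# The crop used in the repo (a) and the plain-resize alternative (b); per the
# answer above, both should work equally well for the perceptual loss.
crop_version = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),     # drops a 16-pixel border on each side
])
resize_version = transforms.Compose([
    transforms.Resize((224, 224)),  # keeps the whole image, slightly rescaled
])

# Standard ImageNet statistics for VGG inputs, shown for completeness.
imagenet_norm = transforms.Normalize(
    mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

x = torch.rand(3, 256, 256)         # stand-in for a 256x256 generator output
print(crop_version(x).shape, resize_version(x).shape)  # both (3, 224, 224)
```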

Regarding the question about a 'without classifier' model: it is not possible to construct our BigColor model without a class label, because we use BigGAN as the backbone network, which requires a class label as input, i.e., BigGAN is a class-conditional model. Hence, we conducted no experiments 'without a classifier model'.
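To illustrate why the class label cannot simply be dropped from a class-conditional backbone, here is a toy sketch (emphatically not the BigGAN architecture) where the label embedding is fused with the noise before any synthesis happens:

```python
import torch
import torch.nn as nn

# Toy class-conditional generator: the class label c is a required input
# because its embedding is concatenated with the noise at the very first
# step, so there is no way to run the network without it.
class ToyConditionalG(nn.Module):
    def __init__(self, n_classes=1000, z_dim=128, emb_dim=128):
        super().__init__()
        self.embed = nn.Embedding(n_classes, emb_dim)
        self.net = nn.Sequential(
            nn.Linear(z_dim + emb_dim, 256), nn.ReLU(),
            nn.Linear(256, 3 * 8 * 8))   # tiny 8x8 RGB output for illustration

    def forward(self, z, c):
        h = torch.cat([z, self.embed(c)], dim=1)
        return self.net(h).view(-1, 3, 8, 8)

g = ToyConditionalG()
img = g(torch.randn(4, 128), torch.randint(0, 1000, (4,)))  # (4, 3, 8, 8)
```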

As for the last question, I don't think the 'classification sight' comes from the pretrained VGG perceptual loss. If the perceptual loss gave a 'classification insight' to the generator, a model trained without the perceptual loss should produce semantically worse colors than the original model. But in our experiments, though not described in the main paper, removing the perceptual loss only degraded texture details. Rather, the 'classification sight' comes from the adversarial loss, because our discriminator also takes the class label as input.

I hope my answer is helpful. If you have any remaining questions, feel free to ask.

Athenanna commented 1 year ago

Many thanks, and sorry for the late reply; I did some experiments these days.

First of all, you are right: an L1 loss (without the VGG loss) also works. But why choose `id_targets = [1, 2, 13, 20]`? Is there an ablation study about it?

Also, I found some specific artifacts: something like a checkerboard in green and red (image attached). Did you get any similar result before? What could be the reason? I got similar results (a checkerboard, but not colorful) in other tasks when using a VGG loss.

And I am training a model without the classifier, but the results are not good yet. I find that different GAN losses (D losses) give very different results: the WGAN-GP loss produces low-saturation results, like those of DeOldify, while the vanilla GAN loss (BCE for D) is much more colorful but cannot keep the spatial structure of the input. Did you study the choice of GAN loss (D loss) for the colorization task? For reference, minimal sketches of the two losses I am comparing follow below; `D`, the tensor shapes, and the penalty weight are placeholders rather than my actual training code.
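```python
import torch
import torch.nn.functional as F

def d_loss_vanilla(D, real, fake):
    """Vanilla GAN: binary cross-entropy on the discriminator logits."""
    logits_real, logits_fake = D(real), D(fake.detach())
    loss_real = F.binary_cross_entropy_with_logits(
        logits_real, torch.ones_like(logits_real))
    loss_fake = F.binary_cross_entropy_with_logits(
        logits_fake, torch.zeros_like(logits_fake))
    return loss_real + loss_fake

def d_loss_wgangp(D, real, fake, gp_weight=10.0):
    """WGAN-GP: Wasserstein critic loss plus a gradient penalty."""
    loss = D(fake.detach()).mean() - D(real).mean()
    # gradient penalty on random interpolates between real and fake samples
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake.detach()).requires_grad_(True)
    grad, = torch.autograd.grad(D(interp).sum(), interp, create_graph=True)
    penalty = ((grad.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()
    return loss + gp_weight * penalty
```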

KIMGEONUNG commented 1 year ago

Sorry for the late reply. As you said, there are many choices of which VGG layers to use. Although we did not run an ablation, based on my experience the results would be similar as long as the low-level and high-level layers are properly combined.
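For concreteness, here is a minimal sketch of a VGG perceptual loss driven by layer indices like `id_targets = [1, 2, 13, 20]`; indexing directly into `vgg16.features` and the L1 feature distance are assumptions for illustration, not a transcription of the BigColor code:

```python
import torch
import torch.nn as nn
from torchvision import models

class VGGPerceptualLoss(nn.Module):
    def __init__(self, id_targets=(1, 2, 13, 20)):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        self.features = vgg.features.eval()
        for p in self.features.parameters():
            p.requires_grad_(False)        # frozen feature extractor
        self.id_targets = set(id_targets)

    def forward(self, pred, target):
        loss, x, y = 0.0, pred, target
        for i, layer in enumerate(self.features):
            x, y = layer(x), layer(y)
            if i in self.id_targets:       # compare activations at this depth
                loss = loss + nn.functional.l1_loss(x, y)
            if i >= max(self.id_targets):  # no need to go deeper
                break
        return loss
```

Mixing shallow indices (low-level texture) with deeper ones (high-level semantics), as in the list above, is exactly the kind of combination I meant.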

I have no idea about, and no experience with, the checkerboard artifact. That said, peak values of red, green, or blue are commonly seen in colorization results in general. I don't yet know what causes the problem, though.

As for the discriminator design, in my opinion the best approach is trial and error, i.e., try many things and choose the best one, because it is too hard to get an intuition for which loss is best and why. However, if you plan to use a pretrained discriminator and generator, I recommend using the same discriminator loss as the one used at training time. This is also discussed in our main paper; see Table 3 and Figure 8 of BigColor.