JarrentWu1031 / CCPL

[ECCV 2022 Oral] Official Pytorch implementation of CCPL and SCTNet
Apache License 2.0

pre-trained models #3

Closed chenxwh closed 2 years ago

chenxwh commented 2 years ago

Hi, are the pre-trained models released here ready to be used for any customised content/style image pairs? Thanks!

JarrentWu1031 commented 2 years ago

Yes, all the released models are designed for arbitrary style transfer.

realzheng commented 2 years ago

Hi @JarrentWu1031, thanks for the great work and for sharing all the pretrained models and code. Is vgg_normalized.pth the only model required for training? Could you share some information on how to train this encoder?

JarrentWu1031 commented 2 years ago

Yes, the only pre-trained model involved here is the VGG model. It is trained on ImageNet for classification. If you are interested, you could train your own encoder from scratch; the original VGG paper is a good reference.

realzheng commented 2 years ago

Hi @JarrentWu1031, I have tried to retrain the encoder, but the results do not look good. Do you have any recommended encoder training code for this? Thanks.

JarrentWu1031 commented 2 years ago

In this paper, we didn't train our own encoder. Instead, we use the pre-trained model from https://drive.google.com/file/d/1EpkBA2K2eYILDSyPTt0fztz59UjAIpZU/view?usp=sharing, as previous works do. If you really want to retrain your own encoder, you could search for something like "VGG ImageNet classification". That said, it should be fine to use the pre-trained model as well. Our work aims to train the SCT module and the decoder, not the encoder (we assume the extracted features are already meaningful).
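For anyone following along, here is a minimal sketch of loading the released weights and freezing the encoder before stylization training. The path and the generic `encoder` argument are assumptions; in the repo the encoder is the VGG-19 `nn.Sequential` defined in `net.py`.

```python
import torch
import torch.nn as nn

def load_frozen_encoder(encoder: nn.Module, weights_path: str) -> nn.Module:
    """Load pre-trained VGG weights and freeze them for stylization training."""
    state = torch.load(weights_path, map_location="cpu")
    encoder.load_state_dict(state)
    encoder.eval()
    for p in encoder.parameters():
        p.requires_grad_(False)  # encoder stays fixed; only SCT + decoder train
    return encoder

# e.g. encoder = load_frozen_encoder(vgg, "models/vgg_normalised.pth")
```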

realzheng commented 1 year ago

Hi @JarrentWu1031, I retrained the VGG net with PyTorch on the ImageNet classification task, and the model converged, but it does not work for style transfer. I dumped the weights of vgg_normalised.pth, and its first conv's weights and bias are:

| 0.weight | 0.bias |
| --- | --- |
| 0 | -103.939 |
| 0 | -116.779 |
| 255 | -123.68 |
| 0 | |
| 255 | |
| 0 | |
| 255 | |
| 0 | |
| 0 | |

Do you know why?
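(For reference, a dump like the table above could be produced roughly as follows; the file name is the one from this thread, and the `0.weight`/`0.bias` keys assume the first layer of the `nn.Sequential` is a 1x1 conv.)

```python
import torch

state = torch.load("vgg_normalised.pth", map_location="cpu")
print(state["0.weight"].squeeze())  # 1x1 conv kernel -> a 3x3 matrix
print(state["0.bias"])
```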

JarrentWu1031 commented 1 year ago

Hi Zheng, I might have misunderstood you at the very beginning. In fact, the entire stylization network in this work consists of three parts: a VGG encoder, a decoder, and the SCT module (proposed in this paper). What needs to be trained to fulfill stylization are the decoder and the SCT module, while the encoder (pre-trained on ImageNet) stays fixed.
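A conceptual sketch of that three-part pipeline (the class and module names here are illustrative, not the repo's exact API):

```python
import torch.nn as nn

class StylizationNet(nn.Module):
    def __init__(self, encoder: nn.Module, sct: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder, self.sct, self.decoder = encoder, sct, decoder

    def forward(self, content, style):
        f_c = self.encoder(content)  # frozen VGG features of the content image
        f_s = self.encoder(style)    # frozen VGG features of the style image
        f_cs = self.sct(f_c, f_s)    # trained: fuse style into content features
        return self.decoder(f_cs)    # trained: map features back to an image
```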

realzheng commented 1 year ago

@JarrentWu1031

Yes, I understand the whole pipeline; I am trying to train a lightweight network for real-time inference. But I found that the VGG-based encoder takes too much time, which is why I am trying to retrain the encoder.

JarrentWu1031 commented 1 year ago

Well, that is strange. It seems the provided VGG model shares the same weights as yours. Have you fixed (frozen) the parameters of your pre-trained encoder during style-transfer training?

realzheng commented 1 year ago

> Well, that is strange. It seems the provided VGG model shares the same weights as yours. Have you fixed (frozen) the parameters of your pre-trained encoder during style-transfer training?

Yes, I have fixed the encoder's parameters. It seems the provided VGG's weights have been normalized using the ImageNet dataset statistics. I found that many arbitrary style transfer methods use these normalized VGG weights.
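That matches the classic Caffe-style VGG preprocessing. Here is a small sketch of what the dumped 1x1 conv computes per pixel, assuming the weight dump above is in row-major order (this is an interpretation of the table, not repo code):

```python
import torch

# First conv of vgg_normalised: RGB in [0, 1] -> mean-subtracted BGR in [0, 255].
w = torch.tensor([[0.0, 0.0, 255.0],    # out ch 0 = 255 * B
                  [0.0, 255.0, 0.0],    # out ch 1 = 255 * G
                  [255.0, 0.0, 0.0]])   # out ch 2 = 255 * R
b = torch.tensor([-103.939, -116.779, -123.68])  # minus BGR ImageNet means

rgb = torch.rand(3)   # one pixel, RGB in [0, 1]
bgr = w @ rgb + b     # what the 1x1 conv does at every pixel
print(bgr)            # equals 255 * rgb.flip(0) + b
```

If that reading is right, it would also explain why a freshly retrained classification VGG (which expects differently normalized RGB input) cannot be dropped in without matching this preprocessing.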