boschresearch / ISSA

Official implementation of "Intra-Source Style Augmentation for Improved Domain Generalization" (WACV 2023 & IJCV)
GNU Affero General Public License v3.0

Some questions #8

Open taluos opened 3 months ago

taluos commented 3 months ago

Dear Author,

Thank you very much for your work. I have a few questions I would like to ask: Have you ever used human face datasets (such as FFHQ) for training? If so, how were the results? If not, can I directly use ISSA for related training?

YumengLi007 commented 3 months ago

Hi @taluos , thanks for your interest. At the beginning of the project, we tried ISSA on face datasets as well, and it also yielded better reconstruction quality there. However, the hyperparameters, e.g., mask size and random masking ratio, were set differently, as far as I remember.

taluos commented 3 months ago

Hello! Thank you very much for your response. In addition, I noticed that in your paper you mention training the StyleGAN to synthesize images before training the encoder. I'm wondering if I can skip this step by using a pre-trained StyleGAN model. Also, I'm not very clear about the function of data_fake in the configuration file. It seems to contain images generated by StyleGAN, but I'm unsure of its role in the overall training process.

YumengLi007 commented 3 months ago

Hi @taluos ,

  1. Yes, you could directly use a pretrained StyleGAN model. There just wasn't one trained on Cityscapes, so I trained one myself.
  2. Right, it contains images generated by StyleGAN together with their style latents w. They are used as a regularization so that the inverted codes stay close to the original latent space. We also observed that this can speed up training convergence. Please see Eq.(8) in the paper.

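The regularization described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual loss code; `encoder`, `fake_images`, and `w_orig` are hypothetical names, and the exact distance used in Eq.(8) may differ:

```python
import torch
import torch.nn.functional as F

def latent_regularization_loss(encoder, fake_images, w_orig):
    # Re-encode images that StyleGAN generated from known latents w,
    # then penalize the distance between the inverted codes and the
    # original w, keeping the encoder's output close to W space
    # (a sketch in the spirit of Eq.(8) in the paper).
    w_inverted = encoder(fake_images)
    return F.mse_loss(w_inverted, w_orig)
```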
taluos commented 3 months ago

Thanks again @YumengLi007 , I see. So, before training the encoder, I generate some images, and during training the encoder is used to obtain their latent codes. In that case, how many images do I need to generate for training?

I tried training, but I encountered the following error:

Traceback (most recent call last):
  File "train_encoder.py", line 493, in <module>
    main(rank=0)
  File "train_encoder.py", line 211, in main
    resume_data = misc.load_network_pkl(f)
AttributeError: module 'torch_utils.misc' has no attribute 'load_network_pkl'

I obtained torch_utils from https://github.com/NVlabs/stylegan3, but it doesn't include a load_network_pkl method. I only found that this method exists in a similar file from https://github.com/NVlabs/stylegan. Could you provide your torch_utils file, or could you advise me on how to modify it?

YumengLi007 commented 3 months ago

Hi @taluos

  1. 50k images should be enough.
  2. You might find this issue helpful :) https://github.com/boschresearch/ISSA/issues/7

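For reference, in the NVlabs StyleGAN2-ADA and StyleGAN3 repositories the pickle loader lives in legacy.py (`legacy.load_network_pkl`), not in torch_utils/misc.py. A minimal stand-in, ignoring the legacy TensorFlow-pickle conversion the real loader performs, might look like:

```python
import pickle

def load_network_pkl(f):
    # Minimal stand-in for the loader found in legacy.py of the NVlabs
    # StyleGAN2-ADA / StyleGAN3 repos. The real function additionally
    # converts old TensorFlow-era pickles; this sketch only handles
    # pickles saved by the PyTorch codebase.
    return pickle.load(f)
```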
taluos commented 3 months ago

I encountered the following error during the training process:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/opt/data/private/ISSA/ISSA-main/ISSA-main/train_encoder.py", line 279, in main
    E=encoder,D=D_enc, G=generator, percept=percept,
UnboundLocalError: local variable 'D_enc' referenced before assignment

It seems to be caused by the fact that I am training with two GPUs, so the process with rank = 1 never defines this variable. How can I resolve this issue?

taluos commented 3 months ago

I simply modified the following code so that this part is executed on both rank 0 and rank 1, which solved the issue, but I'm not sure if this is the correct approach.

if rank == 0:
    print('Setting Discriminator...')
D_channel = training_set.num_channels
common_kwargs = dict(input_nc=D_channel, getIntermFeat=True)
D_enc = create_class_by_name(**config.enc_D_kwargs, **common_kwargs).train().requires_grad_(
    False).to(device)  # subclass of torch.nn.Module
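This is indeed the usual pattern for multi-GPU training: every rank must construct the same modules, and only side effects such as logging should be gated on rank 0. A sketch of the fix (with hypothetical parameter passing; in the actual script `config`, `training_set`, and `create_class_by_name` are already in scope):

```python
def setup_discriminator(rank, training_set, config, device, create_class_by_name):
    # Build the discriminator on every rank; limit printing to rank 0
    # so that multi-process training does not leave D_enc undefined
    # on non-zero ranks.
    if rank == 0:
        print('Setting Discriminator...')
    common_kwargs = dict(input_nc=training_set.num_channels, getIntermFeat=True)
    D_enc = (create_class_by_name(**config.enc_D_kwargs, **common_kwargs)
             .train().requires_grad_(False).to(device))  # nn.Module subclass
    return D_enc
```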