StevenShaw1999 / RSSA


How to train high-resolution models with limited GPU resources ? #4

Closed li-car-fei closed 1 year ago

li-car-fei commented 2 years ago

I only have two 2080 GPUs, but I want to train a generator at a resolution of 1024*1024. How can I adjust the losses and parameters so that the model can be trained?

Since my generator is already pretrained, I guess I can fine-tune it with this method.

Thank you

StevenShaw1999 commented 2 years ago

Hello, computing the loss at a lower resolution, or randomly sampling patches from the higher-resolution feature maps, may help cope with the GPU memory limitation. That said, I'm not sure whether it will fit on 2080 GPUs.
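A minimal sketch of both ideas (function and variable names are illustrative, not from this repo): downsampling images before the loss, and cropping random patches from a large feature map instead of using it whole.

```python
import torch
import torch.nn.functional as F

def low_res_loss(fake, real, loss_fn, size=256):
    # Downsample both images before computing the loss to cut memory.
    fake_s = F.interpolate(fake, size=(size, size), mode='bilinear', align_corners=False)
    real_s = F.interpolate(real, size=(size, size), mode='bilinear', align_corners=False)
    return loss_fn(fake_s, real_s)

def random_patches(feat, patch=64, n=4):
    # Sample n random patches from a feature map instead of the full map.
    b, c, h, w = feat.shape
    out = []
    for _ in range(n):
        y = torch.randint(0, h - patch + 1, (1,)).item()
        x = torch.randint(0, w - patch + 1, (1,)).item()
        out.append(feat[:, :, y:y + patch, x:x + patch])
    return torch.stack(out, dim=1)  # (b, n, c, patch, patch)
```

Either trick trades some supervision signal for memory, so you may need to lower the patch size or loss resolution gradually to find what fits on your cards.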

li-car-fei commented 2 years ago

> Hello, computing the loss at a lower resolution, or randomly sampling patches from the higher-resolution feature maps, may help cope with the GPU memory limitation. That said, I'm not sure whether it will fit on 2080 GPUs.

I see that there is a GAN inversion step before training, but the latent code of the style image does not seem to be used during training. Why is this?

StevenShaw1999 commented 2 years ago

The inverted latent codes are used to project the random noises onto the subspace during training. For example, you can refer to the code from line 248 to line 251 in the train_face_proj.py script. They are also used during the inference procedure.
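In spirit, the projection replaces a random latent with its closest point in the span of the inverted codes. This is an illustrative least-squares sketch, not the repo's exact code:

```python
import torch

def project_to_subspace(w, anchors):
    # Least-squares projection of a random latent w (shape (d,)) onto the
    # subspace spanned by the inverted anchor codes (shape (n, d)).
    coeffs = torch.linalg.pinv(anchors.T) @ w   # coordinates of w in the anchor basis, (n,)
    return anchors.T @ coeffs                   # closest point in span(anchors), (d,)
```

Any component of w orthogonal to the anchors is discarded, which is what keeps the sampled noises close to the inverted style codes.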

li-car-fei commented 1 year ago

> The inverted latent codes are used to project the random noises onto the subspace during training. For example, you can refer to the code from line 248 to line 251 in the train_face_proj.py script. They are also used during the inference procedure.

If I want to fine-tune a trained StyleGAN, should I load it into generator or g_ema? Or should both be loaded?

StevenShaw1999 commented 1 year ago

My experiments are mainly based on g_ema (both the GAN inversion and the training); you can refer to the codebase if you are confused by the experimental settings. g_ema is the exponential moving average of the trained generator's weights, so if the generator is well trained there won't be much difference whether you choose G or G_ema as your adapter.
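For context, g_ema is typically maintained with the usual exponential-moving-average update over the generator's parameters, along the lines of the accumulate helper found in stylegan2-pytorch-style codebases (a sketch, not this repo's exact code):

```python
import torch

def accumulate(g_ema, g, decay=0.999):
    # Blend each g_ema parameter toward the current generator parameter:
    # ema = decay * ema + (1 - decay) * current
    ema_params = dict(g_ema.named_parameters())
    params = dict(g.named_parameters())
    with torch.no_grad():
        for name, p in params.items():
            ema_params[name].mul_(decay).add_(p, alpha=1 - decay)
```

Because the decay is close to 1, g_ema converges to a smoothed copy of the generator, which is why a well-trained G and its G_ema are nearly interchangeable.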

li-car-fei commented 1 year ago

I have a StyleGAN that already has a strong style-transfer capability at resolution 256, loaded in ckpt_fine, and the three generators are loaded as follows:

```python
generator.load_state_dict(ckpt_fine["g"], strict=False)
g_source.load_state_dict(ckpt_source["g_ema"], strict=False)
g_ema.load_state_dict(ckpt_fine["g"], strict=False)
```

However, while the images saved in the samples folder during training look normal, the images generated by generate.py after training do not.

I also tried replacing the generator or g_ema parameters with those from source_ffhq, but it still cannot generate normal pictures.

Why is this?

I just want to fine-tune my model with the method in this paper, i.e. replace the trained model in the paper with my pretrained model.

StevenShaw1999 commented 1 year ago

Maybe your model definition differs slightly from this repo, so the code did not load your pretrained parameters correctly. You can check this by switching the 'strict' option from False to True.
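A quick way to see what strict=True would complain about, using toy modules (the class and layer names here are illustrative):

```python
import torch
import torch.nn as nn

class NetA(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)

class NetB(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)
        self.extra = nn.Linear(4, 4)  # key that NetA's checkpoint lacks

ckpt = NetA().state_dict()
model = NetB()

# strict=False silently skips 'extra', leaving it randomly initialized.
model.load_state_dict(ckpt, strict=False)

# strict=True raises a RuntimeError listing the mismatched keys.
try:
    model.load_state_dict(ckpt, strict=True)
except RuntimeError as e:
    print(e)
```

With strict=False, any layer whose key is missing from the checkpoint keeps its random initialization without warning, which can produce exactly the "normal samples during training, broken images from generate.py" symptom above.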

li-car-fei commented 1 year ago

When I switch the 'strict' option from False to True, there are no errors, and I tried saving "g_ema_module": g_ema_module.state_dict() in train.py.

With that, generate.py produces normal faces, and they match the images in the samples folder during training.

I'm so confused

StevenShaw1999 commented 1 year ago

Maybe the parameters of the model are wrapped in g_ema.module due to data parallelism in the training phase. I suggest you check for this when using generate.py with your model; adding '.module' to your loaded checkpoints may help. I also suggest you check the 'strict' option in the generate.py script. I haven't met such a problem since I ran my experiments on a single GPU.
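One common fix for checkpoints saved from an nn.DataParallel-wrapped model is stripping the 'module.' prefix from every key before loading (a sketch; the exact keys depend on how your checkpoint was saved):

```python
def strip_module_prefix(state_dict):
    # nn.DataParallel wraps the model, so saved keys look like
    # 'module.conv1.weight'; remove that prefix for a plain model.
    return {k[len('module.'):] if k.startswith('module.') else k: v
            for k, v in state_dict.items()}
```

After this remapping, load_state_dict with strict=True should succeed on a single-GPU model, which is a good sanity check that every parameter was actually loaded.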

li-car-fei commented 1 year ago

I think the above situation was caused by multi-GPU training.

I found out how to modify the code to make it work, thanks!!!