Transfer Learning fails when training conditional model based on dataset labels

NVlabs / stylegan2-ada-pytorch

StyleGAN2-ADA - Official PyTorch implementation

https://arxiv.org/abs/2006.06676


Transfer Learning fails when training conditional model based on dataset labels #98

Open ageroul opened 3 years ago

ageroul commented 3 years ago

Hi, I prepared my dataset with dataset_tool.py. The images are 256x256 and there are 5 classes (labels). The dataset.json file also looks fine.

Here is the problem: when I run python train.py with ffhq256 as the transfer learning source network, execution fails almost immediately (at the beginning of "Constructing networks") with this error:

RuntimeError: The size of tensor a (1024) must match the size of tensor b (512) at non-singleton dimension 1

When I run the same command with the option cond='False' (ignoring the dataset labels), the problem disappears and transfer learning proceeds without error. What is the problem here? Thanks in advance!

PS: I also tried ffhq512 (with the option cond="True"), but then I get an error again:

RuntimeError: The size of tensor a (256) must match the size of tensor b (512) at non-singleton dimension 0

wdf19961118 commented 3 years ago

You can set --gpus=1 and try again. In training_loop.py I see that training_set_iterator = iter(torch.utils.data.DataLoader(dataset=training_set, sampler=training_set_sampler, batch_size=batch_size//num_gpus, **data_loader_kwargs)), so the per-GPU batch size may be the reason for your error. Good luck!

ageroul commented 3 years ago

> You can set --gpus=1 and try again. In training_loop.py I see that training_set_iterator = iter(torch.utils.data.DataLoader(dataset=training_set, sampler=training_set_sampler, batch_size=batch_size//num_gpus, **data_loader_kwargs)), so the per-GPU batch size may be the reason for your error. Good luck!

Thanks for the answer. Unfortunately, this is not the issue, as I had already set the option --gpus=1 in train.py.

chengkeng commented 3 years ago

I also encountered the same situation.

chengkeng commented 3 years ago

This must be retrained from scratch; remove "--resume=xxx".

ageroul commented 3 years ago

> This must be retrained from scratch; remove "--resume=xxx".

If it "must" be retrained, then there is no transfer learning happening...

wdf19961118 commented 3 years ago

You want to train a conditional model initialized from an unconditional model (ffhq256), right? However, the structure of the conditional model is different from that of the unconditional model. You can print both structures to see the difference.
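One way to see the structural difference is to compare parameter shapes rather than reading the printed modules by eye. The sketch below is torch-free and only illustrative: `diff_param_shapes` is a hypothetical helper, and the parameter names and shapes in the two dictionaries are assumed for the example, not read from a real checkpoint. With PyTorch, such a dictionary could be built as `{name: tuple(p.shape) for name, p in net.named_parameters()}`.

```python
# Hypothetical helper: compare parameter shapes of a source (pretrained) network
# and a destination (new) network. Each argument maps a parameter name to its
# shape tuple.

def diff_param_shapes(src_shapes, dst_shapes):
    """Return (mismatched, missing_in_src) for the destination's parameters."""
    mismatched = {}       # name -> (source shape, destination shape)
    missing_in_src = []   # parameters that exist only in the destination
    for name, dst_shape in dst_shapes.items():
        if name not in src_shapes:
            missing_in_src.append(name)
        elif src_shapes[name] != dst_shape:
            mismatched[name] = (src_shapes[name], dst_shape)
    return mismatched, missing_in_src


# Illustrative names and shapes only (assumptions, not taken from a checkpoint).
unconditional = {
    "mapping.fc0.weight": (512, 512),
    "mapping.fc1.weight": (512, 512),
}
conditional = {
    "mapping.embed.weight": (512, 5),    # label embedding, present only when conditioning
    "mapping.fc0.weight": (512, 1024),   # wider input: latent + embedded label
    "mapping.fc1.weight": (512, 512),
}

mismatched, missing = diff_param_shapes(unconditional, conditional)
print("shape mismatches:", mismatched)
print("not in source:", missing)
```

Any parameter reported as mismatched cannot be copied directly from the pretrained network, which is the kind of conflict the RuntimeError in the original post reports.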

Gass2109 commented 3 years ago

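Because the conditional model takes as input the concatenation (along dimension 1, the feature dimension) of the embedded label features and the latent code. Judging by the error message, each of these has shape (bs, 512) with the default settings, which gives a tensor of shape (bs, 1024). The unconditional model, however, takes only the latent code (bs, 512), so the pretrained weights of the first mapping layer do not fit the conditional one. Hope that helps :)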
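The arithmetic behind this explanation can be sketched in a few lines. The function name `mapping_input_width` is hypothetical, and the sizes used (a 512-wide latent code and a 512-wide label embedding) are assumptions chosen because they reproduce the 1024 vs. 512 mismatch in the reported error; they are not read from the checkpoint.

```python
def mapping_input_width(z_dim, c_dim, embed_features):
    """Input width of the first mapping layer.

    The label embedding contributes embed_features columns only when the
    model is conditional (c_dim > 0); otherwise only the latent code is fed in.
    """
    return z_dim + (embed_features if c_dim > 0 else 0)


# Assumed sizes: 512-wide latent code, 512-wide label embedding, 5 classes.
unconditional_width = mapping_input_width(z_dim=512, c_dim=0, embed_features=512)
conditional_width = mapping_input_width(z_dim=512, c_dim=5, embed_features=512)

print(unconditional_width, conditional_width)
# A weight matrix sized for one input width cannot be copied into a layer
# sized for the other, which is why resuming from the unconditional network fails.
print(unconditional_width == conditional_width)
```

Under these assumptions the two widths differ by exactly the embedding size, so only the first mapping layer (and the embedding itself) would be incompatible with the pretrained weights.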

thusinh1969 commented 3 years ago

I closed my question because this is the reason!

Steve

wenhaoyong commented 3 years ago

I encountered a similar problem and fixed it with the option cond="True". Thanks.

49xxy commented 2 years ago

> I encountered a similar problem and fixed it with the option cond="True". Thanks.

How did you solve it? I would sincerely appreciate your help.