ageroul opened this issue 3 years ago
You can set --gpus=1 and try again. I see that training_set_iterator = iter(torch.utils.data.DataLoader(dataset=training_set, sampler=training_set_sampler, batch_size=batch_size//num_gpus, **data_loader_kwargs))
in training_loop.py; maybe that is the reason for your error. Good luck!
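For context, the quoted line divides the total batch across GPUs with integer division, so the --gpus value must evenly divide batch_size. A minimal sketch of that arithmetic (the batch_size value here is an illustrative assumption, not from the thread):

```python
# Sketch of the per-GPU batch computation from training_loop.py.
batch_size = 32          # total batch across all GPUs (assumed value)
num_gpus = 1             # as set by --gpus=1
per_gpu_batch = batch_size // num_gpus
print(per_gpu_batch)     # 32: with one GPU the DataLoader gets the full batch
```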
Thanks for the answer. Unfortunately, this is not the issue, as I had already set the option --gpus=1 in train.py.
I also encountered the same situation.
This must be re-trained, remove "--resume=xxx"
If it "must" be retrained from scratch, then there is no transfer learning happening...
You want to train a conditional model initialized from an unconditional model (ffhq256), right? However, the architecture of the conditional model differs from the unconditional one. You can print both structures and compare them.
The conditional model takes as input the concatenation (along the feature dimension) of the label features (bs, 256) and the latent code (bs, 256), which gives a tensor of shape (bs, 512), whereas the unconditional model takes only the latent code (bs, 256). Hope that helps :)
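To make the shape difference concrete, here is a small shape-bookkeeping sketch; the dimensions follow the comment above, and the function name is hypothetical, not StyleGAN code:

```python
def mapping_input_shape(bs, z_dim=256, embed_dim=256, cond=True):
    """Shape of the mapping network's input: with cond=True the latent
    code is concatenated with the label embedding along dimension 1."""
    return (bs, z_dim + embed_dim) if cond else (bs, z_dim)

print(mapping_input_shape(4, cond=True))   # (4, 512) -> conditional model
print(mapping_input_shape(4, cond=False))  # (4, 256) -> unconditional model
```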
I closed my question because this was the reason!
Steve
I encountered a similar problem and fixed it with the option cond="True". Thanks.
How did you solve it? I sincerely hope to get your help
Hi, I have prepared my dataset with dataset_tool.py. The images are 256x256 and there are 5 classes (labels). The dataset.json file is also fine. Here is the problem: when running python train.py with ffhq256 as the transfer learning source network, execution fails almost immediately (at the beginning of "Constructing networks") with this error:
RuntimeError: The size of tensor a (1024) must match the size of tensor b (512) at non-singleton dimension 1
When I run the same code but with the option cond='False'
(ignore the dataset labels), the problem disappears and the transfer learning continues without error. What is the problem here? Thanks in advance! PS: I also tried ffhq512 (with option cond="True"), but then I get an error again: RuntimeError: The size of tensor a (256) must match the size of tensor b (512) at non-singleton dimension 0