If you are using a generator with a resolution of 512, then the latent code should have 16 entries, so this seems fine. In our paper we used an output resolution of 1024, which uses 18 entries.
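For reference, the number of W+ entries follows directly from the generator output resolution, so you can sanity-check the latent shape yourself. A minimal standalone sketch of that relation (the helper name n_styles here is just illustrative):

    import math

    def n_styles(output_size):
        # Each doubling of resolution adds two style inputs: 2 * log2(size) - 2
        # 256 -> 14, 512 -> 16, 1024 -> 18
        return int(math.log(output_size, 2)) * 2 - 2

    print(n_styles(512))   # 16
    print(n_styles(1024))  # 18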
@yuval-alaluf y_hat has a batch size of 4? Does that mean one input image generates 4 images?
The batch size of 4 means the number of input images. The number of output images is equal to the number of inputs - you get one output for each input
@yuval-alaluf The x tensor has a batch size of 1, but after the forward pass y_hat has a batch size of 4. Is something wrong in my config?
Facing the same issue.
@MKFMIKU @yuval-alaluf I added a conv layer to solve the problem:
    # Imports as used in the pSp encoder file (assumed context)
    import numpy as np
    import torch.nn as nn
    from torch.nn import Conv2d, Module
    from models.stylegan2.model import EqualLinear

    class GradualStyleBlock(Module):
        def __init__(self, in_c, out_c, spatial):
            super(GradualStyleBlock, self).__init__()
            self.out_c = out_c
            self.spatial = spatial
            num_pools = int(np.log2(spatial))
            modules = []
            modules += [Conv2d(in_c, out_c, kernel_size=3, stride=2, padding=1),
                        nn.LeakyReLU()]
            for i in range(num_pools - 1):
                modules += [
                    Conv2d(out_c, out_c, kernel_size=3, stride=2, padding=1),
                    nn.LeakyReLU()
                ]
            # extra conv I added: reduces the remaining 2x2 feature map (from 512x512 inputs) to 1x1
            modules += [Conv2d(out_c, out_c, kernel_size=4, stride=1, padding=1),
                        nn.LeakyReLU()]
            self.convs = nn.Sequential(*modules)
            self.linear = EqualLinear(out_c, out_c, lr_mul=1)

        def forward(self, x):
            x = self.convs(x)
            x = x.view(-1, self.out_c)
            x = self.linear(x)
            return x
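For anyone else hitting this: the quadrupled batch comes from the view() call. With 512x512 inputs, the feature map reaching view() is 2x2 instead of 1x1, and view(-1, self.out_c) folds those extra spatial elements into the batch dimension. A quick standalone check (the shapes are illustrative, not taken from the repo):

    import torch

    feat = torch.randn(1, 512, 2, 2)   # batch of 1, 512 channels, 2x2 spatial map
    flat = feat.view(-1, 512)          # 1*512*2*2 elements reshaped into 4 rows of 512
    print(flat.shape)                  # torch.Size([4, 512]) -- looks like a batch of 4

The extra conv above simply brings the map back down to 1x1 before the reshape.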
You shouldn't need to change any of the architecture, but glad to see this solved your issue.
@yuval-alaluf The problem is caused by the convolutions producing a different feature map size, combined with the view() tensor operation you use.
@yuval-alaluf Can you fix the shape problem mentioned above? It is not just me facing this issue; it does exist. I changed the code but cannot get correct image generation results.
Not sure I follow. What do you mean by "different size of feature map"? If you have not made any changes to the inner workings of the code, there should be no issue. Can you please provide more details on your data resolution, stylegan output size, etc.?
@yuval-alaluf The stylegan output size is 512x512, the input x has shape [1, 3, 512, 512], and the ground truth y has shape [1, 3, 512, 512]. The output y_hat should be [1, 3, 512, 512], but I got [4, 3, 512, 512]. The problem is that loss(y, y_hat) cannot be calculated since the shapes differ. Debugging, I found the latent code batch size is 4, which is why y_hat has a batch size of 4; the latent shape is the problem.
OK. This seems to be caused by running with a batch size of 1. Does this occur if you run with a batch size of 2?
@yuval-alaluf I haven't tried that yet. My GPU memory is not enough, so I use a batch size of 1.
I believe this is what is causing the issue. Working with a batch size of 1 results in unwanted changes to the tensor dimensions. Some changes will probably be needed to support a batch size of 1.
@yuval-alaluf Thanks for your concern.
@yuval-alaluf Not really. With the batch size set to 2, I get a y_hat batch size of 8.
And what is this for? The output size is forced to 256 by the default settings.
@vicentowang, I have just run the code and there is no problem with it. I believe you have either changed something in the code or changed the transforms.
Take a look at the original transforms: https://github.com/eladrich/pixel2style2pixel/blob/7a511c687bf2a8a64ba6a47150b37c7108329a6a/configs/transforms_config.py#L21-L37
You can see here that the images are resized to 256. Your images are of size 512, which tells me you have changed something. Please check that your code, transforms, and data are all correct.
As explained in the paper, regarding the face pool, since our inputs are of size 256, we resized the outputs to 256 so that we can compute the loss. This allows us to use lower resolution inputs during training to speed up training. At inference, however, you can still get the full 1024x1024 outputs.
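For context, the face pool that does this resize is just an adaptive average pooling applied to the generator output before the losses; a rough sketch of that step (see models/psp.py in the repo for the actual forward pass):

    import torch

    face_pool = torch.nn.AdaptiveAvgPool2d((256, 256))   # pools any output resolution down to 256x256
    y_full = torch.randn(1, 3, 1024, 1024)               # full-resolution generator output
    y_256 = face_pool(y_full)                            # what the losses are computed on during training
    print(y_256.shape)                                   # torch.Size([1, 3, 256, 256])

Passing resize=False at inference skips this pooling so you keep the full-resolution output.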
I tried a batch size of 2; it is not working. Here is the problem: codes = self.encoder(x) generates a latent code with 4 times the batch size, when it should match the batch size of x. No network changes, only the resolution --output_size 512 and resize=False: y_hat, latent = self.net.forward(x, return_latents=True, resize=False)
Did you change the transforms? Based on the size of x (which is 512x512), you did change something.
There is no resize in my transforms, since my input images are all 512x512.
So you did change the code. Put back the rescaling to 256 and see if it fixes your problem. If you change the code and something doesn't work, it most likely means the change you made was the source of the problem.
I am confused that changes to the EncodeTransforms affect the network output. But I cannot add transforms.Resize((256, 256)), because my input images are not png or jpg; there is no resize API for EXR images, which come as numpy arrays.
If you change the input size, it will change the output size since we're working with convolutions. If you can't find a way to resize the input image, you will probably need to add another downsampling layer in the network.
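If the only blocker is that the PIL-based transforms can't handle your EXR data, one option is to resize the float numpy array yourself before it enters the pipeline; a minimal sketch using OpenCV (assuming the EXR is already loaded as a float32 HxWxC array):

    import cv2
    import numpy as np

    exr = np.random.rand(512, 512, 3).astype(np.float32)               # stand-in for a loaded EXR image
    resized = cv2.resize(exr, (256, 256), interpolation=cv2.INTER_AREA)
    print(resized.shape)                                               # (256, 256, 3)

cv2.resize operates directly on numpy arrays, so no PIL image is needed.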
Here, x is the network input, and I ensure its shape is 512x512 no matter how I change the preprocessing of the image before the forward pass, as long as the preprocessing does not change the network structure.
I set the image size to 256 and now it's working. In codes = self.encoder(x), the batch sizes of codes, x, and y_hat are now the same, as expected.
If I want to train on images of 512 resolution, how should I change the code? Appreciated.
I don't see a reason to train on an input resolution of 512. We showed great results even when using inputs of 256. However, if you insist on training with 512, you need to add another downsampling layer in the GradualStyleBlock.
I got it. Another problem: I trained a stylegan2-ada model (a .pkl model file) whose input is W space. How can I make it receive input from W+ space?
pSp already handles working with W+. There is no change needed
All the details are in the readme. It is incredibly detailed.
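For what it's worth, W+ support comes from the fact that the decoder used in this repo (the rosinality StyleGAN2 implementation) accepts a per-layer latent tensor directly when input_is_latent=True. Roughly what the pSp forward pass does (encoder/decoder here stand in for the pSp sub-modules; treat the exact signature as approximate):

    codes = encoder(x)                      # [batch, n_styles, 512] -- one w vector per generator layer (W+)
    images, _ = decoder([codes],            # a 3-D style tensor is consumed as-is, layer by layer
                        input_is_latent=True,
                        randomize_noise=False,
                        return_latents=False)

So returning the codes as a [batch, n_styles, 512] tensor is all that is needed; no change to the generator is required.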
The .pkl fails to load into the StyleGAN structure in your code. My pretrained stylegan2-ada model may be somewhat different from yours (not sure), so I have to build a stylegan2-ada structure that works in W+ space.
The latent code shape is a bit different from the paper. Why does y_hat have a batch size of 4?