elliottwu / unsup3d

(CVPR'20 Oral) Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild

Questions about image size #26


YokkaBear commented 3 years ago

Hi @elliottwu, sorry to bother you again, but I have two questions about the image size setting:

  1. Will increasing the input image_size improve the reconstruction quality? I trained the unsup3d model on another dataset but did not get satisfactory reconstruction results, so I wonder whether increasing the input image_size would fix the problem.

  2. I tried increasing the input image size by setting image_size in the data_loader to 128 (twice the original image_size=64), but I encountered the following error (a minimal repro is sketched after this list):

    RuntimeError: The size of tensor a (128) must match the size of tensor b (4224) at non-singleton dimension 0

    After checking, I found that the two tensors are canon_normal and canon_light_d.view(-1,1,1,3) in the forward pass; an element-wise multiplication is applied to them, but their first dimensions do not match:

    torch.Size([128, 128, 128, 3])
    torch.Size([4224, 1, 1, 3])

    So I wonder whether you have encountered this kind of error, and if so, how you solved it. Thank you very much; looking forward to your response.
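For reference, here is a minimal standalone repro of the mismatch. The shapes are copied from the error above; the variable names only mirror the model's, this is not the repo code:

import torch

# canon_normal is (B, H, W, 3); canon_light_d.view(-1,1,1,3) should be
# (B, 1, 1, 3) so that it broadcasts over the spatial dimensions.
canon_normal = torch.randn(128, 128, 128, 3)
good_light_d = torch.randn(128, 3).view(-1, 1, 1, 3)
print((canon_normal * good_light_d).shape)  # torch.Size([128, 128, 128, 3])

# With a mismatched first dimension, broadcasting fails exactly as reported.
bad_light_d = torch.randn(4224, 3).view(-1, 1, 1, 3)
try:
    canon_normal * bad_light_d
except RuntimeError as e:
    print(e)  # The size of tensor a (128) must match the size of tensor b (4224) ...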

ALLinLLM commented 3 years ago

I met this problem too when I tried to train the model with 128x128 input.

I addressed it by adding a conv layer to the Encoder in unsup3d/network.py:

import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, cin, cout, nf=64, activation=nn.Tanh, input_size=64):
        super(Encoder, self).__init__()
        network = [
            nn.Conv2d(cin, nf, kernel_size=4, stride=2, padding=1, bias=False),  # 64x64 -> 32x32 (128x128 -> 64x64)
            nn.ReLU(inplace=True),
            nn.Conv2d(nf, nf*2, kernel_size=4, stride=2, padding=1, bias=False),  # 32x32 -> 16x16
            nn.ReLU(inplace=True),
            nn.Conv2d(nf*2, nf*4, kernel_size=4, stride=2, padding=1, bias=False),  # 16x16 -> 8x8
            nn.ReLU(inplace=True),
            nn.Conv2d(nf*4, nf*8, kernel_size=4, stride=2, padding=1, bias=False),  # 8x8 -> 4x4
            nn.ReLU(inplace=True),
            ]
        if input_size == 128:
            # extra downsampling layer: a 128x128 input is still 8x8 at this
            # point, so bring it down to the 4x4 the final conv expects
            network.extend([
                nn.Conv2d(nf*8, nf*8, kernel_size=4, stride=2, padding=1, bias=False),  # 8x8 -> 4x4
                nn.ReLU(inplace=True)])
        network.extend([
            nn.Conv2d(nf*8, nf*8, kernel_size=4, stride=1, padding=0, bias=False),  # 4x4 -> 1x1
            nn.ReLU(inplace=True),
            nn.Conv2d(nf*8, cout, kernel_size=1, stride=1, padding=0, bias=False)
        ])
        if activation is not None:
            network += [activation()]
        self.network = nn.Sequential(*network)

    def forward(self, input):
        # without the extra layer, a 128x128 input would reach the final 4x4 conv
        # at 8x8 and leave a 5x5 map, so this reshape would return 25x too many
        # values per image instead of a flat (batch, cout) prediction
        return self.network(input).reshape(input.size(0), -1)
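As a side note, the 4224 in the error is consistent with this: with a 128x128 input, the unpatched trunk reaches the final 4x4 conv at 8x8 and leaves a 5x5 map, so the lighting encoder (cout=4) returns 4*5*5 = 100 values per image instead of 4. If I read the lighting code correctly, it slices off two of those columns, appends a column of ones, and view(-1,1,1,3) then groups the remaining 99 values into 33 rows per image, and 128 * 33 = 4224. A quick shape check on the patched Encoder above (a hypothetical snippet, not from the repo):

import torch

# hypothetical check: the patched Encoder should flatten to (batch, cout)
# for both supported input sizes
enc64 = Encoder(cin=3, cout=4, input_size=64)
enc128 = Encoder(cin=3, cout=4, input_size=128)
print(enc64(torch.randn(2, 3, 64, 64)).shape)     # torch.Size([2, 4])
print(enc128(torch.randn(2, 3, 128, 128)).shape)  # torch.Size([2, 4])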

BTW, training with 128x128 input needs a lot of GPU memory; I had to set the batch size to 8 on my RTX 2080 Ti.

And the result is not as good as with the 64x64 input; you can refer to issue #9, where the author explains the reason.

YokkaBear commented 3 years ago

@vegetable09 Thank you for your reply; I will refer to this solution if needed.