RuntimeError: size mismatch, m1: [2 x 1036320], m2: [1048576 x 256] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:249

dearleiii commented 6 years ago

      (0): Linear(in_features=1048576, out_features=256, bias=True)
        (1): LeakyReLU(negative_slope=0.01)
        (2): Linear(in_features=256, out_features=1, bias=True)
      )
    )
  )
)
intpus:  torch.Size([50, 3, 1020, 2040]) scores:  torch.Size([50, 1])
Traceback (most recent call last):
  File "load_model_test.py", line 78, in <module>
    outputs = model1(inputs)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 114, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 124, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply
    raise output
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 41, in _worker
    output = module(*input, **kwargs)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 114, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 124, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply
    raise output
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 41, in _worker
    output = module(*input, **kwargs)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/project/xtmp/superresoluter/approximator/model1/apxm.py", line 60, in forward
    output = self.regressor(x)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/linear.py", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/functional.py", line 992, in linear
    return torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [2 x 1036320], m2: [1048576 x 256] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:249
leichen@gpu-compute2$ python3 load_model_test.py

dearleiii commented 6 years ago

Ideas to Try:

Search for the usual causes of a size mismatch error in a Linear layer.

dearleiii commented 6 years ago

Analysis: when calling output = model(inputs), the input batch is split across the GPUs by DataParallel. The failure happens inside each replica, at:

File "/usr/project/xtmp/superresoluter/approximator/model1/apxm.py", line 60, in forward
    output = self.regressor(x)

Mismatched size: 
size mismatch, m1: [2 x 1036320], m2: [1048576 x 256] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:249

To figure out: does m1 correspond to the input size, and m2 to the model (weight) size?
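
One way to read the message, given that the last traceback frame is `torch.addmm(bias, input, weight.t())`: m1 is the activation handed to the first Linear layer ([samples on this replica x flattened features]) and m2 is the transposed weight ([in_features x out_features]). The sketch below only mimics the shape rule in plain Python; `matmul_shape` is a hypothetical helper for illustration, not a PyTorch function.

```python
def matmul_shape(m1, m2):
    """Return the shape of m1 @ m2, or raise if the inner dimensions differ."""
    rows, inner1 = m1
    inner2, cols = m2
    if inner1 != inner2:
        raise ValueError(
            "size mismatch, m1: [{} x {}], m2: [{} x {}]".format(
                rows, inner1, inner2, cols
            )
        )
    return (rows, cols)


# Shapes copied from the error message.
m1 = (2, 1036320)    # input: [samples on this replica x flattened features]
m2 = (1048576, 256)  # weight.t() of Linear(in_features=1048576, out_features=256)

try:
    matmul_shape(m1, m2)
    mismatch = None
except ValueError as err:
    mismatch = str(err)

print(mismatch)

# An input with 1048576 features per sample would pass the check.
print(matmul_shape((2, 1048576), m2))
```

Read this way, the batch dimension (2) is not the problem; the second dimension of m1 has to equal in_features of the first Linear layer.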

dearleiii commented 6 years ago

Model 1 structure:

leichen@gpu-compute2$ python3 load_model_test.py
DataParallel(
  (module): APXM_conv3(
    (main): Sequential(
      (0): Conv2d(3, 8, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
      (1): LeakyReLU(negative_slope=0.2, inplace)
      (2): Conv2d(8, 16, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
      (3): LeakyReLU(negative_slope=0.2, inplace)
      (4): Conv2d(16, 32, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (5): LeakyReLU(negative_slope=0.2, inplace)
    )
    (regressor): Sequential(
      (0): Linear(in_features=1048576, out_features=256, bias=True)
      (1): LeakyReLU(negative_slope=0.01)
      (2): Linear(in_features=256, out_features=1, bias=True)
    )
  )
)
DataParallel(
  (module): DataParallel(
    (module): APXM_conv3(
      (main): Sequential(
        (0): Conv2d(3, 8, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
        (1): LeakyReLU(negative_slope=0.2, inplace)
        (2): Conv2d(8, 16, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
        (3): LeakyReLU(negative_slope=0.2, inplace)
        (4): Conv2d(16, 32, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
        (5): LeakyReLU(negative_slope=0.2, inplace)
      )
      (regressor): Sequential(
        (0): Linear(in_features=1048576, out_features=256, bias=True)
        (1): LeakyReLU(negative_slope=0.01)
        (2): Linear(in_features=256, out_features=1, bias=True)
      )
    )
  )
)

32 x 128 x 256 = 1048576

dearleiii commented 6 years ago

In apxm.py:

    def forward(self, x):
        x = x.float()
        # Three stride-2 convolutions: (N, 3, 1020, 2040) -> (N, 32, 127, 255)
        x = self.main(x)
        print('inputs after model main: ', x.size())
        # Flatten each sample for the fully connected layers.
        # Size changes from (N, 32, 127, 255) to (N, 32 * 127 * 255);
        # the -1 infers that dimension from the remaining ones.
        x = x.view(x.size(0), -1)
        # regressor is an nn.Sequential, so index its first Linear layer
        # to compare the flattened size with the expected in_features.
        print(x.size(1), self.regressor[0].weight.size(1))
        # Run the regressor: Linear(1048576, 256) -> LeakyReLU -> Linear(256, 1).
        # Size changes from (N, flatten) to (N, 1)
        output = self.regressor(x)
        return output

Running results:

intpus:  torch.Size([50, 3, 1020, 2040]) scores:  torch.Size([50, 1])
inputs after model main:  torch.Size([2, 32, 127, 255])
inputs after model main:  torch.Size([2, 32, 127, 255])
inputs after model main:  torch.Size([2, 32, 127, 255])
inputs after model main:  torch.Size([2, 32, 127, 255])

32 x 127 x 255 = 1036320

dearleiii commented 6 years ago

intpus:  torch.Size([50, 3, 1020, 2040]) scores:  torch.Size([50, 1])
input shape:  torch.Size([2, 3, 1020, 2040])
input shape:  torch.Size([2, 3, 1020, 2040])
input shape:  torch.Size([2, 3, 1020, 2040])
input shape:  torch.Size([2, 3, 1020, 2040])
inputs after model main:  torch.Size([2, 32, 127, 255])
inputs after model main:  torch.Size([2, 32, 127, 255])
inputs after model main:  torch.Size([2, 32, 127, 255])
inputs after model main:  torch.Size([2, 32, 127, 255])

When using 8 GPUs:

input shape:  torch.Size([1, 3, 1020, 2040])
input shape:  torch.Size([1, 3, 1020, 2040])
input shape:  torch.Size([1, 3, 1020, 2040])
inputs after model main:  torch.Size([1, 32, 127, 255])
inputs after model main:  torch.Size([1, 32, 127, 255])
inputs after model main:  torch.Size([1, 32, 127, 255])
inputs after model main:  torch.Size([1, 32, 127, 255])
inputs after model main:  torch.Size([1, 32, 127, 255])
inputs after model main:  torch.Size([1, 32, 127, 255])

dearleiii commented 6 years ago

As far as I know, a size mismatch usually comes from the network architecture, for example a stack of convolutions whose output does not have the size the next layer requires. My suggestion is to go through your network code. From your code I see output = sae(input). Look at class sae and check the size at every layer.

dearleiii commented 6 years ago

1st problem: the parallel outputs are not combined.

intpus:  torch.Size([50, 3, 1020, 2040]) scores:  torch.Size([50, 1])
input shape:  torch.Size([13, 3, 1020, 2040])
input shape:  torch.Size([12, 3, 1020, 2040])
inputs after model main:  torch.Size([13, 32, 127, 255])
1036320
inputs after model main:  torch.Size([12, 32, 127, 255])
1036320

  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/functional.py", line 992, in linear
    return torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [13 x 1036320], m2: [1048576 x 256] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:249
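
A note on the batch sizes 13 and 12: they look like what two nested DataParallel wrappers would produce on two GPUs. The sketch below assumes (this is not confirmed in the log) that two GPUs were visible and that DataParallel splits the batch along dimension 0 the way `torch.chunk` does; `chunk_sizes` is a hypothetical helper written in plain Python.

```python
import math


def chunk_sizes(n, parts):
    """Chunk sizes when n samples are split into at most `parts` pieces.

    Mimics torch.chunk-style splitting: every chunk holds ceil(n / parts)
    samples, except the last one, which takes whatever remains.
    """
    step = math.ceil(n / parts)
    sizes = []
    remaining = n
    while remaining > 0:
        take = min(step, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


outer = chunk_sizes(50, 2)        # outer DataParallel, assumed 2 GPUs
inner = chunk_sizes(outer[0], 2)  # inner DataParallel splits each piece again

print(outer)
print(inner)
```

Under these assumptions the first dimension of m1 is just the per-replica batch size. The replicas fail before any outputs could be gathered, and the mismatch itself sits in the feature dimension (1036320 vs 1048576).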

2nd problem: why do 127 and 255 appear instead of 128 and 256?
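
A likely explanation comes from the Conv2d output-size rule, floor((size + 2 * padding - kernel) / stride) + 1, applied three times with kernel_size=4, stride=2, padding=1 as in the printed model. The sketch below is plain Python arithmetic (no PyTorch needed); `conv_out` and `trace` are hypothetical helper names, and dilation is assumed to be 1.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Output length of one spatial dimension after a Conv2d (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace(size, layers=3):
    """Sizes of one spatial dimension before and after each of the conv layers."""
    sizes = [size]
    for _ in range(layers):
        sizes.append(conv_out(sizes[-1]))
    return sizes


height = trace(1020)  # actual input height
width = trace(2040)   # actual input width
print(height, width)

# Flattened features per sample with 32 output channels.
actual_features = 32 * height[-1] * width[-1]
print(actual_features)

# Input size that would give the 128 x 256 the Linear layer was built for.
print(trace(1024), trace(2048), 32 * 128 * 256)
```

The odd intermediate size 255 is halved with rounding down, which yields 127 rather than 128. Under this reading, Linear(in_features=1048576) matches a 1024 x 2048 input, while 1020 x 2040 images give 32 x 127 x 255 = 1036320 features, so either the input size or in_features has to change.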

dearleiii / PIRM-2018-SISR-Challenge

RuntimeError: size mismatch, m1: [2 x 1036320], m2: [1048576 x 256] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:249 #23