ServiceNow / HighRes-net

PyTorch implementation of HighRes-net, a neural network for multi-frame super-resolution, trained and tested on data from the European Space Agency's Kelvin competition. This is a ServiceNow Research project that was started at Element AI.
https://www.elementai.com/news/2019/computer-enhance-please

Expected 2D (unbatched) or 3D (batched) input to conv1d, but got input of size: [1, 4, 202, 202] #12

Open smermet opened 2 years ago

smermet commented 2 years ago

I want to use your code with the Proba-V dataset, but I'm facing the following error.

    $ python src/train.py --config config/config.json
    0%| | 0/261 [00:00<?, ?it/s]
    0%| | 0/400 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "[...]/HighRes-net/src/train.py", line 308, in <module>
        main(config)
      File "[...]/HighRes-net/src/train.py", line 294, in main
        trainAndGetBestModel(fusion_model, regis_model, optimizer, dataloaders, baseline_cpsnrs, config)
      File "[...]/HighRes-net/src/train.py", line 180, in trainAndGetBestModel
        srs_shifted = apply_shifts(regis_model, srs, shifts, device)[:, 0]
      File "[...]/HighRes-net/src/train.py", line 61, in apply_shifts
        new_images = shiftNet.transform(thetas, images, device=device)
      File "[...]/HighRes-net/src/DeepNetworks/ShiftNet.py", line 96, in transform
        new_I = lanczos.lanczos_shift(img=I.transpose(0, 1),
      File "[...]/HighRes-net/src/lanczos.py", line 96, in lanczos_shift
        I_s = torch.conv1d(I_padded,
    RuntimeError: Expected 2D (unbatched) or 3D (batched) input to conv1d, but got input of size: [1, 4, 202, 202]

Here are the different values and shapes passed to the conv1d calls:

- I_padded input shape: torch.Size([1, 4, 202, 202])
- groups (k_y.shape[0] and k_x.shape[0]): 4
- k_y and k_x weight shapes: torch.Size([4, 1, 7, 1]) and torch.Size([4, 1, 1, 7])
- padding values ([k_y.shape[2] // 2, 0] and [0, k_x.shape[3] // 2]): [3, 0] and [0, 3]

I used the default config.json, except for the following parameters.

I tried squeezing the 1st dim of img and the 2nd dim of the weights, and passing a plain int value for padding, to get past the successive error messages, but all I finally got was this new RuntimeError: 'Given groups=4, weight of size [4, 7, 1], expected input[4, 202, 202] to have 28 channels, but got 202 channels instead'
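
For reference, here is a minimal sketch of conv1d's shape contract, which explains both error messages (toy tensors, not code from the repository):

    import torch

    # torch.conv1d requires a 3D input (batch, channels, length) and a
    # 3D weight (out_channels, in_channels / groups, kernel_size).
    x = torch.randn(1, 4, 202)                   # 3D input is mandatory
    w = torch.randn(4, 1, 7)                     # in_channels / groups = 1
    y = torch.conv1d(x, w, groups=4, padding=3)
    print(y.shape)                               # torch.Size([1, 4, 202])

    # A 4D input such as [1, 4, 202, 202] is rejected outright, and with
    # groups=4 a weight of size [4, 7, 1] implies
    # in_channels = groups * weight.shape[1] = 28 -- hence the
    # "expected input[4, 202, 202] to have 28 channels" error.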

Any clue to help me?

smermet commented 2 years ago

This code has been picked up in other projects available on GitHub (e.g. in RobustMFSRforEO) and I faced the same error there. I modified the code so that the input data matches what the conv1d function expects, hopefully without having introduced an error... Changes are commented with ##.

def lanczos_shift(img, shift, p=3, a=3):
    '''
    Shifts an image by convolving it with a Lanczos kernel.
    Lanczos interpolation is an approximation to ideal sinc interpolation,
    by windowing a sinc kernel with another sinc function extending up to a
    few number of its lobes (typically a=3).
    Args:
        img : tensor (batch_size, channels, height, width), the images to be shifted
        shift : tensor (batch_size, 2) of translation parameters (dy, dx)
        p : int, padding width prior to convolution (default=3)
        a : int, number of lobes in the Lanczos interpolation kernel (default=3)
    Returns:
        I_s: tensor (batch_size, channels, height, width), shifted images
    '''

    ## These lines are added from RobustMFSRforEO
    B, C, H, W = img.shape
    # Because examples and channels are interleaved in dim 1.
    shift = shift.repeat(C, 1).reshape(B * C, 2)
    img = img.view(1, B * C, H, W)

    dtype = img.dtype

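    # NOTE: img was reshaped to 4D above, so the two branches below are
    # unreachable; they are kept from the original implementation.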
    if len(img.shape) == 2:
        img = img[None, None].repeat(1, shift.shape[0], 1, 1)  # batch of one image
    elif len(img.shape) == 3:  # one image per shift
        assert img.shape[0] == shift.shape[0]
        img = img[None, ]

    # Apply padding

    padder = torch.nn.ReflectionPad2d(p)  # reflect pre-padding
    I_padded = padder(img)
    I_padded_reshape = I_padded.view(I_padded.shape[0], I_padded.shape[1], -1)  ## images flattened to 3D to work with conv1d

    # Create 1D shifting kernels

    y_shift = shift[:, [0]]
    x_shift = shift[:, [1]]

    k_y = (lanczos_kernel(y_shift, a=a, N=None, dtype=dtype)
           .flip(1)  # flip axis of convolution
           )[:, None, :, None].squeeze(3)  ## drop the trailing dim to get shape (batch, 1, y_kernel) instead of (batch, 1, y_kernel, 1)
    k_x = (lanczos_kernel(x_shift, a=a, N=None, dtype=dtype)
           .flip(1)
           )[:, None, None, :].squeeze(2)  ## shape (batch, 1, x_kernel) instead of (batch, 1, 1, x_kernel)

    # Apply kernels

    I_s = torch.conv1d(I_padded_reshape, 
                       groups=k_y.shape[0],
                       weight=k_y,
                       padding=k_y.shape[2] // 2) ## previously : [k_y.shape[2] // 2, 0]
    I_s = torch.conv1d(I_s,
                       groups=k_x.shape[0],
                       weight=k_x,
                       padding=k_x.shape[2] // 2) ## previously : [0, k_x.shape[3] // 2]

    I_s = I_s.view(B, C, H+2*p, W+2*p) ## result reshaped in image format
    I_s = I_s[..., p:-p, p:-p]  # remove padding

    return I_s

Plus: in DeepNetworks/ShiftNet.py I removed the .transpose(0, 1), whose resulting format does not match the documented input of lanczos_shift.

gbrzeczek commented 2 years ago

Hi, have you tested your solution since then? I encountered the same error and I'm currently running the training process with your changes, but sadly there is nothing to compare the results with, since there is no pretrained model.

smermet commented 2 years ago

Indeed, this repository does not contain pre-trained models. After a quick review, I see that this competing architecture also uses ShiftNet, with the same code for lanczos_shift, this time with pre-trained models that can be used for comparison: https://github.com/rarefin/MISR-GRU

I had tested my proposal, but the results were worse with ShiftNet than without... and below the values reported in the publication... I had given up, hoping for some interaction on this forum to help me move forward!

On rereading my proposal, it seems obvious that the shift cannot be applied twice, for x and y, to the same flattened matrix. I have therefore completed my previous proposal with a slight modification between the two 1D convolutions: the image is reconstructed after the first correction and then flattened again, this time with width and height transposed, to accommodate the second correction.

... Unfortunately the results are not improved; some confusion must have been introduced somewhere!

def lanczos_shift(img, shift, p=3, a=3):
    '''
    Shifts an image by convolving it with a Lanczos kernel.
    Lanczos interpolation is an approximation to ideal sinc interpolation,
    by windowing a sinc kernel with another sinc function extending up to a
    few number of its lobes (typically a=3).
    Args:
        img : tensor (batch_size, channels, height, width), the images to be shifted
        shift : tensor (batch_size, 2) of translation parameters (dy, dx)
        p : int, padding width prior to convolution (default=3)
        a : int, number of lobes in the Lanczos interpolation kernel (default=3)
    Returns:
        I_s: tensor (batch_size, channels, height, width), shifted images
    '''

    B, C, H, W = img.shape
    ## Because examples and channels are interleaved in dim 1.
    shift = shift.repeat(C, 1).reshape(B * C, 2)
    img = img.view(1, B * C, H, W)

    dtype = img.dtype

    if len(img.shape) == 2:
        img = img[None, None].repeat(1, shift.shape[0], 1, 1)  # batch of one image
    elif len(img.shape) == 3:  # one image per shift
        assert img.shape[0] == shift.shape[0]
        img = img[None, ]

    # Apply padding

    padder = torch.nn.ReflectionPad2d(p)  # reflect pre-padding
    I_padded = padder(img)
    I_padded_reshapeX = I_padded.view(I_padded.shape[0], I_padded.shape[1], -1)   ## The images are flattened

    # Create 1D shifting kernels

    y_shift = shift[:, [0]]
    x_shift = shift[:, [1]]

    k_y = (lanczos_kernel(y_shift, a=a, N=None, dtype=dtype)
           .flip(1)  # flip axis of convolution
           )[:, None, :, None].squeeze(3)  # squeeze to shape (batch, 1, y_kernel)
    k_x = (lanczos_kernel(x_shift, a=a, N=None, dtype=dtype)
           .flip(1)
           )[:, None, None, :].squeeze(2)  # squeeze to shape (batch, 1, x_kernel)

    # Apply kernels

    I_s_reshapeX = torch.conv1d(I_padded_reshapeX,
                       groups=k_y.shape[0],
                       weight=k_y,
                       padding=k_y.shape[2] // 2)  # same padding  ## previously: [k_y.shape[2] // 2, 0]

    I_s = I_s_reshapeX.view(1, B * C, H + 2 * p, W + 2 * p)  ## Reconstruction of the padded image
    I_s_reshapeY = I_s.transpose(2, 3).reshape(I_s.shape[0], I_s.shape[1], -1)  ## The images are flattened again after swapping width and height

    I_s_reshapeY = torch.conv1d(I_s_reshapeY,
                       groups=k_x.shape[0],
                       weight=k_x,
                       padding=k_x.shape[2] // 2)  ## previously: [0, k_x.shape[3] // 2]

    I_s = I_s_reshapeY.reshape(B, C, W + 2 * p, H + 2 * p).transpose(2, 3)  ## Reconstruction of the image, swapping width and height back
    I_s = I_s[..., p:-p, p:-p]  # remove padding

    return I_s
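
Note, for whoever digs further: a 1D convolution over a row-major flattened image inevitably mixes pixels across row boundaries, which may be part of the remaining confusion. A minimal sketch with toy tensors:

    import torch

    # A 2x4 image flattened row-major: conv1d sees one long signal, so a
    # kernel spanning positions 3 and 4 averages the end of row 0 with the
    # start of row 1 -- not a clean shift along a single axis.
    img = torch.arange(8.).view(1, 1, 2, 4)
    flat = img.view(1, 1, -1)            # shape (1, 1, 8)
    k = torch.tensor([[[0.5, 0.5]]])     # 2-tap averaging kernel
    out = torch.conv1d(flat, k)
    print(out)  # position 3 mixes pixel 3 (row 0) with pixel 4 (row 1)
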
Laymanpython commented 10 months ago

I think perhaps I have solved it. In my opinion, the author wants to convolve along a single dimension, which is why a list like [0, k_x.shape[3] // 2] is passed as padding for an image. The answer is to use a 2D conv, not a 1D conv. You also have to fix an in-place operation in the network, which makes backpropagation fail.
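
For illustration, a minimal sketch of that change, using the shapes reported earlier in this thread (this is one reading of the suggestion, not a tested patch of the repository):

    import torch

    # Sketch: torch.conv2d accepts the 4D input and 4D kernels that
    # torch.conv1d rejected, and the list paddings become valid (H, W) pads.
    B = 4                                    # views * channels after the view()
    I_padded = torch.randn(1, B, 202, 202)   # the input that conv1d rejected
    k_y = torch.randn(B, 1, 7, 1)            # vertical Lanczos kernels
    k_x = torch.randn(B, 1, 1, 7)            # horizontal Lanczos kernels

    I_s = torch.conv2d(I_padded, weight=k_y, groups=B,
                       padding=[k_y.shape[2] // 2, 0])  # pad height only
    I_s = torch.conv2d(I_s, weight=k_x, groups=B,
                       padding=[0, k_x.shape[3] // 2])  # pad width only
    print(I_s.shape)  # torch.Size([1, 4, 202, 202])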

yunseok624 commented 9 months ago

> I think perhaps I have solved it. [...] The answer is to use a 2D conv, not a 1D conv. You also have to fix an in-place operation in the network, which makes backpropagation fail.

Can you show us how you solved it? Because I did change it to a 2D conv, but I got this issue:

    0%| | 0/400 [00:00<?, ?it/s]
    C:\Users\Юнсок\AppData\Local\Programs\Python\Python310\lib\site-packages\imageio\plugins\pillow.py:320: UserWarning: Loading 16-bit (uint16) PNG as int32 due to limitations in pillow's PNG decoder. This will be fixed in a future version of pillow which will make this warning dissapear.
      warnings.warn(
    torch.Size([1, 8, 202, 202])
    C:\Users\Юнсок\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd\__init__.py:251: UserWarning: Error detected in MeanBackward1. Traceback of forward call that caused the error:
      File "C:\Users\Юнсок\Desktop\Research\MISR\HighRes-net-master\src\train.py", line 309, in <module>
        main(config)
      File "C:\Users\Юнсок\Desktop\Research\MISR\HighRes-net-master\src\train.py", line 295, in main
        trainAndGetBestModel(fusion_model, regis_model, optimizer, dataloaders, baseline_cpsnrs, config)
      File "C:\Users\Юнсок\Desktop\Research\MISR\HighRes-net-master\src\train.py", line 177, in trainAndGetBestModel
        shifts = register_batch(regis_model,
      File "C:\Users\Юнсок\Desktop\Research\MISR\HighRes-net-master\src\train.py", line 40, in register_batch
        theta = shiftNet(torch.cat([reference, lrs[:, i : i + 1]], 1))
      File "C:\Users\Юнсок\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "C:\Users\Юнсок\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\Users\Юнсок\Desktop\Research\MISR\HighRes-net-master\src\DeepNetworks\ShiftNet.py", line 66, in forward
        x[:, 1] = x[:, 1] - torch.mean(x[:, 1], dim=(1, 2)).view(-1, 1, 1)
     (Triggered internally at ..\torch\csrc\autograd\python_anomaly_mode.cpp:119.)
      Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    0%| | 0/131 [00:10<?, ?it/s]
    0%| | 0/400 [00:10<?, ?it/s]
    Traceback (most recent call last):
      File "C:\Users\Юнсок\Desktop\Research\MISR\HighRes-net-master\src\train.py", line 309, in <module>
        main(config)
      File "C:\Users\Юнсок\Desktop\Research\MISR\HighRes-net-master\src\train.py", line 295, in main
        trainAndGetBestModel(fusion_model, regis_model, optimizer, dataloaders, baseline_cpsnrs, config)
      File "C:\Users\Юнсок\Desktop\Research\MISR\HighRes-net-master\src\train.py", line 190, in trainAndGetBestModel
        loss.backward()
      File "C:\Users\Юнсок\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\_tensor.py", line 492, in backward
        torch.autograd.backward(
      File "C:\Users\Юнсок\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd\__init__.py", line 251, in backward
        Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [8, 128, 128]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Laymanpython commented 9 months ago

> Can you show us how you solved it? Because I did change it to a 2D conv, but I got this issue: [...] RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation [...]

This is because of an in-place operation in PyTorch. I remember it appeared in ShiftNet.py; you can try to fix it. But I got a bad result, so I gave up.
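
For anyone hitting the same autograd error, here is a minimal sketch of an out-of-place rewrite of the mean subtraction that the traceback points at in ShiftNet.forward (demean_pair is a hypothetical helper; the shapes are assumed from the traceback, and both channels are assumed to be demeaned as in the original code):

    import torch

    def demean_pair(x):
        # Out-of-place replacement for in-place writes of the form
        #   x[:, 1] = x[:, 1] - torch.mean(x[:, 1], dim=(1, 2)).view(-1, 1, 1)
        # Rebuilding the tensor with torch.stack avoids modifying a tensor
        # that autograd still needs for the backward pass.
        x0 = x[:, 0] - torch.mean(x[:, 0], dim=(1, 2), keepdim=True)
        x1 = x[:, 1] - torch.mean(x[:, 1], dim=(1, 2), keepdim=True)
        return torch.stack([x0, x1], dim=1)

    x = torch.randn(8, 2, 128, 128, requires_grad=True)
    y = demean_pair(x)
    y.mean().backward()  # no in-place error
    print(x.grad.shape)  # torch.Size([8, 2, 128, 128])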