Justin-Tan / high-fidelity-generative-compression

PyTorch implementation of High-Fidelity Generative Image Compression + routines for neural image compression
Apache License 2.0

what is the minimum input size for this model #14

Closed ZhangYuef closed 3 years ago

ZhangYuef commented 3 years ago

I am trying to use a 64x64 pixel image as input to the model, and I get the following error:

/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
   2115             magic_arg_s = self.var_expand(line, stack_depth)
   2116             with self.builtin_trap:
-> 2117                 result = fn(magic_arg_s, cell)
   2118             return result
   2119 

<decorator-gen-60> in time(self, line, cell, local_ns)

/usr/local/lib/python3.6/dist-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    186     # but it's overkill for just that one bit of state.
    187     def magic_deco(arg):
--> 188         call = lambda f, *a, **k: f(*a, **k)
    189 
    190         if callable(arg):

/usr/local/lib/python3.6/dist-packages/IPython/core/magics/execution.py in time(self, line, cell, local_ns)
   1191         else:
   1192             st = clock2()
-> 1193             exec(code, glob, local_ns)
   1194             end = clock2()
   1195             out = None

<timed exec> in <module>()

/content/high-fidelity-generative-compression/compress.py in compress_and_save(model, args, data_loader, output_dir)
     76 
     77             # Perform entropy coding
---> 78             compressed_output = model.compress(data)
     79 
     80             out_path = os.path.join(output_dir, f"{filenames[0]}_compressed.hfc")

/content/high-fidelity-generative-compression/src/model.py in compress(self, x, silent)
    290             y = utils.pad_factor(y, y.size()[2:], factor)
    291 
--> 292         compression_output = self.Hyperprior.compress_forward(y, spatial_shape)
    293         attained_hbpp = 32 * len(compression_output.hyperlatents_encoded) / np.prod(spatial_shape)
    294         attained_lbpp = 32 * len(compression_output.latents_encoded) / np.prod(spatial_shape)

/content/high-fidelity-generative-compression/src/hyperprior.py in compress_forward(self, latents, spatial_shape, **kwargs)
    196 
    197         # Obtain hyperlatents from hyperencoder
--> 198         hyperlatents = self.analysis_net(latents)
    199         hyperlatent_spatial_shape = hyperlatents.size()[2:]
    200         batch_shape = latents.size(0)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

/content/high-fidelity-generative-compression/src/network/hyper.py in forward(self, x)
     59         x = self.activation(self.conv1(x))
     60         x = self.activation(self.conv2(x))
---> 61         x = self.conv3(x)
     62 
     63         return x

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py in forward(self, input)
    417 
    418     def forward(self, input: Tensor) -> Tensor:
--> 419         return self._conv_forward(input, self.weight)
    420 
    421 class Conv3d(_ConvNd):

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight)
    410     def _conv_forward(self, input, weight):
    411         if self.padding_mode != 'zeros':
--> 412             return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
    413                             weight, self.bias, self.stride,
    414                             _pair(0), self.dilation, self.groups)

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in _pad(input, pad, mode, value)
   3567             assert len(pad) == 4, '4D tensors expect 4 values for padding'
   3568             if mode == 'reflect':
-> 3569                 return torch._C._nn.reflection_pad2d(input, pad)
   3570             elif mode == 'replicate':
   3571                 return torch._C._nn.replication_pad2d(input, pad)

RuntimeError: Padding size should be less than the corresponding input dimension, but got: padding (2, 2) at dimension 3 of input [1, 320, 2, 2]

So I am wondering what the minimum input size for this model is. Thanks in advance :)
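
For context, the failure at the very bottom of the trace can be reproduced on its own; the tensor shape and padding values below are taken directly from the error message and do not depend on the repository:

```python
import torch
import torch.nn.functional as F

# Minimal reproduction of the failure at the bottom of the traceback:
# reflection padding requires the pad to be smaller than the padded dimension,
# so a pad of 2 on a 2x2 spatial map raises the same RuntimeError.
x = torch.rand(1, 320, 2, 2)
try:
    F.pad(x, (2, 2, 2, 2), mode='reflect')
except RuntimeError as e:
    print(e)  # padding size should be less than the corresponding input dimension
```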

Justin-Tan commented 3 years ago

It's trained on 256 x 256 random crops of arbitrarily sized images. You can change this under the 'size' heading in default_config.py. The encoder downsamples the spatial dimensions of the input by a factor of 16 in each dimension, so in principle the minimum input size would be 16 x 16.
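
If smaller inputs still trip the reflection padding inside the hyperprior (as in the traceback above), one workaround is to pad the image up to the 256 x 256 training crop size before compression and crop the reconstruction back afterwards. Below is a minimal sketch, assuming the NCHW tensor input used by compress.py; the `pad_to_size` helper and the shapes are illustrative, not part of the repository:

```python
import torch
import torch.nn.functional as F

def pad_to_size(x, min_hw=256):
    """Zero-pad an NCHW tensor so its height and width are at least `min_hw`.

    Returns the padded tensor plus the original (h, w) so the
    reconstruction can be cropped back to the input size.
    """
    h, w = x.shape[2], x.shape[3]
    pad_h = max(min_hw - h, 0)
    pad_w = max(min_hw - w, 0)
    # F.pad takes (left, right, top, bottom) for a 4D input
    x_padded = F.pad(x, (0, pad_w, 0, pad_h), mode='constant', value=0.)
    return x_padded, (h, w)

# Illustrative usage with a 64x64 input (the case from the traceback above):
x = torch.rand(1, 3, 64, 64)
x_padded, (h, w) = pad_to_size(x, min_hw=256)
# compressed = model.compress(x_padded)      # as in compress.py
# reconstruction = ...[:, :, :h, :w]         # crop back after decompression
```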