NVlabs / pacnet

Pixel-Adaptive Convolutional Neural Networks (CVPR '19)
https://suhangpro.github.io/pac/

Pixel-Adaptive Convolutional Neural Networks


Pixel-Adaptive Convolutional Neural Networks
Hang Su, Varun Jampani, Deqing Sun, Orazio Gallo, Erik Learned-Miller, and Jan Kautz.
CVPR 2019.

License

Copyright (C) 2019 NVIDIA Corporation. All rights reserved. Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).

Installation

Layer Catalog

We implemented 5 types of PAC layers (as PyTorch Modules): PacConv2d, PacConvTranspose2d, PacPool2d, PacCRF and PacCRFLoose.

More details regarding each layer are provided below.

PacConv2d

PacConv2d is the PAC counterpart of nn.Conv2d. It accepts most standard nn.Conv2d arguments (including in_channels, out_channels, kernel_size, bias, stride, padding, dilation, but not groups and padding_mode), and we make sure that when the same arguments are used, PacConv2d and nn.Conv2d have the exact same output sizes. A few additional optional arguments are available:

    Args (in addition to those of Conv2d):
        kernel_type (str): 'gaussian' | 'inv_{alpha}_{lambda}[_asym][_fixed]'. Default: 'gaussian'
        smooth_kernel_type (str): 'none' | 'gaussian' | 'average_{sz}' | 'full_{sz}'. Default: 'none'
        normalize_kernel (bool): Default: False
        shared_filters (bool): Default: False
        filler (str): 'uniform'. Default: 'uniform'

    Note:
        - kernel_size only accepts odd numbers
        - padding should not be larger than :math:`dilation * (kernel_size - 1) / 2`
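The output-size guarantee and the two notes above can be checked with plain arithmetic. The sketch below uses the output-size formula documented for nn.Conv2d; `conv_out_size` and `check_pac_args` are hypothetical helpers written for illustration, not functions from this repository.

```python
def conv_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output size formula documented for nn.Conv2d (per the text, PacConv2d matches it)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def check_pac_args(kernel_size, padding, dilation=1):
    """Hypothetical check: raise ValueError if the arguments violate the two notes above."""
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    max_padding = dilation * (kernel_size - 1) // 2  # exact, since kernel_size is odd
    if padding > max_padding:
        raise ValueError(f"padding must be at most {max_padding}")


check_pac_args(kernel_size=5, padding=2)              # largest padding allowed for a 5x5 filter
no_pad = conv_out_size(64, 5)                         # shrinks by kernel_size - 1
same_pad = conv_out_size(64, 5, padding=2)            # keeps the input size
strided = conv_out_size(64, 5, stride=2, padding=2)   # roughly halves it
```

At dilation 1, the maximum allowed padding, (kernel_size - 1) / 2, is exactly the "same" padding that keeps the output size equal to the input size.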

When used to build computation graphs, this layer takes two input tensors and generates one output tensor:

import torch
import torch.nn as nn
from pac import PacConv2d               # assumes pac.py from this repository is importable

in_ch, out_ch, g_ch = 16, 32, 8         # channel sizes of input, output and guidance
f, b, h, w = 5, 2, 64, 64               # filter size, batch size, input height and width
input = torch.rand(b, in_ch, h, w)
guide = torch.rand(b, g_ch, h, w)       # guidance feature ('f' in Eq.3 of paper)

conv = nn.Conv2d(in_ch, out_ch, f)
out_conv = conv(input)                  # standard spatial convolution

pacconv = PacConv2d(in_ch, out_ch, f)
out_pac = pacconv(input, guide)         # PAC
# out_pac = pacconv(input, None, guide_k)   # alternative interface
#   guide_k is the pre-computed 'K' (see Eq.3 of paper)
#   of shape [b, g_ch, f, f, h, w]; packernel2d can be
#   used to create it.
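To make Eq.3 concrete, here is a minimal pure-Python sketch of a single-channel, 1D pixel-adaptive convolution, assuming the default 'gaussian' kernel type, i.e. K(f_i, f_j) = exp(-0.5 ||f_i - f_j||^2). It splits the work the same way as the two interfaces above: `pac_kernel_1d` precomputes K for every output position (the role of guide_k), and `pac_conv1d` applies it together with the spatially shared weights W. All names are hypothetical; the real layers operate on 4D tensors with many channels.

```python
import math


def gaussian_kernel(f_i, f_j):
    """Adapting kernel K(f_i, f_j) = exp(-0.5 * ||f_i - f_j||^2)."""
    return math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(f_i, f_j)))


def pac_kernel_1d(guide, kernel_size):
    """Precompute K for each position i and each window offset (the idea behind guide_k).

    guide is a list of feature vectors, one per position; out-of-bounds neighbors get K = 0.
    """
    r = kernel_size // 2
    n = len(guide)
    return [[gaussian_kernel(guide[i], guide[i + d]) if 0 <= i + d < n else 0.0
             for d in range(-r, r + 1)]
            for i in range(n)]


def pac_conv1d(signal, weight, bias, k):
    """out_i = sum_j K(f_i, f_j) * W[offset(i, j)] * v_j + b, with zero padding.

    Weights are indexed by the offset j - i (cross-correlation order, as in nn.Conv2d).
    """
    r = len(weight) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = bias
        for t, d in enumerate(range(-r, r + 1)):
            if 0 <= i + d < n:
                acc += k[i][t] * weight[t] * signal[i + d]
        out.append(acc)
    return out


signal = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]   # a step edge
weight = [1 / 3] * 3                         # a 3-tap box filter
flat_guide = [[0.0]] * len(signal)           # constant guidance: K == 1 for in-bounds neighbors
edge_guide = [[v] for v in signal]           # guidance that "sees" the edge

plain = pac_conv1d(signal, weight, 0.0, pac_kernel_1d(flat_guide, 3))
adaptive = pac_conv1d(signal, weight, 0.0, pac_kernel_1d(edge_guide, 3))
```

With the constant guide the result equals an ordinary convolution, so `plain[2]` picks up a third of the value across the edge. With the edge-aware guide, the neighbor across the edge gets K = exp(-50) and is effectively ignored, so `adaptive[2]` stays near 0.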

Use pacconv2d (in conjunction with packernel2d) for its functional interface.

PacConvTranspose2d

PacConvTranspose2d is the PAC counterpart of nn.ConvTranspose2d. It accepts most standard nn.ConvTranspose2d arguments (including in_channels, out_channels, kernel_size, bias, stride, padding, output_padding, dilation, but not groups and padding_mode), and we make sure that when the same arguments are used, PacConvTranspose2d and nn.ConvTranspose2d have the exact same output sizes. A few additional optional arguments are available:

    Args (in addition to those of ConvTranspose2d):
        kernel_type (str): 'gaussian' | 'inv_{alpha}_{lambda}[_asym][_fixed]'. Default: 'gaussian'
        smooth_kernel_type (str): 'none' | 'gaussian' | 'average_{sz}' | 'full_{sz}'. Default: 'none'
        normalize_kernel (bool): Default: False
        shared_filters (bool): Default: False
        filler (str): 'uniform' | 'linear'. Default: 'uniform'

    Note:
        - kernel_size only accepts odd numbers
        - padding should not be larger than :math:`dilation * (kernel_size - 1) / 2`

Like PacConv2d, PacConvTranspose2d can be called in two ways:

import torch
import torch.nn as nn
from pac import PacConvTranspose2d          # assumes pac.py from this repository is importable

in_ch, out_ch, g_ch = 16, 32, 8             # channel sizes of input, output and guidance
f, b, h, w, oh, ow = 5, 2, 8, 8, 16, 16     # filter size, batch size, input and output height/width
input = torch.rand(b, in_ch, h, w)
guide = torch.rand(b, g_ch, oh, ow)         # guidance feature; note that it needs to match
                                            # the spatial size of the output

convt = nn.ConvTranspose2d(in_ch, out_ch, f, stride=2, padding=2, output_padding=1)
out_convt = convt(input)                    # standard transposed convolution

pacconvt = PacConvTranspose2d(in_ch, out_ch, f, stride=2, padding=2, output_padding=1)
out_pact = pacconvt(input, guide)           # PAC
# out_pact = pacconvt(input, None, guide_k)   # alternative interface
#   guide_k is the pre-computed 'K'
#   of shape [b, g_ch, f, f, oh, ow];
#   packernel2d can be used to create it.
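Because the guide must match the output resolution, it helps to compute oh and ow up front. This sketch uses the output-size formula documented for nn.ConvTranspose2d (which, per the text, PacConvTranspose2d matches) and shows why output_padding=1 is needed here: a stride-2 convolution maps both 15 and 16 down to 8, so the transposed direction is ambiguous without it. The helper names are hypothetical.

```python
def conv_transpose_out_size(size, kernel_size, stride=1, padding=0,
                            output_padding=0, dilation=1):
    """Output size formula documented for nn.ConvTranspose2d."""
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


def conv_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output size formula documented for nn.Conv2d."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# settings from the example above: 8x8 input, f=5, stride=2, padding=2, output_padding=1
oh = conv_transpose_out_size(8, 5, stride=2, padding=2, output_padding=1)
without_op = conv_transpose_out_size(8, 5, stride=2, padding=2)

# the forward (strided) convolution maps both 16 and 15 back to 8
assert conv_out_size(16, 5, stride=2, padding=2) == 8
assert conv_out_size(15, 5, stride=2, padding=2) == 8
```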

Use pacconv_transpose2d (in conjunction with packernel2d) for its functional interface.

PacPool2d

PacPool2d is the PAC counterpart of nn.AvgPool2d. It accepts the standard nn.AvgPool2d arguments kernel_size, stride and padding (but not ceil_mode and count_include_pad), plus a dilation argument that nn.AvgPool2d itself does not have, and we make sure that when the same arguments are used, PacPool2d and nn.AvgPool2d have the exact same output sizes. A few additional optional arguments are available:

    Args:
        kernel_size, stride, padding, dilation
        kernel_type (str): 'gaussian' | 'inv_{alpha}_{lambda}[_asym][_fixed]'. Default: 'gaussian'
        smooth_kernel_type (str): 'none' | 'gaussian' | 'average_{sz}' | 'full_{sz}'. Default: 'none'
        channel_wise (bool): Default: False
        normalize_kernel (bool): Default: False
        out_channels (int): needs to be specified for channel_wise 'inv_*' (non-fixed) kernels. Default: -1

    Note:
        - kernel_size only accepts odd numbers
        - padding should not be larger than :math:`dilation * (kernel_size - 1) / 2`

Like PacConv2d, PacPool2d can be called in two ways:

import torch
import torch.nn as nn
from pac import PacPool2d               # assumes pac.py from this repository is importable

in_ch, g_ch = 16, 8                     # channel sizes of input and guidance
f, b, h, w = 5, 2, 64, 64               # filter size, batch size, input height and width
stride = 2                              # example stride value
input = torch.rand(b, in_ch, h, w)
guide = torch.rand(b, g_ch, h, w)       # guidance feature

pool = nn.AvgPool2d(f, stride)
out_pool = pool(input)                  # standard average pooling

pacpool = PacPool2d(f, stride)
out_pac = pacpool(input, guide)         # PAC
# out_pac = pacpool(input, None, guide_k)   # alternative interface
#   guide_k is the pre-computed 'K'
#   of shape [b, g_ch, f, f, h, w]; packernel2d can be
#   used to create it.
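PacPool2d replaces the uniform weights of average pooling with guidance-dependent ones. The 1D sketch below assumes a Gaussian K between each window's center and its neighbors and, when `normalize=True`, rescales the weights in each window to sum to one (loosely in the spirit of normalize_kernel). How pacpool2d actually scales its kernel is not described in this README, so treat `pac_avg_pool1d` as a hypothetical illustration. With no padding it produces (n - kernel_size) // stride + 1 outputs, the same count as nn.AvgPool2d.

```python
import math


def pac_avg_pool1d(signal, guide, kernel_size, stride=1, normalize=True):
    """Guidance-weighted average pooling over 1D windows (no padding).

    Neighbor j of window center c is weighted by K(f_c, f_j) = exp(-0.5 * ||f_c - f_j||^2).
    normalize=True divides by the sum of the weights; otherwise by kernel_size,
    which reduces to plain average pooling when K == 1 everywhere.
    """
    r = kernel_size // 2
    out = []
    for c in range(r, len(signal) - r, stride):
        window = range(c - r, c + r + 1)
        ks = [math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(guide[c], guide[j])))
              for j in window]
        denom = sum(ks) if normalize else kernel_size
        out.append(sum(k * signal[j] for k, j in zip(ks, window)) / denom)
    return out


signal = [1.0, 1.0, 1.0, 9.0, 9.0, 9.0, 9.0]
edge_guide = [[v] for v in signal]        # guidance that follows the step edge
flat_guide = [[0.0]] * len(signal)        # constant guidance

adaptive = pac_avg_pool1d(signal, edge_guide, kernel_size=3, stride=2)
plain = pac_avg_pool1d(signal, flat_guide, kernel_size=3, stride=2, normalize=False)
```

In the window centered on the first 9, plain pooling blends in the 1 on the other side of the edge, while the adaptive version keeps the value close to 9.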

Use pacpool2d (in conjunction with packernel2d) for its functional interface.

PacCRF and PacCRFLoose

These layers offer a convenient way to add a CRF component at the end of a dense prediction network. They perform approximate mean-field inference under the hood. Available arguments include:

    Args:
        channels (int): number of categories.
        num_steps (int): number of mean-field update steps.
        final_output (str): 'log_softmax' | 'softmax' | 'log_Q'. Default: 'log_Q'
        perturbed_init (bool): whether to perturb initialization. Default: True
        native_impl (bool): Default: False
        fixed_weighting (bool): whether to use fixed weighting for unary/pairwise terms. Default: False
        unary_weight (float): Default: 1.0
        pairwise_kernels (dict or list): pairwise kernels, see add_pairwise_kernel() for details. Default: None

Usage example:

import torch
import paccrf                           # assumes paccrf.py from this repository is importable
from paccrf import PacCRF

# placeholders for illustration: an RGB image batch (values assumed in [0, 255]) and class scores
im = torch.rand(1, 3, 64, 64) * 255
unary = torch.randn(1, 21, 64, 64)

# create a CRF layer for 21 classes using 5 mean-field steps
crf = PacCRF(21, num_steps=5, unary_weight=1.0)

# add a pairwise term with the same weight as the unary term
crf.add_pairwise_kernel(kernel_size=5, dilation=1, blur=1, compat_type='4d', pairwise_weight=1.0)

# a convenience function is provided for creating pairwise features from pixel colors and positions
edge_features = [paccrf.create_YXRGB(im, yx_scale=100.0, rgb_scale=30.0)]
output = crf(unary, edge_features)

# Note that we use constant values for unary_weight, pairwise_weight, yx_scale and rgb_scale, but they
# can also be tensors and be learned through backprop.
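The yx_scale and rgb_scale arguments act like bandwidths of the pairwise Gaussian. The sketch below assumes that create_YXRGB stacks pixel coordinates and colors and divides them by the respective scales, so a larger scale makes pixels look more similar; `yxrgb_features` is a hypothetical pure-Python stand-in for that assumption, not the repository's function.

```python
import math


def yxrgb_features(image, yx_scale, rgb_scale):
    """Hypothetical helper: per-pixel [y, x, r, g, b] features divided by the given scales.

    image is a nested list with image[y][x] == (r, g, b).
    """
    return [[[y / yx_scale, x / yx_scale] + [c / rgb_scale for c in image[y][x]]
             for x in range(len(image[0]))]
            for y in range(len(image))]


def gaussian_kernel(f_i, f_j):
    """Pairwise weight K(f_i, f_j) = exp(-0.5 * ||f_i - f_j||^2)."""
    return math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(f_i, f_j)))


# one row of three pixels: two black pixels next to a white one
image = [[(0, 0, 0), (0, 0, 0), (255, 255, 255)]]
feats = yxrgb_features(image, yx_scale=100.0, rgb_scale=30.0)

same_color = gaussian_kernel(feats[0][0], feats[0][1])   # neighbors with equal color: near 1
across_edge = gaussian_kernel(feats[0][1], feats[0][2])  # neighbors across a color edge: near 0
```

Under this assumption, raising rgb_scale would let the pairwise term smooth across weaker color edges, while raising yx_scale would widen its spatial reach.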

Experiments

Joint upsampling

Joint depth upsampling on NYU Depth V2
Joint optical flow upsampling on Sintel

Semantic segmentation

See python -m task_semanticSegmentation.main -h for the complete list of command line options.

Citation

If you use this code for your research, please consider citing our paper:

@inproceedings{su2019pixel,
  author    = {Hang Su and 
           Varun Jampani and 
           Deqing Sun and 
           Orazio Gallo and 
           Erik Learned-Miller and 
           Jan Kautz},
  title     = {Pixel-Adaptive Convolutional Neural Networks},
  booktitle = {Proceedings of the IEEE Conference on Computer 
               Vision and Pattern Recognition (CVPR)},
  year      = {2019}
}