Number of parameter changing when changing dilation

kinalmehta commented 3 years ago

I noticed that when I change the dilation value, the number of parameters changes. This is not the case with the standard torch.nn.Conv2d module. Is there a specific reason this happens in e2cnn.nn.R2Conv?

If this behaviour is expected, can you please direct me to the right resource?

Environment:

Python=3.7.9
torch=1.7.1
e2cnn=0.1.5

Code to reproduce issue


import torch
import torch.nn as nn
import torch.nn.functional as F

import numpy as np
import math

import e2cnn
import e2cnn.nn as enn
from e2cnn.nn import init
from e2cnn import gspaces   

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        N = 8
        self.gspace = gspaces.Rot2dOnR2(N)
        self.in_type = enn.FieldType(self.gspace, [self.gspace.trivial_repr] * 3)
        self.out_type = enn.FieldType(self.gspace, [self.gspace.regular_repr] * 16)
        self.layer = enn.R2Conv(self.in_type, self.out_type, 3,
                      stride=1,
                      padding=1,
                      dilation=1,
                      bias=True,
                      )
        self.invariant = enn.GroupPooling(self.out_type)
    def forward(self, x):
        x = enn.GeometricTensor(x, self.in_type)
        out = self.layer(x)
        out = self.invariant(out)
        out = out.tensor
        return out

class ModelDilated(nn.Module):
    def __init__(self):
        super(ModelDilated, self).__init__()
        N = 8
        self.gspace = gspaces.Rot2dOnR2(N)
        self.in_type = enn.FieldType(self.gspace, [self.gspace.trivial_repr] * 3)
        self.out_type = enn.FieldType(self.gspace, [self.gspace.regular_repr] * 16)
        self.layer = enn.R2Conv(self.in_type, self.out_type, 3,
                      stride=1,
                      padding=2,
                      dilation=2,
                      bias=True,
                      )
        self.invariant = enn.GroupPooling(self.out_type)
    def forward(self, x):
        x = enn.GeometricTensor(x, self.in_type)
        out = self.layer(x)
        out = self.invariant(out)
        out = out.tensor
        return out

if __name__=="__main__":
    m = Model()
    md = ModelDilated()
    ip = torch.randn(1,3,100,100)
    op1 = m(ip)
    op2 = md(ip)

    totalParams = sum(p.numel() for p in m.parameters())
    totalParams2 = sum(p.numel() for p in md.parameters())
    print(totalParams, totalParams2)
    print(op1.shape, op2.shape)

Output

304 400
torch.Size([1, 16, 100, 100]) torch.Size([1, 16, 100, 100])

Gabri95 commented 3 years ago

Hi @kinalmehta

Thanks for your question.

The behaviour is not totally unexpected. Unfortunately, using dilated filters is not super trivial when using steerable filters, since dilated filters can produce angular aliasing issues (due to their sparsity).

As a short answer, you can try passing the argument frequencies_cutoff=1. when you use dilation=2 to obtain a similar number of parameters.

Here is the long answer:

First of all, recall what a basis for the steerable filters looks like. See Figure 2 in https://arxiv.org/pdf/1711.07289.pdf

The equivariance property only constrains the angular part of the filters, not the radial one. Therefore, we split the radial part into a number of independent rings. In a normal (dense) filter, larger rings are sampled on a larger number of cells of the filter. This allows one to also consider higher frequencies for the angular component of the largest rings.

The perfect trade-off for the number of frequencies to use in each ring is hard to estimate theoretically. What we did, instead, was to manually search for combinations that contain sufficiently high frequencies while not introducing too much aliasing.

The default parameters in R2Conv use our manually tuned trade-off, which works quite well for dense filters but is not tuned for sparse filters like dilated ones. This means that a 3x3 filter with dilation 2 corresponds to a 5x5 filter, and you will sample high frequencies as if that 5x5 filter were dense.

This is the reason why dilated filters have more parameters.
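To make the sparsity concrete, here is a small sketch in plain Python. The helper names are made up for illustration and are not part of e2cnn; the formulas are the standard convolution size formulas. It shows that a 3x3 filter with dilation 2 covers a 5x5 window while sampling only 9 of its 25 cells, and that padding=2 keeps the 100x100 output size from the reproduction script.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Side length of the window covered by a dilated square filter."""
    return dilation * (kernel_size - 1) + 1


def conv_output_size(in_size: int, kernel_size: int, stride: int = 1,
                     padding: int = 0, dilation: int = 1) -> int:
    """Standard output-size formula for a convolution along one axis."""
    return (in_size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


dense_extent = effective_kernel_size(3, dilation=1)    # window of the dense filter
dilated_extent = effective_kernel_size(3, dilation=2)  # window of the dilated filter

sampled_cells = 3 * 3                          # a 3x3 filter always samples 9 cells
density = sampled_cells / dilated_extent ** 2  # fraction of the window that is sampled

print(dense_extent, dilated_extent, density)
print(conv_output_size(100, 3, padding=1, dilation=1),
      conv_output_size(100, 3, padding=2, dilation=2))
```

The dilated filter sees only about a third of the cells that a dense filter of the same extent would see, which is why a frequency policy tuned for dense filters is too permissive here.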

I would actually recommend trying to use a stronger frequency cut-off when using dilated filters. You can tune this with the parameter frequencies_cutoff in R2Conv.

Have a look at this answer, where I gave a somewhat more detailed explanation of the parameters you need to tune: https://github.com/QUVA-Lab/e2cnn/issues/18#issuecomment-709980945 The part relevant to you starts from the sentence "A steerable filter is in general split in multiple rings,.....".

In your case, when using dilation, the set of rings is computed as if the filter were not dilated and then the filters are scaled by the dilation; see this line. For instance, a 3x3 filter with dilation 2 has two rings at radii 0. and 2. (the center of the filter and a ring that passes through the cell in position (2, 2) of the 5x5 grid). The default policy associates a maximum frequency of 0 at radius 0 and 3 at radius 2; see this line (where r is the radius and you can ignore the max_radius in this case). The idea of this "policy" is that at radius r you can generally sample frequencies up to 2*r (with some correction for the largest rings, since they can partially fall outside the grid), but it assumes dense filters, such that larger rings are sampled on more cells. I would recommend using at most frequency 2 for the outer ring of radius 2. This should also give you the same number of parameters as the dense 3x3 filter. You can do so by passing the argument frequencies_cutoff=1., which is interpreted as allowing max frequency 1. * r = r at radius r.
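The arithmetic above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names (ring_radii_3x3, linear_cutoff, max_frequencies) are hypothetical and not part of e2cnn, and the default policy is not re-implemented; its values for the dilated case are simply copied from the explanation above.

```python
from typing import Callable, Dict, List


def ring_radii_3x3(dilation: int) -> List[float]:
    """Rings of a 3x3 filter: the center and one outer ring scaled by the dilation."""
    return [0.0, float(dilation)]


def linear_cutoff(fco: float) -> Callable[[float], float]:
    """A float cutoff is read as: max frequency = fco * r at radius r."""
    return lambda r: fco * r


def max_frequencies(radii: List[float],
                    cutoff: Callable[[float], float]) -> Dict[float, int]:
    """Maximum angular frequency allowed on each ring under a given cutoff."""
    return {r: int(cutoff(r)) for r in radii}


# Dense 3x3 filter with the "up to 2*r" rule of thumb.
dense = max_frequencies(ring_radii_3x3(1), linear_cutoff(2.0))

# Dilated filter under the default policy (values quoted from the reply, not computed).
dilated_default = {0.0: 0, 2.0: 3}

# Dilated filter with frequencies_cutoff=1., i.e. max frequency r at radius r.
dilated_fixed = max_frequencies(ring_radii_3x3(2), linear_cutoff(1.0))

print(dense, dilated_default, dilated_fixed)
```

With the cutoff of 1., the outer ring of the dilated filter gets the same maximum frequency (2) as the outer ring of the dense 3x3 filter, which is consistent with the parameter counts matching.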

Does this make sense to you?

Gabriele

kinalmehta commented 3 years ago

Hi @Gabri95 ,

Thanks for such a detailed answer. The solution worked.

My understanding of steerable convolutions is a bit weak, but the answer you referred to gave me a decent overview of why the number of parameters differs between the two cases.

I train the model with the (max-pool + non-dilated) version and use dilated convolution during evaluation. Do you think this will adversely affect the predictions?

Thanks again, Kinal

Gabri95 commented 3 years ago

Hi @kinalmehta

I am happy it was useful :)

This is hard to tell a priori. Pooling (especially max-pooling) generally introduces aliasing issues which break equivariance (even translation equivariance). Still, in deep learning we usually use deep networks with max-pooling and obtain great results, so I don't expect any significant additional adverse effect with respect to a conventional CNN. Actually, the fact that the steerable filters are band-limited and rather smooth should help and make downsampling rather stable.

However, you will probably observe some more noise when checking explicitly the rotation equivariance of the model.

In any case, you can always try to experiment a bit with different bandlimiting of the filters to find a better trade-off for the smoothness of the filters (which reduces the equivariance error).
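As a toy illustration of the remark that max-pooling breaks even translation equivariance, here is a 1D sketch in plain Python. It does not use e2cnn, and the helper names are invented for this example; it only shows that shifting the input by one sample does not simply shift the pooled output.

```python
from typing import List


def max_pool_1d(signal: List[int], size: int = 2) -> List[int]:
    """Non-overlapping max-pooling with window and stride equal to `size`."""
    return [max(signal[i:i + size]) for i in range(0, len(signal) - size + 1, size)]


def circular_shift(signal: List[int], steps: int) -> List[int]:
    """Shift a signal to the right by `steps` positions, wrapping around."""
    steps %= len(signal)
    return signal[-steps:] + signal[:-steps]


x = [1, 3, 2, 0]
pooled = max_pool_1d(x)

# Shifting by the stride (2 samples) shifts the pooled output by 1 sample.
pooled_shift2 = max_pool_1d(circular_shift(x, 2))

# Shifting by 1 sample gives an output that is not any shift of `pooled`.
pooled_shift1 = max_pool_1d(circular_shift(x, 1))
all_shifts_of_pooled = [circular_shift(pooled, k) for k in range(len(pooled))]

print(pooled, pooled_shift2, pooled_shift1)
print(pooled_shift1 in all_shifts_of_pooled)
```

The same kind of check, applied to rotated inputs instead of shifted ones, is one way to measure the equivariance error mentioned above.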

If you find some interesting result, I'd be curious to hear about it so, please, let me know :)

Best, Gabriele

purse1996 commented 3 years ago

I want to use R2Conv in atrous spatial pyramid pooling (ASPP), with dilations of 12, 24, and 36, but the results are very poor. The code is as follows. Could you give some suggestions?

conv3x3(in_dim, reduction_dim, dilation=r, padding=r)  # r = 12, 24, 36

def conv3x3(inplanes, out_planes, stride=1, padding=1, groups=1, dilation=1):
    """3x3 convolution with padding"""
    in_type = FIELD_TYPE['regular'](gspace, inplanes)
    out_type = FIELD_TYPE['regular'](gspace, out_planes)
    return enn.R2Conv(in_type, out_type, 3,
                      stride=stride,
                      padding=padding,
                      groups=groups,
                      bias=False,
                      dilation=dilation,
                      sigma=None,
                      frequencies_cutoff=lambda r: 3 * r,
                      initialize=False)

Gabri95 commented 3 years ago

Hi @purse1996

I think you may have some issue with the frequencies cutoff.

If you use a 3x3 filter with dilation D, the outer pixels will have radius D. Your frequency cutoff policy allows frequencies up to 3*D to be sampled there. However, such a dilated filter is very sparse. In particular, the orbit of a pixel will be sampled at 4 locations at most, so I'd recommend not using frequencies higher than 2. You could use frequencies_cutoff = lambda r: min(r, 2) such that

in the central pixel you have max frequency = 0
on other pixels you have max frequency = 2
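To see how far apart the two policies are for the ASPP dilations, here is a small plain-Python comparison. It only evaluates the two lambdas from this thread at the ring radii 0 and D and does not call e2cnn; the variable names are chosen for this example.

```python
# Cutoff policy from the question: max frequency grows as 3 * r.
original_cutoff = lambda r: 3 * r

# Suggested policy: max frequency r, capped at 2.
capped_cutoff = lambda r: min(r, 2)

# Maximum frequency on the outer ring (radius D) for each ASPP dilation D.
outer_ring = {d: (original_cutoff(d), capped_cutoff(d)) for d in (12, 24, 36)}

# Maximum frequency at the central pixel (radius 0) under both policies.
center = (original_cutoff(0), capped_cutoff(0))

for dilation, (orig_freq, capped_freq) in outer_ring.items():
    print(f"D={dilation}: 3*r allows up to {orig_freq}, min(r, 2) allows up to {capped_freq}")
print("center:", center)
```

The original policy asks for frequencies in the tens or hundreds on a ring that is sampled at only a handful of cells, while the capped policy stays at 2 regardless of the dilation.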

However, keep in mind that your filter is still very sparse, which also means that it will most likely not be very stable to continuous rotations (but should still be equivariant to 90 deg ones). Does this help?
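The sparsity argument can be checked on the sampling grid itself. The sketch below is plain Python with hypothetical helper names and makes no claim about how e2cnn samples filters internally; it just lists the 9 positions of a 3x3 filter with dilation D, counts how many cells share each radius, and checks which rotations map the grid onto itself.

```python
import math
from collections import Counter


def sample_positions(dilation: int) -> set:
    """The 9 cell positions of a 3x3 filter with the given dilation, centered at 0."""
    offsets = (-dilation, 0, dilation)
    return {(x, y) for x in offsets for y in offsets}


def rotate(points: set, degrees: float) -> set:
    """Rotate points around the origin, rounding to absorb floating-point noise."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return {(round(c * x - s * y, 6), round(s * x + c * y, 6)) for x, y in points}


grid = sample_positions(12)

# Number of sampled cells at each distance from the center.
cells_per_radius = Counter(round(math.hypot(x, y), 6) for x, y in grid)

# A 90 degree rotation maps the grid onto itself; a 45 degree rotation does not.
stable_under_90 = rotate(grid, 90) == grid
overlap_after_45 = len(rotate(grid, 45) & grid)

print(dict(cells_per_radius))
print(stable_under_90, overlap_after_45)
```

No radius is sampled at more than 4 cells, and after a 45 degree rotation only the center still lands on a sampled cell, which matches the expectation of exact equivariance to 90 degree rotations only.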

Gabriele

QUVA-Lab / e2cnn

Number of parameter changing when changing dilation #25