Closed: kinalmehta closed this issue 3 years ago.
Hi @kinalmehta
Thanks for your question.
The behaviour is not totally unexpected. Unfortunately, dilation is not trivial to combine with steerable filters, since dilated filters can produce angular aliasing (due to their sparsity).
As a short answer, you can try passing the argument `frequencies_cutoff=1.` when you use `dilation=2` to obtain a similar number of parameters.
Here is the long answer:
First of all, recall what a basis for the steerable filters looks like. See Figure 2 in https://arxiv.org/pdf/1711.07289.pdf
The equivariance constraint only restricts the angular part of the filters, not the radial one. Therefore, we split the radial part into a number of independent rings. In a normal (dense) filter, larger rings are sampled on a larger number of cells of the filter. This allows one to also consider higher frequencies for the angular component of the largest rings.
The perfect trade-off for the number of frequencies to use in each ring is hard to estimate theoretically. What we did instead was to manually search for combinations that contained sufficiently high frequencies without introducing too much aliasing.
The default parameters in R2Conv use our manually tuned trade-off, which works quite well for dense filters but is not tuned for sparse filters like dilated ones. This means that if you use a 3x3 filter with dilation 2, it covers the footprint of a 5x5 filter, and high frequencies will be sampled as if your 5x5 filter were dense.
This is the reason why dilated filters have more parameters.
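To make the sparsity argument concrete, here is a small pure-Python sketch (no e2cnn needed). It lists the grid offsets a square kernel actually touches and counts how many of them lie close to a ring of radius 2. The half-cell tolerance used to decide whether a cell is "on" the ring is an arbitrary assumption for illustration. e2cnn's own radial profile is smoother than this.

```python
import math


def sampled_offsets(kernel_size, dilation=1):
    """(dy, dx) offsets, relative to the centre, of the cells a square kernel touches."""
    half = (kernel_size - 1) // 2
    steps = range(-half, half + 1)
    return [(dilation * i, dilation * j) for i in steps for j in steps]


def cells_near_ring(offsets, radius, tol=0.5):
    """Count cells whose distance from the centre is within `tol` of `radius`.

    The tolerance is an illustrative assumption, not e2cnn's radial profile.
    """
    return sum(1 for dy, dx in offsets if abs(math.hypot(dy, dx) - radius) <= tol)


dense_5x5 = sampled_offsets(5)                  # 25 cells
dilated_3x3 = sampled_offsets(3, dilation=2)    # 9 cells, same 5x5 footprint

print(cells_near_ring(dense_5x5, 2.0))    # 12
print(cells_near_ring(dilated_3x3, 2.0))  # 4
```

Both kernels cover the same 5x5 footprint. The dilated one has only a third as many cells near the radius-2 ring, so angular frequencies that the dense ring can resolve start to alias on the dilated one.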
I would actually recommend using a stronger frequency cut-off with dilated filters. You can tune this with the parameter `frequencies_cutoff` in `R2Conv`.
Have a look at this answer, where I gave a slightly more detailed explanation of the parameters you need to tune: https://github.com/QUVA-Lab/e2cnn/issues/18#issuecomment-709980945 The part relevant to you starts from the sentence "A steerable filter is in general split in multiple rings,.....".
In your case, with dilation, the set of rings is computed as if the filter were not dilated, and then the filters are scaled by the dilation; see this line.
For instance, a 3x3 filter with dilation 2 has two rings, at radii 0 and 2 (the center of the filter and a ring that passes through the cell in position (2, 2) of the 5x5 grid).
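The ring placement described above can be sketched as follows. This is a hypothetical helper that mirrors the description in this thread: rings sit at the integer radii of the undilated kernel and are then multiplied by the dilation. It is not e2cnn's internal function.

```python
def ring_radii(kernel_size, dilation=1):
    """Ring radii of a square kernel: computed for the undilated kernel, then scaled.

    Hypothetical helper mirroring the description in this thread, not e2cnn's code.
    """
    n_rings = (kernel_size - 1) // 2 + 1       # e.g. 2 rings for a 3x3 kernel
    return [float(dilation * r) for r in range(n_rings)]


print(ring_radii(3))              # [0.0, 1.0]       dense 3x3
print(ring_radii(3, dilation=2))  # [0.0, 2.0]       3x3 with dilation 2
print(ring_radii(5))              # [0.0, 1.0, 2.0]  dense 5x5
```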
The default policy associates a maximum frequency of 0 at radius 0 and 3 at radius 2; see this line (where `r` is the radius; you can ignore `max_radius` in this case).
The idea of this "policy" is that on radius `r` you can generally sample frequencies up to `2*r` (with some correction for the largest rings, since they can partially fall outside the grid), but it assumes dense filters, where larger rings are sampled on more cells.
I would recommend using at most frequency 2 for the outer ring of radius 2. This should also give you the same number of parameters as the dense 3x3 filter.
You can do so by passing the argument `frequencies_cutoff=1.`, which is interpreted as allowing a maximum frequency of `1. * r = r` at radius `r`.
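A quick way to see why `frequencies_cutoff=1.` lines up with the dense 3x3 filter is to tabulate the maximum frequency per ring under each policy. The sketch below uses the rough "2*r" rule of thumb without the edge correction, so it only approximates e2cnn's default, which gives 3 rather than 4 at radius 2 as noted above. The ring radii are the ones discussed in this thread. Rounding the cutoff down to an integer frequency is an assumption.

```python
import math


def max_frequencies(radii, policy):
    """Largest integer angular frequency each ring may carry under `policy`."""
    return [math.floor(policy(r)) for r in radii]


def rule_of_thumb(r):
    return 2 * r        # "frequencies up to 2*r", ignoring the edge correction


def cutoff_one(r):
    return 1.0 * r      # what frequencies_cutoff=1. allows at radius r


dense_3x3 = [0.0, 1.0]
dilated_3x3 = [0.0, 2.0]   # dilation 2

print(max_frequencies(dense_3x3, rule_of_thumb))    # [0, 2]
print(max_frequencies(dilated_3x3, rule_of_thumb))  # [0, 4]
print(max_frequencies(dilated_3x3, cutoff_one))     # [0, 2]
```

Under these assumptions, the 1.*r cutoff gives the dilated rings the same maximum frequencies that the rule of thumb gives the dense 3x3 rings. That is the intuition behind the matching parameter counts. The exact count also depends on the field types used.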
Does this make sense to you?
Gabriele
Hi @Gabri95 ,
Thanks for such a detailed answer. The solution worked.
My understanding of steerable convolutions is a bit weak, but the answer you linked gave me a decent overview of why the number of parameters differs between the two cases.
I am training the model with the (max-pool + non-dilated) version and using dilated convolutions during evaluation. Do you think this will adversely affect the predictions?
Thanks again Kinal
Hi @kinalmehta
I am happy it was useful :)
This is hard to tell a priori. Using pooling (especially max-pooling) generally introduces aliasing, which breaks equivariance (even translation equivariance). Still, in deep learning we routinely use deep networks with max-pooling and get great results, so I don't expect any significant additional adverse effect compared with a conventional CNN. Actually, the fact that steerable filters are band-limited and rather smooth should help and make downsampling fairly stable.
However, you will probably observe some more noise when checking explicitly the rotation equivariance of the model.
In any case, you can always experiment a bit with different band-limiting of the filters to find a better trade-off for their smoothness (smoother filters reduce the equivariance error).
If you find any interesting results, I'd be curious to hear about them, so please let me know :)
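As a toy illustration of checking equivariance explicitly, the sketch below measures the largest gap between f(rot(x)) and rot(f(x)) for a 90-degree rotation. Here f is a plain 2x2, stride-2 max-pooling applied to a list-of-lists image. This is not e2cnn code. It shows only one simple way that strided downsampling can break exact 90-degree equivariance: on an odd-sized grid, the border row and column that pooling drops change under rotation.

```python
def rot90(x):
    """Rotate a square list-of-lists image by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*x)][::-1]


def max_pool_2x2(x):
    """2x2 max-pooling with stride 2; a trailing row/column that doesn't fit is dropped."""
    n = len(x) // 2
    return [[max(x[2 * i][2 * j], x[2 * i][2 * j + 1],
                 x[2 * i + 1][2 * j], x[2 * i + 1][2 * j + 1])
             for j in range(n)] for i in range(n)]


def equivariance_error(f, x):
    """Largest entry-wise gap between f(rot90(x)) and rot90(f(x))."""
    a, b = f(rot90(x)), rot90(f(x))
    return max(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb))


def spike(size):
    """Image of zeros with a single 1 in the top-left corner."""
    img = [[0.0] * size for _ in range(size)]
    img[0][0] = 1.0
    return img


print(equivariance_error(max_pool_2x2, spike(4)))  # 0.0
print(equivariance_error(max_pool_2x2, spike(5)))  # 1.0
```

On a 4x4 grid, rotation maps pooling blocks onto pooling blocks, so the error is zero. On a 5x5 grid, the rotated spike lands in the dropped row and disappears.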
Best, Gabriele
I want to use R2Conv in atrous spatial pyramid pooling (ASPP), with dilation rates 12, 24 and 36. But the results are very poor. The code is as follows. Could you give some suggestions?
Each branch is created as conv3x3(in_dim, reduction_dim, dilation=r, padding=r) with r = 12, 24, 36, using:

```python
from e2cnn import nn as enn

# FIELD_TYPE and gspace are defined elsewhere (not shown here).


def conv3x3(inplanes, out_planes, stride=1, padding=1, groups=1, dilation=1):
    """3x3 convolution with padding"""
    in_type = FIELD_TYPE['regular'](gspace, inplanes)
    out_type = FIELD_TYPE['regular'](gspace, out_planes)
    return enn.R2Conv(in_type, out_type, 3, stride=stride, padding=padding,
                      groups=groups, bias=False, dilation=dilation, sigma=None,
                      frequencies_cutoff=lambda r: 3 * r, initialize=False)
```
Hi @purse1996
I think you may have some issue with the frequencies cutoff.
If you use a 3x3 filter with dilation D, the outer pixels will have radius D.
Your frequency cutoff policy allows frequencies up to 3*D to be sampled there. However, such a dilated filter is very sparse.
In particular, the orbit of a pixel will be sampled at most at 4 locations, so I'd recommend not using frequencies higher than 2.
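The aliasing argument can be checked numerically. As a simplification, assume that the outer ring of a dilated 3x3 kernel is represented only by its four axis-aligned cells, at angles 0, 90, 180 and 270 degrees. The dilation only changes their distance from the center, not their angles. Sampling angular harmonics at those four angles shows which frequencies become indistinguishable.

```python
import math

# Angles of the four axis-aligned outer cells (D,0), (0,D), (-D,0), (0,-D).
# They are the same for every dilation D; only the radius changes.
angles = [k * math.pi / 2 for k in range(4)]


def sample(freq, fn):
    """Sample fn(freq * theta) at the four cell angles."""
    return [fn(freq * a) for a in angles]


def same(u, v, tol=1e-9):
    """True if two sampled signals agree up to floating-point noise."""
    return all(abs(a - b) < tol for a, b in zip(u, v))


# Frequency 4 is indistinguishable from frequency 0 (a constant) on these points.
print(same(sample(4, math.cos), sample(0, math.cos)))                  # True
# Frequency 3 aliases onto frequency 1: same cosine, sine with flipped sign.
print(same(sample(3, math.cos), sample(1, math.cos)))                  # True
print(same(sample(3, math.sin), [-s for s in sample(1, math.sin)]))    # True
# At frequency 2 the sine component vanishes; only the cosine survives.
print(same(sample(2, math.sin), [0.0] * 4))                            # True
```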
You could use `frequencies_cutoff = lambda r: min(r, 2)` such that the maximum frequency never exceeds 2.
However, keep in mind that your filter is still very sparse, which also means it will most likely not be very stable to continuous rotations (but it should still be equivariant to 90-degree ones). Does this help?
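For the ASPP rates in question, the two policies can be compared directly. The sketch assumes that a 3x3 kernel with dilation D has rings at radii 0 and D, and it rounds the cutoff down to an integer frequency. It only tabulates numbers and does not call e2cnn.

```python
import math


def original_policy(r):
    return 3 * r          # frequencies_cutoff=lambda r: 3 * r from the snippet above


def suggested_policy(r):
    return min(r, 2)      # frequencies_cutoff=lambda r: min(r, 2)


def max_freqs(policy, dilation):
    """Integer max frequency on the centre (r=0) and outer (r=dilation) rings."""
    return [math.floor(policy(r)) for r in (0.0, float(dilation))]


for dilation in (12, 24, 36):
    print(dilation, max_freqs(original_policy, dilation), max_freqs(suggested_policy, dilation))
# 12 [0, 36] [0, 2]
# 24 [0, 72] [0, 2]
# 36 [0, 108] [0, 2]
```

The original policy asks an outer ring sampled on only a few cells to carry frequencies in the dozens. The suggested cap keeps every outer ring at frequency 2.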
Gabriele
I happened to observe that when I change the dilation value, the number of parameters changes. This is not the case with the standard torch.nn.Conv2d module. Is there a specific reason this happens in e2cnn.nn.R2Conv?
If this behaviour is expected, can you please direct me to the right resource?
Environment:
Code to reproduce issue
Output