QUVA-Lab / e2cnn

E(2)-Equivariant CNNs Library for Pytorch
https://quva-lab.github.io/e2cnn/

Feature Request: Equivariant Downsampling and Upsampling #7

Closed drewm1980 closed 4 years ago

drewm1980 commented 4 years ago

I would like to build rotation equivariant networks in the UNet family, but it seems downsampling and upsampling are missing necessary building blocks.

For continuous signals it seems pretty clear that downsampling and upsampling operators can be defined that are equivariant: if an image is downscaled by 2, rotations happen by the same angle and translations happen in the same direction but half as far. The story is similar for upscaling, except translations are doubled. Both operators are lossless in an idealized infinite continuous 2D plane setting.

For discrete signals it should be possible too, up to numerics, although you have to be careful to use downsampling and upsampling kernels that are (approximately) rotation symmetric, i.e. "radial".

PyTorch already has downsampling and upsampling operators; maybe it's just a matter of wrapping them correctly in your framework.
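
For what it's worth, here is a minimal sketch (plain PyTorch, no e2cnn) of the kind of check I have in mind: compare "rotate then downsample" against "downsample then rotate". Everything here is just for illustration; `antialias=True` in `F.interpolate` needs a reasonably recent PyTorch version.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 64, 64)
# smooth the signal first so it is (approximately) band-limited
blur = torch.ones(1, 1, 5, 5) / 25.0
x = F.conv2d(x, blur, padding=2)

def down2(t):
    # bilinear 2x downsampling; the antialiased version stays closer to equivariant
    return F.interpolate(t, scale_factor=0.5, mode='bilinear',
                         align_corners=False, antialias=True)

a = down2(torch.rot90(x, 1, dims=(2, 3)))   # rotate by 90 degrees, then downsample
b = torch.rot90(down2(x), 1, dims=(2, 3))   # downsample, then rotate
# should be (close to) zero; 90 degrees is an exact symmetry of the grid,
# for other angles the equality only holds approximately
print((a - b).abs().max())
```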

Gabri95 commented 4 years ago

Hi @drewm1980

What you say is right. I think you are looking for the library's upsampling and pooling modules.

Upsampling supports different options; in practice, we found bilinear interpolation to work better. We implemented different downsampling algorithms, since operations like max-pooling are not compatible with every type of representation. If you use scalar fields, regular fields, or quotient fields, you can use (pointwise) max-pooling, which acts as usual in PyTorch. Average pooling is always compatible with any representation. To get better stability, I would suggest using the antialiased versions of the downsampling methods.
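
In code, something along these lines (a rough sketch; double-check the exact signatures in the docs):

```python
import torch
from e2cnn import gspaces
from e2cnn import nn as enn

# C8-equivariant features: 4 copies of the regular representation
gspace = gspaces.Rot2dOnR2(N=8)
feat_type = enn.FieldType(gspace, 4 * [gspace.regular_repr])

# regular fields support pointwise non-linearities, so pointwise max-pooling is allowed
pool_max = enn.PointwiseMaxPool(feat_type, kernel_size=2)

# average pooling works with any representation; the antialiased version is more stable
pool_avg = enn.PointwiseAvgPoolAntialiased(feat_type, sigma=0.66, stride=2)

# upsampling: in practice bilinear interpolation usually works best
upsample = enn.R2Upsampling(feat_type, scale_factor=2, mode='bilinear')

x = enn.GeometricTensor(torch.randn(1, feat_type.size, 32, 32), feat_type)
y = pool_avg(x)    # spatial size 32 -> 16
z = upsample(y)    # spatial size 16 -> 32
```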

Is this what you were looking for?

Best, Gabriele

drewm1980 commented 4 years ago

Hi Gabriele,

Thanks for the response. I read more of the code... R2Conv seems closely related, though it uses trainable weights instead of a fixed anti-alias filter.

My first forays with this will indeed be with scalar fields, although I already have applications in mind for vector fields (as outputs).

I would like to only use equivariant operators unless there is a good reason not to. Equivariance is why I'm here after all :) My understanding is that max pooling and average pooling (presumably over rectangular regions) break equivariance, but I will look further into the antialiased versions of the operators you mention.

Do you know of published networks doing downsampling and upsampling based on this toolbox? I would not at all be surprised if someone got to this before me.

drewm1980 commented 4 years ago

PointwiseAvgPoolAntialiased looks like the go-to operator for correct (antialiased) downsampling: "Antialiased channel-wise average-pooling: each channel is treated independently. It performs strided convolution with a Gaussian blur filter." I am curious why you compute a Gaussian blur kernel rather than using the "Tri-3" or "Bin-5" filters from the "Making Convolutional Networks Shift-Invariant Again" paper. Is it closer to being a radial function at small filter sizes, since it's not constrained to integers by construction? It's certainly more generic. To do 2x downsampling (stride=2), is sigma=1 a good starting point? That results in a 7x7 kernel.
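
For reference, the sigma-to-kernel-size relation I'm seeing looks consistent with the usual 3-sigma truncation. This is my own guess at the rule, not something I checked in the source:

```python
# assumption: the Gaussian filter is truncated at ~3*sigma, which matches the sizes mentioned here
def kernel_size(sigma):
    return 2 * int(round(3 * sigma)) + 1

print(kernel_size(1.0))    # 7  -> 7x7 kernel for sigma = 1
print(kernel_size(2 / 3))  # 5  -> 5x5 kernel for sigma = 2/3
```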

Gabri95 commented 4 years ago

Hi @drewm1980

Indeed, I think antialiased average pooling is what you are looking for. In many cases, though, it seems like using a stride > 1 in your previous convolutional layer is already good enough since the learnable convolutional filters are already quite smooth (we use a band-limited basis). This, of course, depends on the specific case and on how perfectly equivariant you want the model to be.

In the anti-aliased pooling, we use Gaussian filters for simplicity since they are perfectly rotation invariant (analytically, i.e. in the continuous domain). I guess that after discretization, there is little difference between using their filters and a Gaussian blur, though we did not experiment with them.

Regarding the downsampling, it does depend on your task. Using larger filters (larger sigma) of course gives more stable results, but also less expressive networks (you smooth your features too much and drop too much high-frequency information). I think you should experiment with different sizes. I often found 5x5 filters (so sigma = 2/3) to be good enough to give acceptable results. If you do not care about numerically testing the equivariance of your model but only aim for higher performance, I think you can often just use strided convolution (as suggested above), as it requires fewer computations.
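
Concretely, the two options look roughly like this (a sketch; check the docs for the exact signatures):

```python
import torch
from e2cnn import gspaces
from e2cnn import nn as enn

gspace = gspaces.Rot2dOnR2(N=8)
in_type = enn.FieldType(gspace, 8 * [gspace.regular_repr])
out_type = enn.FieldType(gspace, 16 * [gspace.regular_repr])

# option 1: stride-1 convolution followed by antialiased average pooling (more stable equivariance)
conv = enn.R2Conv(in_type, out_type, kernel_size=5, padding=2)
pool = enn.PointwiseAvgPoolAntialiased(out_type, sigma=2/3, stride=2)

# option 2: strided convolution only (cheaper; the band-limited filters are already quite smooth)
conv_strided = enn.R2Conv(in_type, out_type, kernel_size=5, padding=2, stride=2)

x = enn.GeometricTensor(torch.randn(1, in_type.size, 64, 64), in_type)
y1 = pool(conv(x))     # 64 -> 32, option 1
y2 = conv_strided(x)   # 64 -> 32, option 2
```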

Hope this answers your question!

Best, Gabriele

drewm1980 commented 4 years ago

Thanks Gabriele, it does!

RE the question of whether we're numerically testing the equivariance of our model, I have an anecdote for you... we once had a customer rotate the object we were analyzing and report the variation in the output of our algorithm as a bug. They didn't have a way of testing if our numbers were actually correct but it was really easy for them to test if our algorithm was rotation invariant!

page200 commented 3 years ago

Some info from this thread might be missing from the documentation of the pooling layers. From their documentation, it's not obvious what kind of equivariance they have. Or is that implied by something?

Gabri95 commented 3 years ago

Hi @page200

Are you referring to my first message? In particular:

We implemented different downsampling algorithms, since operations like max-pooling are not compatible with every type of representation. If you use scalar fields, regular fields, or quotient fields, you can use (pointwise) max-pooling, which acts as usual in PyTorch. Average pooling is always compatible with any representation.

The documentation of PointwiseMaxPooling mentions this:

Notice that not all representations support this kind of pooling. In general, only representations which support pointwise non-linearities do.

If you are referring to the comments on anti-aliasing, I agree these are not really discussed enough in the docs. I will update them with some additional notes, thanks for pointing this out!

Best, Gabriele

page200 commented 3 years ago

On one hand yes, about anti-aliasing. On the other hand, the docstrings don't make it obvious which layers have what kind of equivariance. Maybe instead of "max-pooling" in the first sentence of each layer's description you could write something like "G-equivariant max-pooling, where G is ...". Thanks!

Gabri95 commented 3 years ago

In that sense, max pooling is supposed to be equivariant to any group G. Of course, in practice this is not true, since max pooling breaks equivariance to continuous rotations. Indeed, max pooling can be perfectly equivariant only to 90-degree rotations and reflections (like all operations in the library), since these are the only perfect symmetries of the grid. Is this what you meant?
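
For instance, you can verify the exact 90-degree case numerically with plain PyTorch (just an illustrative check, not library code):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 32, 32)
pool = lambda t: F.max_pool2d(t, kernel_size=2)

a = pool(torch.rot90(x, 1, dims=(2, 3)))   # rotate by 90 degrees, then pool
b = torch.rot90(pool(x), 1, dims=(2, 3))   # pool, then rotate
# True: the rotation just permutes the 2x2 pooling blocks, so the maxima match exactly;
# for arbitrary rotation angles this equality holds only approximately
print(torch.equal(a, b))
```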

Best, Gabriele

page200 commented 3 years ago

I meant that, and also another thing: the docstring of the layer doesn't yet state whether the layer is equivariant. And if it is equivariant under some group G, which input argument contains the information (and in what format) about what the current G is? The docstring should start with something like "G-equivariant max-pooling, where G is given by ...".