EderSantana / seya

Bringing up some extra Cosmo to Keras.

Spatial Transformer example - locnet activation functions #46

Open 9thDimension opened 7 years ago

9thDimension commented 7 years ago

See In [5] of https://github.com/EderSantana/seya/blob/master/examples/Spatial%20Transformer%20Networks.ipynb where the localisation network is defined.

Is there a reason why the Convolution2D layers have no activations, while there is a 'relu' activation at the end of the network that regresses the affine transformation parameters? I may be wrong, but I thought it was typical for the output layer of a neural-net regression to have a linear activation.

I asked some others about this, but nobody could explain why the activations are laid out this way, and they suggested I raise it here -- so hopefully the author can comment on these design choices.
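For reference, the layout in that cell looks roughly like this (a sketch from memory of the Keras 1.x-style code, so the filter counts, input shape and layer sizes are illustrative rather than copied verbatim from the notebook):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import MaxPooling2D, Convolution2D, Flatten, Dense, Activation

# Initialise the 6-parameter output so the affine transform starts as the identity.
b = np.zeros((2, 3), dtype='float32')
b[0, 0] = 1.0
b[1, 1] = 1.0
W = np.zeros((50, 6), dtype='float32')

locnet = Sequential()
locnet.add(MaxPooling2D(pool_size=(2, 2), input_shape=(1, 60, 60)))  # Theano channel ordering
locnet.add(Convolution2D(20, 5, 5))              # no activation
locnet.add(MaxPooling2D(pool_size=(2, 2)))
locnet.add(Convolution2D(20, 5, 5))              # no activation
locnet.add(Flatten())
locnet.add(Dense(50))
locnet.add(Activation('relu'))                   # the 'relu' in question
locnet.add(Dense(6, weights=[W, b.flatten()]))   # linear output: 2x3 affine parameters
```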

EderSantana commented 7 years ago

STNs are really unstable. The gradients we get there are just not well behaved. I think that avoiding as many nonlinearities as possible was just the way the original authors of the paper found to get it to behave well.

So yeah, STNs are mostly not well understood yet. Maybe STNs are even harder to train than GANs. The research community will probably start paying more attention to that soon. For now, when designing your STNs I'd recommend using the hyperparameters from the original paper and its follow-ups.

9thDimension commented 7 years ago

Fair enough. I have found in other unrelated experiments that nets with linear activations can perform useful tasks and are fast to optimize.

So in that case, why do you have the 'relu' activation before the locnet's final regression layer? I can't find where they suggest such an idea in the original paper.

P.S. Thanks and much respect for bringing these cutting-edge concepts to Keras in your Seya library.

EderSantana commented 7 years ago

I think the linear convs are just there to quickly "find" the regions of interest. But in some other experiments I'm doing, you can't go from the location in the image to the parameters of the spatial transformer matrix with just one layer. You need at least one hidden layer even for simple experiments. That is what I found experimentally, at least.
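For example, a regression head along these lines (a hypothetical sketch, not the exact code from the notebook) is about as shallow as I could get away with:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Flatten, Dense, Activation

# Identity initialisation for the 2x3 affine matrix, same trick as in the notebook.
b = np.zeros((2, 3), dtype='float32')
b[0, 0] = 1.0
b[1, 1] = 1.0
W = np.zeros((50, 6), dtype='float32')

# Regressing the 6 parameters straight from the flattened conv features,
# i.e. Flatten() followed directly by Dense(6), did not train well in my experiments.
# One nonlinear hidden layer before the linear output is usually enough:
head = Sequential()
head.add(Flatten(input_shape=(20, 12, 12)))      # conv feature shape is illustrative
head.add(Dense(50))
head.add(Activation('relu'))                     # the single hidden nonlinearity
head.add(Dense(6, weights=[W, b.flatten()]))     # linear 6-parameter output
```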