daerduoCarey / SpatialTransformerLayer


ST layer scales down input and produces wrong output during random iterations #14

Open swarnakr opened 7 years ago

swarnakr commented 7 years ago

Hi, (1) I found that the ST layer always scales down the input, so the output looks like a shrunken, transformed image on a black canvas. (2) When the batch size is large (>300), the transformed output contains negative values on alternate iterations. I find this extremely bizarre.
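
For reference, here is a minimal sketch of effect (1). It uses PyTorch's built-in grid ops purely for illustration (not this repo's Caffe layer): because theta maps output coordinates to input coordinates, a scale factor above 1 samples outside the input, and those out-of-range samples come back as zeros.

```python
# Illustrative only: PyTorch's built-in ST ops, not this repo's Caffe layer.
import torch
import torch.nn.functional as F

img = torch.ones(1, 1, 8, 8)  # an all-white 8x8 input

# theta maps output coords to input coords, so scale 2 here reads the
# input over [-2, 2]; everything outside [-1, 1] comes back as zeros.
theta = torch.tensor([[[2.0, 0.0, 0.0],
                       [0.0, 2.0, 0.0]]])

grid = F.affine_grid(theta, img.size(), align_corners=False)
out = F.grid_sample(img, grid, align_corners=False)
print(out[0, 0])  # roughly a 4x4 white core on a black (zero) border
```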

Any help with these issues would be appreciated! I'm running against a deadline, so a quick response would also be very welcome.

daerduoCarey commented 7 years ago

Hi, @swarnakr,

For (1), you could add an extra loss term that encourages a larger scaling factor in the transformation matrix. It is expected that the ST layer generates black padding around the image: samples that fall outside the input are filled with zeros. In my experiments this did not hurt the final classification accuracy on the MNIST and CUB datasets. For (2), sorry, I have no idea. A batch size larger than 64 is unusual for processing images, and I have never encountered this problem. There should be no negative values if your inputs are all non-negative, since the ST layer only performs bilinear interpolation, which is a convex combination of input pixels.
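
A minimal sketch of the kind of penalty I mean, written in PyTorch for illustration rather than taken from this repo; `theta` is assumed to be the (N, 2, 3) output of your localization network:

```python
import torch

def theta_penalty(theta, weight=1e-2):
    # theta: (N, 2, 3) affine parameters from the localization network
    identity = torch.tensor([[1.0, 0.0, 0.0],
                             [0.0, 1.0, 0.0]], device=theta.device)
    # squared distance from the identity transform; in particular this
    # keeps the diagonal (scaling) entries near 1 instead of shrinking
    return weight * ((theta - identity) ** 2).sum(dim=(1, 2)).mean()

# usage: total_loss = classification_loss + theta_penalty(theta)
```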

Best, Kaichun

swarnakr commented 7 years ago

Thanks for your response.

Following up on (2), I find that almost all the affine parameters predicted by the localisation network result in very bizarre transformations (such as negative x, y coordinates).

How do you make sure the affine parameters stay sensible? I'm not sure there is a simple loss function that can guarantee this.

Also, the original paper by Jaderberg et al. doesn't mention using any loss on the thetas, so do you have any idea what the difference in their implementation might be?

Many thanks, Swarna

daerduoCarey commented 7 years ago

Hi, @swarnakr

Thank you for your interest in my code. I'm actually not sure how the original authors produced their results, but I would guess the following rules should help (both are sketched below):

1. Make the learning rate for the localization network smaller than the regular one. Setting it 1e-2 ~ 1e-3 times smaller helps the predicted parameters change slowly.
2. Add a penalty loss on the magnitude of the transformation to keep the predicted transformations small and smooth.
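
A minimal sketch of both rules, again in PyTorch for illustration; `localization` and `classifier` are hypothetical stand-ins for your own modules:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins so the sketch runs; use your real networks.
localization = nn.Linear(16, 6)   # predicts the 6 affine parameters
classifier = nn.Linear(16, 10)

base_lr = 1e-3
optimizer = torch.optim.SGD([
    {"params": classifier.parameters(), "lr": base_lr},
    # rule 1: the localization net learns 100x slower, so the
    # predicted transformations drift slowly instead of jumping around
    {"params": localization.parameters(), "lr": base_lr * 1e-2},
], momentum=0.9)

# rule 2: add a magnitude penalty on theta (see the theta_penalty
# sketch in my earlier comment) to the task loss before backward().
```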

Thanks. Kaichun