jj0mst opened this issue 7 years ago
I recently tried to use the new BCHW functions with my network, since I always use that layout and it simplifies my code.
I noticed that all the gradients of my convolutional layers are now NaN, which in turn fills the weights with NaN after the parameter update.
I'm sure I made all the necessary conversions, and I get no errors such as inconsistencies between tensors or anything else.
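To narrow down where the NaNs first appear, it can help to scan each layer's gradient tensor for non-finite values right after the backward pass. A minimal NumPy sketch (the framework isn't named in the report, and the `grads` dict and layer names below are hypothetical):

```python
import numpy as np

def first_nonfinite(grads):
    """Return the name of the first gradient tensor containing NaN/Inf, or None."""
    for name, g in grads.items():
        if not np.isfinite(g).all():
            return name
    return None

# Hypothetical per-layer gradients: conv1 is clean, conv2 contains a NaN.
grads = {
    "conv1.weight": np.ones((4, 3, 3, 3)),  # BCHW-style filter bank, all finite
    "conv2.weight": np.array([[0.5, np.nan], [1.0, 2.0]]),
}

print(first_nonfinite(grads))
```

Running such a check after the first backward pass (before any parameter update) distinguishes a genuinely bad gradient computation from weights that were merely corrupted by an earlier update.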