Implementing the softmax's forward function is straightforward in CudArray, but the backprop suffers performance-wise because it requires multiple kernel launches. The resulting formula for the backprop also tends to be numerically unstable. Having the softmax's forward and backward passes available in the cudnn module would be a massive help for neural networks, where the softmax is used extensively, especially in the case where the loss function attached to the softmax is not the cross-entropy loss (which avoids the calculation of the Jacobian).
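For reference, here is a minimal NumPy sketch of the computation in question (the function names are just for illustration). The backward pass uses the standard reduction that avoids forming the full Jacobian, yet it still decomposes into an elementwise multiply, a row-wise reduction, a subtraction, and another multiply — each of which maps to a separate kernel launch when expressed through CudArray's array ops:

```python
import numpy as np

def softmax(x):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def softmax_backward(y, dy):
    # dL/dx = y * (dL/dy - sum_j(dL/dy_j * y_j)).
    # This avoids materializing the Jacobian diag(y) - y y^T, but each
    # array op below would be its own kernel launch on the GPU.
    return y * (dy - (dy * y).sum(axis=1, keepdims=True))
```

A fused cuDNN forward/backward pair would collapse this chain of launches into one call each way.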