As the Num::NN module expands to support more complex layer and network types, keeping the automatic differentiation in sync with each new layer will make maintenance harder than it is worth. I'm already running into this with the RNN, GRU, and LSTM layers, where it's much easier not to worry about deriving gradients for multiple hidden states. Num::Grad is worth keeping so that users can build their own networks, but the module should also provide a much more flexible option.
While doing this, it might also be worth making Num::Grad::Variable a true wrapper around Tensor, so that code written against either type uses the same syntax.
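To make the wrapper idea concrete, here is a minimal sketch in Python (used purely for illustration; the `Tensor`, `Variable`, and `backward` names are hypothetical stand-ins, not the actual Num::Grad API). The point is that `Variable` delegates arithmetic to the wrapped tensor while recording the backward functions, so expressions look identical whether or not gradients are tracked:

```python
class Tensor:
    """Stand-in tensor: a 1-D container with elementwise + and *."""

    def __init__(self, data):
        self.data = list(data)

    def __add__(self, other):
        return Tensor(a + b for a, b in zip(self.data, other.data))

    def __mul__(self, other):
        return Tensor(a * b for a, b in zip(self.data, other.data))


class Variable:
    """Wraps a Tensor and records the ops needed for backpropagation."""

    def __init__(self, tensor, parents=(), backward_fns=()):
        self.tensor = tensor
        self.grad = Tensor([0.0] * len(tensor.data))
        self._parents = parents
        self._backward_fns = backward_fns

    # Delegate operators so Variable syntax matches Tensor syntax.
    def __add__(self, other):
        # d(out)/d(self) = 1 and d(out)/d(other) = 1, so gradients
        # pass through unchanged.
        return Variable(self.tensor + other.tensor,
                        parents=(self, other),
                        backward_fns=(lambda g: g, lambda g: g))

    def __mul__(self, other):
        # d(out)/d(self) = other, d(out)/d(other) = self (elementwise).
        return Variable(self.tensor * other.tensor,
                        parents=(self, other),
                        backward_fns=(lambda g: g * other.tensor,
                                      lambda g: g * self.tensor))

    def backward(self, grad=None):
        """Accumulate gradients by walking the recorded graph."""
        if grad is None:
            grad = Tensor([1.0] * len(self.tensor.data))
        self.grad = self.grad + grad
        for parent, fn in zip(self._parents, self._backward_fns):
            parent.backward(fn(grad))


# The same expression syntax works on plain tensors and on variables:
x = Variable(Tensor([2.0, 3.0]))
y = Variable(Tensor([4.0, 5.0]))
z = x * y + x          # reads exactly like Tensor arithmetic
z.backward()           # x.grad = y + 1, y.grad = x
```

This is only a sketch of the delegation pattern; a real wrapper would forward the full Tensor API (indexing, shape, in-place ops) rather than just the two operators shown here.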