As the Num::NN module expands to support more complex layer and network types, keeping the automatic differentiation in sync with each new layer will make maintenance harder than it is worth. I'm already running into this with the RNN, GRU, and LSTM layers, where it's much easier not to worry about deriving gradients for multiple hidden states. Num::Grad is worth keeping so that users can build their own networks, but the module should also provide a much more flexible option.
While doing this, it might also be worth making Num::Grad::Variable a true wrapper around Tensor, so that code written against either type uses the same syntax.
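To make the wrapper idea concrete, here is a minimal sketch in Python (used purely for illustration; the `Tensor`, `Variable`, and `backward` names are hypothetical stand-ins, not the actual Num::Grad API). The point is that `Variable` delegates arithmetic to the wrapped tensor while recording the backward functions, so expressions look identical whether or not gradients are tracked:

```python
class Tensor:
    """Stand-in tensor: a 1-D container with elementwise + and *."""

    def __init__(self, data):
        self.data = list(data)

    def __add__(self, other):
        return Tensor(a + b for a, b in zip(self.data, other.data))

    def __mul__(self, other):
        return Tensor(a * b for a, b in zip(self.data, other.data))


class Variable:
    """Wraps a Tensor and records the ops needed for backpropagation."""

    def __init__(self, tensor, parents=(), backward_fns=()):
        self.tensor = tensor
        self.grad = Tensor([0.0] * len(tensor.data))
        self._parents = parents
        self._backward_fns = backward_fns

    # Delegate operators so Variable syntax matches Tensor syntax.
    def __add__(self, other):
        # d(out)/d(self) = 1 and d(out)/d(other) = 1, so gradients
        # pass through unchanged.
        return Variable(self.tensor + other.tensor,
                        parents=(self, other),
                        backward_fns=(lambda g: g, lambda g: g))

    def __mul__(self, other):
        # d(out)/d(self) = other, d(out)/d(other) = self (elementwise).
        return Variable(self.tensor * other.tensor,
                        parents=(self, other),
                        backward_fns=(lambda g: g * other.tensor,
                                      lambda g: g * self.tensor))

    def backward(self, grad=None):
        """Accumulate gradients by walking the recorded graph."""
        if grad is None:
            grad = Tensor([1.0] * len(self.tensor.data))
        self.grad = self.grad + grad
        for parent, fn in zip(self._parents, self._backward_fns):
            parent.backward(fn(grad))


# The same expression syntax works on plain tensors and on variables:
x = Variable(Tensor([2.0, 3.0]))
y = Variable(Tensor([4.0, 5.0]))
z = x * y + x          # reads exactly like Tensor arithmetic
z.backward()           # x.grad = y + 1, y.grad = x
```

This is only a sketch of the delegation pattern; a real wrapper would forward the full Tensor API (indexing, shape, in-place ops) rather than just the two operators shown here.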