korsbo opened 2 years ago
I think I can add a PerLayer penalty that lets you pass a tuple of penalties, as well as a NonBiasPenalty that doesn't get applied to biases.
Would it be better to have some penalty wrappers, like the existing FrontLastPenalty, or would it be more natural to let bias-regularisation toggling be a type parameter of L1Penalty and L2Penalty?
Sometimes it can be hard to know the input data's scale, so it might be hard to standardise the inputs (as in a UDE). It might then make sense to leave the parameters of the first layer unregularised, or only weakly regularised, so that they can better compensate for differences in scale between the inputs. Something like FrontMiddleLastPenalty, although that's getting a bit verbose.
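To make the idea concrete, here is a rough sketch in Python (not the package's actual API — the names `per_layer_penalty`, `l2_penalty`, and `no_penalty` are hypothetical) of what a PerLayer-style penalty could look like, with biases excluded and the first layer left unregularised:

```python
# Hypothetical sketch, not the package's real implementation: a per-layer
# penalty that applies one penalty function per layer and sums the results,
# deliberately skipping bias terms.

def l2_penalty(strength):
    """Return a penalty function: strength * sum of squared parameters."""
    return lambda params: strength * sum(w * w for w in params)

def no_penalty(params):
    """A penalty that applies no regularisation (e.g. for the first layer)."""
    return 0.0

def per_layer_penalty(penalties, layers):
    """Apply penalties[i] to the weights of layers[i] and sum the results.

    Each layer is assumed to be a dict with 'weights' and 'bias' lists;
    biases are never passed to the penalty, mimicking a NonBiasPenalty.
    """
    return sum(p(layer["weights"]) for p, layer in zip(penalties, layers))

layers = [
    {"weights": [1.0, -2.0], "bias": [0.5]},  # first layer: unregularised
    {"weights": [3.0], "bias": [0.1]},        # last layer: L2-penalised
]

total = per_layer_penalty((no_penalty, l2_penalty(0.01)), layers)
```

Passing a tuple of penalties like this would cover both use cases at once: bias exclusion falls out of which parameters get forwarded to each penalty, and per-layer strength (including "skip the first layer") falls out of which penalty sits at each position.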