@yamins81 This solution seems to work. I tried something more general at first, but this approach works without requiring too much modification of the interface. I had to change a couple of things:
- The model is now passed as an argument to the weight and bias initialization functions, so that I can cache stored models and not have to reload them for every layer.
- Weight and bias initialization functions are now allowed to return a tuple: the first value is used to initialize the parameter, and the second is used to initialize the increment.