Initialization does not take weights into account (unless there is normalization). Need to think of a general way to account for weights in the initialization.
Maybe scale the values by the square root of the weight? I think that would be similar to what is already being done with normalization.
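A minimal sketch of why the square-root scaling might work, assuming the initialization minimizes a squared-error objective (the note does not say which initializer is used, so this is only an illustration). For a squared loss, w_i * (y_i - x_i . b)^2 equals (sqrt(w_i) * y_i - sqrt(w_i) * x_i . b)^2, so an unweighted fit on rows scaled by sqrt(w_i) gives the same answer as the weighted fit. The helper name `sqrt_weight_scale` and the synthetic data are hypothetical, not taken from the code base.

```python
import numpy as np

def sqrt_weight_scale(X, weights):
    """Scale each row of X by the square root of its weight (hypothetical helper)."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0):
        raise ValueError("weights must be non-negative")
    return np.asarray(X, dtype=float) * np.sqrt(w)[:, None]

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
w = rng.uniform(0.1, 5.0, size=50)

# Weighted least squares via the normal equations: (X^T W X) b = X^T W y.
b_weighted = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

# Ordinary (unweighted) least squares on the sqrt-weight-scaled data.
X_scaled = sqrt_weight_scale(X, w)
y_scaled = np.sqrt(w) * y
b_scaled, *_ = np.linalg.lstsq(X_scaled, y_scaled, rcond=None)

# The two solutions agree up to floating-point error.
print(np.allclose(b_weighted, b_scaled))
```

If the initializer is not squared-error based, the equivalence above does not necessarily hold and the scaling would need to be checked separately.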