Implement init well

From #72: init well. Initialize the final layer weights correctly. E.g. if you are regressing some values that have a mean of 50 then initialize the final bias to 50. If you have an imbalanced dataset of a ratio of 1:10 of positives:negatives, set the bias on your logits such that your network predicts a probability of 0.1 at initialization. Setting these correctly will speed up convergence and eliminate “hockey stick” loss curves where in the first few iterations your network is basically just learning the bias.
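The arithmetic behind these initial biases can be sketched in plain Python. This is a minimal illustration, not code from the repository; the helper names are hypothetical. For regression the bias is the target mean; for a sigmoid output, solving sigmoid(b) = p gives b = log(p / (1 - p)); for a softmax output, log class frequencies work because softmax ignores a constant shift.

```python
import math
from typing import List, Sequence


def regression_bias(targets: Sequence[float]) -> float:
    """Initial bias for a regression output: the mean of the training targets."""
    return sum(targets) / len(targets)


def binary_logit_bias(num_pos: int, num_neg: int) -> float:
    """Initial bias for a single sigmoid logit so that sigmoid(bias) equals the positive rate."""
    # p = num_pos / (num_pos + num_neg); sigmoid(b) = p  <=>  b = log(p / (1 - p)),
    # which simplifies to log(num_pos / num_neg).
    return math.log(num_pos / num_neg)


def softmax_logit_biases(class_counts: Sequence[int]) -> List[float]:
    """Initial biases for a softmax output so that softmax(biases) equals the class frequencies."""
    total = sum(class_counts)
    # Softmax is invariant to adding a constant, so log-frequencies are one valid choice.
    # Assumes every class occurs at least once (log(0) would raise ValueError).
    return [math.log(count / total) for count in class_counts]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


print(regression_bias([40.0, 50.0, 60.0]))  # 50.0
bias = binary_logit_bias(1, 10)
print(round(bias, 4), round(sigmoid(bias), 4))  # -2.3026 0.0909
```

Note that a 1:10 positives:negatives ratio gives an exact positive rate of 1/11 ≈ 0.091, which the quoted advice rounds to 0.1; targeting exactly 0.1 instead would give a bias of log(0.1 / 0.9) ≈ -2.197.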

This should be a model modifier, or even a required function to implement.
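As a model modifier, this could be a function that takes the final layer plus the training targets and overwrites the bias in place. The sketch below assumes a PyTorch model whose last layer is an `nn.Linear`; the name `init_final_bias_`, its `task` strings and the `eps` clamp are hypothetical and are not part of Actually-Robust-Training's API, whose actual model-modifier interface may look different.

```python
import torch
from torch import nn


def init_final_bias_(layer: nn.Linear, targets: torch.Tensor, task: str, eps: float = 1e-6) -> nn.Linear:
    """Hypothetical model modifier: set the final layer's bias from training-target statistics (in place)."""
    with torch.no_grad():  # modify parameters without recording autograd history
        if task == "regression":
            # Targets of shape (N,) or (N, out_features): bias = per-output mean.
            layer.bias.copy_(targets.float().mean(dim=0))
        elif task == "binary":
            # Targets are 0/1 labels; bias = logit of the positive rate (clamped to avoid +-inf).
            p = targets.float().mean().clamp(eps, 1 - eps)
            layer.bias.fill_(torch.log(p / (1 - p)).item())
        elif task == "multiclass":
            # Targets are integer class indices; bias = log class frequency (clamped for empty classes).
            counts = torch.bincount(targets, minlength=layer.out_features).float()
            layer.bias.copy_(torch.log((counts / counts.sum()).clamp_min(eps)))
        else:
            raise ValueError(f"unknown task: {task!r}")
    return layer


head = nn.Linear(16, 1)
labels = torch.tensor([1] + [0] * 10)  # 1:10 positives:negatives
init_final_bias_(head, labels, "binary")
print(torch.sigmoid(head.bias))  # roughly 1/11 at initialization
```

This sketch only sets the bias and leaves the final layer's weights at PyTorch's default initialization; shrinking those weights as well would make the initial predictions depend even more on the bias.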

SebChw / Actually-Robust-Training

Implement init well #194