Add Eli & Rishi's correlation-based learning mechanism

http://arxiv.org/abs/2011.07334: the key quantity is Var_i + Var_j - 2 Covar_ij, i.e., maximize the variance of both sender and receiver while minimizing the covariance between the two.
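
For reference, this quantity is the variance of the difference between the two units' activities, a standard identity. Writing x_i and x_j for the activities and \mu_i and \mu_j for their means (notation assumed here, not taken from the paper):

```latex
\mathrm{Var}_i + \mathrm{Var}_j - 2\,\mathrm{Covar}_{ij}
  = E\big[(x_i-\mu_i)^2\big] + E\big[(x_j-\mu_j)^2\big] - 2\,E\big[(x_i-\mu_i)(x_j-\mu_j)\big]
  = E\Big[\big((x_i-\mu_i)-(x_j-\mu_j)\big)^2\Big]
  = \mathrm{Var}(x_i - x_j)
```

For fixed variances, the term is smallest when the two units fluctuate together and grows as their fluctuations become independent or opposed.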

In my experiments, I updated the SWt (structural, spine, slow) weight in the slower outer-loop cycle as a function of the accumulated Var and Covar stats (computed from simple running averages of act - mean). This produces a graded, pruning-like function: because SWt multiplies the regular "fast" learned weights, reducing it toward 0 yields an effective "soft" form of pruning.
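
A minimal Go sketch of this scheme (Go being the language of the axon code base), assuming every synapse keeps its own running averages with a single rate `dt`. The `Synapse` type, its fields, and its methods are hypothetical, not the axon API. Because the function that maps the stats to an SWt change is not specified here, the sketch takes that change as an input.

```go
package main

import "fmt"

// Synapse is a hypothetical per-connection record for this sketch only;
// it is not the emer/axon data structure.
type Synapse struct {
	LWt   float64 // regular "fast" learned weight
	SWt   float64 // slow structural (spine) weight that multiplies LWt
	MeanS float64 // running-average sender activity
	MeanR float64 // running-average receiver activity
	VarS  float64 // running average of (sender act - sender mean)^2
	VarR  float64 // running average of (receiver act - receiver mean)^2
	Covar float64 // running average of the product of the two deviations
}

// Wt returns the effective weight: SWt scales the fast weight, so driving
// SWt toward 0 soft-prunes the connection without erasing LWt.
func (s *Synapse) Wt() float64 { return s.SWt * s.LWt }

// Accum updates the running means, then the Var and Covar stats, from one
// trial's sender and receiver activity; dt is the running-average rate.
func (s *Synapse) Accum(actS, actR, dt float64) {
	s.MeanS += dt * (actS - s.MeanS)
	s.MeanR += dt * (actR - s.MeanR)
	dS := actS - s.MeanS // act - mean for the sender
	dR := actR - s.MeanR // act - mean for the receiver
	s.VarS += dt * (dS*dS - s.VarS)
	s.VarR += dt * (dR*dR - s.VarR)
	s.Covar += dt * (dS*dR - s.Covar)
}

// Stat returns Var_i + Var_j - 2*Covar_ij from the accumulated stats.
func (s *Synapse) Stat() float64 { return s.VarS + s.VarR - 2*s.Covar }

// ApplySWt is the slow outer-loop step: it adds an SWt change computed
// elsewhere from Stat and, as an assumption of this sketch, keeps SWt >= 0.
func (s *Synapse) ApplySWt(delta float64) {
	s.SWt += delta
	if s.SWt < 0 {
		s.SWt = 0
	}
}

func main() {
	s := &Synapse{LWt: 0.8, SWt: 1}
	// Fast inner loop: sender and receiver alternate, so they are anti-correlated.
	for i := 0; i < 20; i++ {
		actS := float64(i % 2)
		s.Accum(actS, 1-actS, 0.1)
	}
	fmt.Printf("Var+Var-2Covar = %.4f\n", s.Stat())
	// Slow outer loop: a fixed placeholder decrement, only to show the soft-pruning effect.
	s.ApplySWt(-0.75)
	fmt.Printf("SWt = %.2f, effective Wt = %.2f\n", s.SWt, s.Wt())
}
```

Keeping SWt separate from LWt means the slow signal can shrink a connection's effective weight without overwriting what the fast weight has learned.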

Having worked through the logic more carefully, I realized the initial implementation had two problems: it missed the factor of 2 on Covar_ij, and the pruning logic would make more sense if it included only the negative component of this value. Otherwise, we get a Hebbian-like, variance-increasing force that constantly works to increase the weights, and that force is not present in the pruning version.
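
One possible reading of the revised rule, sketched in Go under loud assumptions: the SWt change is taken to be proportional to Var_i + Var_j - 2 Covar_ij, so the Var terms, which are never negative, are the force that keeps pushing the weights up; and "the negative component" is taken to be the -2 Covar_ij term, dropped to zero whenever it would be positive so the pruning update can only lower SWt. The function names and the learning rate `lrate` are hypothetical.

```go
package main

import "fmt"

// fullDelta is the SWt change from the whole term, including the factor of 2
// on Covar. Its Var terms are never negative, so with zero or negative
// covariance it always pushes SWt up (the Hebbian-like force).
func fullDelta(varS, varR, covar, lrate float64) float64 {
	return lrate * (varS + varR - 2*covar)
}

// pruneDelta keeps only the negative component, -2*Covar, and returns 0 when
// that term would be positive, so this update can shrink SWt but never grow it.
func pruneDelta(covar, lrate float64) float64 {
	if covar <= 0 {
		return 0
	}
	return -2 * lrate * covar
}

func main() {
	const lrate = 0.1
	varS, varR := 0.04, 0.09
	for _, covar := range []float64{0.05, 0.02, -0.05} {
		fmt.Printf("covar=%+.2f full=%+.4f prune=%+.4f\n",
			covar, fullDelta(varS, varR, covar, lrate), pruneDelta(covar, lrate))
	}
}
```

Under this reading, the full version stays positive across the example covariances, while the pruning version is either negative or zero.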

emer / axon #17