Current results hint that high robustness is correlated with high winning-ticket intersection. While this is contrary to the notion that robust layers are 'unimportant' and winning tickets are 'important', @galshachaf suggests this can be explained by measuring the L2 norm of robust layers.
If a layer's weights didn't change much during training (as expected of robust layers, and measurable grossly by an L2/L_infty combo), then not many weights could have dipped below the pruning threshold; therefore the trained layer should have about the same number of below-threshold edges as the untrained layer. If the threshold is, say, 0.07 (on normalized weights), then about 7% of edges would have been pruned at the start and not much more than, say, 10% at the end. Thus, for robust layers, most edges won't be pruned and the winning-ticket intersection should be large.
To test this, add a line to the graph measuring the change in each layer's L2 norm between the first and last epoch.
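The quantities above can be sketched as follows. This is a minimal NumPy illustration (not the project's actual code): `below_threshold_mask`, `layer_change_norms`, and `winning_intersection` are hypothetical helper names, and the magnitude-based pruning rule on max-normalized weights is an assumption about how the threshold is applied.

```python
import numpy as np

def below_threshold_mask(weights, threshold=0.07):
    # Edges that would be pruned: normalized |w| below the threshold.
    # Normalization here is |w| / max|w|, an assumed convention.
    w = np.abs(weights) / np.abs(weights).max()
    return w < threshold

def layer_change_norms(w_init, w_final):
    # Gross measure of how much a layer moved during training:
    # the L2 and L_infty norms of the weight delta.
    delta = w_final - w_init
    return np.linalg.norm(delta), np.abs(delta).max()

def winning_intersection(w_init, w_final, threshold=0.07):
    # Fraction of surviving (unpruned) edges in the trained layer
    # that were also surviving in the untrained layer.
    keep_init = ~below_threshold_mask(w_init, threshold)
    keep_final = ~below_threshold_mask(w_final, threshold)
    return (keep_init & keep_final).sum() / keep_final.sum()

# Simulate a "robust" layer: final weights barely differ from init,
# so the surviving edges should overlap heavily.
rng = np.random.default_rng(0)
w_init = rng.normal(size=(100, 100))
w_final = w_init + 0.01 * rng.normal(size=(100, 100))
l2, linf = layer_change_norms(w_init, w_final)
overlap = winning_intersection(w_init, w_final)
```

Under this sketch, a small weight delta (small L2/L_infty change) forces the before/after prune masks to nearly coincide, which is exactly the mechanism conjectured for robust layers.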