Current results hint that high robustness is correlated with high winning-ticket intersection. While this is contrary to the notion that robust layers are 'unimportant' and winning tickets are 'important', @galshachaf suggests this can be explained by measuring the L2 norm of robust layers.
If a layer's weights didn't change much during training (as expected of robust layers, and measurable grossly by an L2/L_infty combo), then not many weights could have dipped below the pruning threshold; therefore the trained layer should have about the same number of below-threshold edges as the untrained layer. If the threshold is, say, 0.07 (on normalized weights), then about 7% of edges would have been pruned at the start and not much more than, say, 10% at the end. Thus, for robust layers, most edges won't be pruned and the winning-ticket intersection should be large.
To test this, add a line to the graph measuring the change in each layer's L2 norm between the first and last epoch.
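The quantities above can be sketched as follows. This is a minimal NumPy illustration (not the project's actual code): `below_threshold_mask`, `layer_change_norms`, and `winning_intersection` are hypothetical helper names, and the magnitude-based pruning rule on max-normalized weights is an assumption about how the threshold is applied.

```python
import numpy as np

def below_threshold_mask(weights, threshold=0.07):
    # Edges that would be pruned: normalized |w| below the threshold.
    # Normalization here is |w| / max|w|, an assumed convention.
    w = np.abs(weights) / np.abs(weights).max()
    return w < threshold

def layer_change_norms(w_init, w_final):
    # Gross measure of how much a layer moved during training:
    # the L2 and L_infty norms of the weight delta.
    delta = w_final - w_init
    return np.linalg.norm(delta), np.abs(delta).max()

def winning_intersection(w_init, w_final, threshold=0.07):
    # Fraction of surviving (unpruned) edges in the trained layer
    # that were also surviving in the untrained layer.
    keep_init = ~below_threshold_mask(w_init, threshold)
    keep_final = ~below_threshold_mask(w_final, threshold)
    return (keep_init & keep_final).sum() / keep_final.sum()

# Simulate a "robust" layer: final weights barely differ from init,
# so the surviving edges should overlap heavily.
rng = np.random.default_rng(0)
w_init = rng.normal(size=(100, 100))
w_final = w_init + 0.01 * rng.normal(size=(100, 100))
l2, linf = layer_change_norms(w_init, w_final)
overlap = winning_intersection(w_init, w_final)
```

Under this sketch, a small weight delta (small L2/L_infty change) forces the before/after prune masks to nearly coincide, which is exactly the mechanism conjectured for robust layers.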