Integrated Gradients not faithful to original formulation
Open oliveradk opened 1 month ago

The current implementation of integrated gradients interpolates over the entire patch mask at once. However, this introduces dependencies between upstream nodes and downstream nodes. Let f_k(x) denote the output of a component k, and f_k_alpha(x) denote the output of that component when the upstream component outputs are themselves interpolated with alpha. We want to be substituting f_k(x) with alpha * f_k(x) + (1 - alpha) * f_k(x'), where x' is the patch (corrupt) input, but instead we're substituting alpha * f_k_alpha(x) + (1 - alpha) * f_k_alpha(x'). Sparse Feature Circuits addresses this in section 2.

I don't think the edge case is different, and thus think the current implementation in `prune_algos.mask_gradient` is incorrect. It should be adjusted to compute scores iteratively over source-node layers. (This would change the cost from O(N) forward passes to O(n_layers * N) forward passes, so maybe just add it as an optional setting.)

This implementation predates the Sparse Feature Circuits paper by several months, so it was not intended to be an implementation of their method. However, I agree with their reasoning that it often makes more sense not to interpolate across different layers simultaneously. I'd be happy for you or someone else to reopen this issue; I do not plan to implement it myself in the immediate future.

Eh, nvm. I guess it's just a matter of what you want to "normalize" against, and normalizing against edges seems appropriate.
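To make the dependency problem concrete, here is a minimal numpy sketch of the two schemes on a toy two-layer model. All names (`L2`, `M`, `a1_clean`, etc.) are hypothetical and not part of the repo's actual API: `L2` is a downstream layer whose output depends on the upstream activations, and `M` is the scalar metric being attributed. In the joint scheme (the current behavior described above), one alpha interpolates every layer at once, so layer 2's gradient is evaluated at `L2(a1_mix)` — i.e. f_2_alpha — rather than at the clean `L2(a1_clean)`. In the layerwise scheme, upstream activations are held clean while only layer 2 is interpolated, so the usual IG completeness property holds for that layer's scores.

```python
import numpy as np

# Toy setting: upstream activations a1, downstream activations a2 = L2(a1),
# and a scalar metric m = M(a2). "Patching" a layer replaces its activations
# with corrupt ones. All names here are hypothetical illustrations.

def L2(a1):
    """Downstream layer: its output depends on upstream activations."""
    return np.tanh(a1)

def M(a2):
    """Scalar metric being attributed (quadratic, so its gradient is linear)."""
    return np.sum(a2 ** 2)

def dM_da2(a2):
    """Analytic gradient of the metric w.r.t. layer-2 activations."""
    return 2.0 * a2

a1_clean = np.array([0.5, -1.0, 2.0])   # clean-run upstream activations
a1_patch = np.array([0.0,  0.3, 1.0])   # patch (corrupt) upstream activations
a2_patch = L2(a1_patch)                 # patch value for layer 2

n_steps = 32
alphas = (np.arange(n_steps) + 0.5) / n_steps   # midpoint rule over alpha

def joint_ig_score_layer2():
    """Current scheme: one alpha interpolates *every* layer simultaneously,
    so layer 2's gradient is taken at L2(a1_mix), i.e. f_2_alpha."""
    total = np.zeros_like(a2_patch)
    for a in alphas:
        a1_mix = a * a1_clean + (1 - a) * a1_patch
        a2_mix = a * L2(a1_mix) + (1 - a) * a2_patch   # upstream leaks in
        total += dM_da2(a2_mix)
    return (L2(a1_clean) - a2_patch) * total / n_steps

def layerwise_ig_score_layer2():
    """Proposed scheme: interpolate layer 2 alone, upstream held clean."""
    a2_clean = L2(a1_clean)
    total = np.zeros_like(a2_patch)
    for a in alphas:
        a2_mix = a * a2_clean + (1 - a) * a2_patch
        total += dM_da2(a2_mix)
    return (a2_clean - a2_patch) * total / n_steps

# Completeness holds for the layerwise version: the scores sum to the metric
# change from patching layer 2 alone (exact here, since dM_da2 is linear
# along the path and the midpoint rule integrates linear functions exactly).
lw = layerwise_ig_score_layer2()
assert np.isclose(lw.sum(), M(L2(a1_clean)) - M(a2_patch))
```

The layerwise loop is what makes the cost O(n_layers * N): each source-node layer gets its own interpolation sweep instead of sharing one sweep across the whole mask.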