Closed dtch1997 closed 1 week ago
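Currently we implicitly assume one node per hook point; however, this is not the case. For example, there are `n_heads` nodes at `blocks.{layer}.attn.hook_result`.

Proposed implementation: add a `SrcNode.get_act` method, which accepts the tensor of its input hook point and returns a tensor of shape `(... d_model)`.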
Example code:
```python
# Compute per-hook activations, gradients.
acts_per_hook, grads_per_hook = compute_activations_and_gradients_simple(
    model, handler
)
# Convert this to per-node acts, grads.
graph_acts, graph_grads = ...
scores = compute_attribution_scores(
    graph_acts, graph_grads, model.cfg, aggregation=aggregation
)
```
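One way the per-hook to per-node conversion could look is sketched below. This is only an illustration, not the code from the closing commit. The `SrcNode` fields (`name`, `hook_name`, `head_index`), the helper `per_hook_to_per_node`, and the node names are all hypothetical, and NumPy arrays stand in for torch tensors (the indexing is the same for both). It assumes `attn.hook_result` is laid out as `(..., n_heads, d_model)` and that hook points holding a single node are returned unchanged.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass(frozen=True)
class SrcNode:
    """Hypothetical source node: one hook point plus an optional head index."""

    name: str
    hook_name: str
    head_index: Optional[int] = None  # None: the hook point holds a single node

    def get_act(self, hook_act):
        """Slice this node's activation, shape (..., d_model), from its hook tensor."""
        if self.head_index is None:
            return hook_act
        # Assumed layout for attn.hook_result: (..., n_heads, d_model).
        return hook_act[..., self.head_index, :]


def per_hook_to_per_node(values_per_hook, nodes):
    """Stack per-node slices into one array of shape (n_nodes, ..., d_model)."""
    return np.stack([node.get_act(values_per_hook[node.hook_name]) for node in nodes])


# Toy sizes, chosen only for illustration.
batch, pos, n_heads, d_model = 2, 3, 4, 8
hook_name = "blocks.0.attn.hook_result"
nodes = [SrcNode(f"A0.{h}", hook_name, head_index=h) for h in range(n_heads)]

rng = np.random.default_rng(0)
acts_per_hook = {hook_name: rng.normal(size=(batch, pos, n_heads, d_model))}
graph_acts = per_hook_to_per_node(acts_per_hook, nodes)
print(graph_acts.shape)  # (4, 2, 3, 8)
```

The same helper could be applied to `grads_per_hook`, since gradients share the shape of their activations.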
Closed in ae8b6bead66a477bc86f7693835a8042d8df502a