Closed dtch1997 closed 1 week ago
These updates to sae_eap introduce functionality for calculating attribution scores in neural networks using integrated gradients. Key additions include methods to compute activations and gradients, hooks that cache values during the forward and backward passes, and a routine that combines them into attribution scores. The changes also add node indexing for graph structures and a utility that computes positional encodings and input lengths for the model.
| File | Summary of Changes |
|---|---|
| sae_eap/attribute.py | Added methods for computing activations, gradients, and attribution scores. Introduced hooks to cache values. |
| sae_eap/graph/index.py | Introduced GraphIndexer class with methods to build indices for node outputs and inputs, and retrieve these indices. |
| sae_eap/utils.py | Imported HookedTransformer and added get_npos_and_input_lengths function to compute positional encodings and lengths. |
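To make the indexing idea in the table concrete, here is a minimal sketch of what an indexer over node outputs and inputs could look like. It is not the GraphIndexer from this PR: the class name `ToyGraphIndexer`, the `Node` fields, and the node names are all hypothetical, and the sketch only assumes that each node writing an activation needs one slot in an output cache and each node receiving a gradient needs one slot in an input cache.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Node:
    """Hypothetical graph node (not the sae_eap node type)."""
    name: str
    has_output: bool = True  # node writes an activation that gets cached
    has_input: bool = True   # node receives a gradient that gets cached


class ToyGraphIndexer:
    """Assigns each node a slot in an output cache and/or an input cache."""

    def __init__(self, nodes: Iterable[Node]) -> None:
        self._output_index: Dict[str, int] = {}
        self._input_index: Dict[str, int] = {}
        seen = set()
        for node in nodes:
            if node.name in seen:
                raise ValueError(f"duplicate node name: {node.name}")
            seen.add(node.name)
            # Slots are handed out in iteration order, separately per cache.
            if node.has_output:
                self._output_index[node.name] = len(self._output_index)
            if node.has_input:
                self._input_index[node.name] = len(self._input_index)

    @property
    def n_outputs(self) -> int:
        return len(self._output_index)

    @property
    def n_inputs(self) -> int:
        return len(self._input_index)

    def get_output_index(self, name: str) -> int:
        return self._output_index[name]

    def get_input_index(self, name: str) -> int:
        return self._input_index[name]


# Hypothetical four-node graph: the input node has no incoming gradient slot,
# and the logits node has no cached output.
nodes = [
    Node("input", has_input=False),
    Node("a0.h0"),
    Node("m0"),
    Node("logits", has_output=False),
]
indexer = ToyGraphIndexer(nodes)
```

With indices like these, cache tensors can be allocated once with `n_outputs` and `n_inputs` slots and filled by hooks, rather than being keyed by node name on every pass.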
sequenceDiagram
participant User
participant Model
participant Graph
participant Cache
User ->> Model: call attribute(...)
activate Model
Model ->> Graph: build graph and index nodes
activate Graph
Graph -->> Model: return nodes and indices
deactivate Graph
Model ->> Cache: initialize cache tensors
activate Cache
Cache -->> Model: return cached tensors
deactivate Cache
Model ->> Model: compute activations and gradients
Model ->> Model: compute attribution scores using integrated gradients
Model -->> User: return attribution scores
deactivate Model
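The last step of the diagram, scoring with integrated gradients, can be illustrated on a toy function. The sketch below is a generic integrated-gradients approximation, not the code in sae_eap/attribute.py: it assumes a zero baseline, a midpoint Riemann sum over the straight-line path, and a hand-written gradient instead of cached backward-pass values. The names `integrated_gradients`, `toy_f`, and `toy_grad` are hypothetical.

```python
from typing import Callable, List, Sequence


def integrated_gradients(
    grad_fn: Callable[[Sequence[float]], Sequence[float]],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 50,
) -> List[float]:
    """Approximate integrated gradients with a midpoint Riemann sum."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if len(x) != len(baseline):
        raise ValueError("x and baseline must have the same length")

    avg_grad = [0.0] * len(x)
    for k in range(steps):
        # Interpolation coefficient at the midpoint of the k-th sub-interval.
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = grad_fn(point)
        for i, g in enumerate(grad):
            avg_grad[i] += g / steps

    # Scale the path-averaged gradient by the input difference.
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


def toy_f(x: Sequence[float]) -> float:
    """Toy scalar function F(x) = x0^2 + 3*x1 (illustrative only)."""
    return x[0] ** 2 + 3.0 * x[1]


def toy_grad(x: Sequence[float]) -> List[float]:
    """Analytic gradient of toy_f."""
    return [2.0 * x[0], 3.0]


scores = integrated_gradients(toy_grad, x=[2.0, 1.0], baseline=[0.0, 0.0])
```

For this toy function the scores sum to `toy_f(x) - toy_f(baseline)`, which is the completeness property that makes integrated gradients a useful sanity check on any attribution implementation.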
Summary by CodeRabbit
New Features
Added a utility for HookedTransformer models to compute position and input lengths.
Improvements
Documentation