DavidUdell / sparse_circuit_discovery

Circuit discovery in GPT-2 small, using sparse autoencoding
MIT License

Write a linear approximations script #95

Closed DavidUdell closed 4 weeks ago

DavidUdell commented 1 month ago

Remaining todos here are:

  1. implement credit assignment to error tensors,
  2. clean up the old gradient-detaching code, which isn't used in this implementation, and
  3. write code that takes in sublayer sparse autoencoders and outputs intra-block causal graphs.
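
As an illustration of the first todo, here is a minimal sketch of gradient-times-activation credit assignment that includes the autoencoder's error term. It is a toy under stated assumptions, not the repository's code: the metric is taken to be linear in the activation (so the linear approximation is exact), and every name and number below (`activation`, `features`, `decoder_rows`, `grad`) is hypothetical.

```python
# Toy sketch: split a linear approximation of a metric into per-feature
# credit plus credit for the autoencoder's reconstruction error.
# All names and values are hypothetical illustrations.

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def decode(features, decoder_rows):
    """Reconstruct an activation as a weighted sum of decoder directions."""
    dim = len(decoder_rows[0])
    return [
        sum(f * row[k] for f, row in zip(features, decoder_rows))
        for k in range(dim)
    ]

# An activation at some model location and its sparse feature activations.
activation = [1.0, -2.0, 0.5]
features = [0.8, 0.0, 1.5]
decoder_rows = [  # one decoder direction per feature
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, -1.0, 0.5],
]

# The error tensor is whatever the autoencoder fails to reconstruct.
reconstruction = decode(features, decoder_rows)
error = [a - r for a, r in zip(activation, reconstruction)]

# Assumed gradient of a downstream metric with respect to the activation.
grad = [2.0, 1.0, -4.0]

# Credit per feature: feature activation times the gradient along its
# decoder direction. Credit for the error: gradient dotted with the error.
feature_credit = [f * dot(grad, row) for f, row in zip(features, decoder_rows)]
error_credit = dot(grad, error)

total_credit = sum(feature_credit) + error_credit
```

Because the activation equals the reconstruction plus the error, the feature credits and the error credit together account for the full first-order effect `dot(grad, activation)`. Leaving out the error term would silently drop whatever the autoencoder does not capture.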

DavidUdell commented 1 month ago

Node-level stuff is done. All that remains for the gradient-based implementation is the edge-level stuff.

DavidUdell commented 4 weeks ago

Edge-level stuff is done, excluding MLP-out and attn-out.
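
For readers unfamiliar with the node/edge distinction, here is a minimal sketch of one common way to compute an edge-level effect: the upstream feature's activation times the gradient of a downstream feature with respect to it. This is an assumption about what "edge-level" involves, not a description of the repository's implementation. The toy omits biases and error terms, assumes both downstream features are active (so the ReLU passes gradients through), and uses hypothetical names throughout (`layer_map` stands in for a linearized map between the two locations).

```python
# Toy sketch: gradient-times-activation edge effects between upstream and
# downstream sparse features. All names and values are hypothetical.

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [dot(row, vector) for row in matrix]

upstream_features = [0.5, 2.0]
upstream_decoder = [  # one decoder direction per upstream feature
    [1.0, 1.0],
    [0.0, 2.0],
]
layer_map = [  # assumed linearized map from upstream to downstream location
    [1.0, 2.0],
    [0.0, -1.0],
]
downstream_encoder = [  # one encoder direction per downstream feature
    [1.0, 1.0],
    [0.5, 0.0],
]

# edge[i][j]: effect of upstream feature i on downstream feature j, i.e.
# activation_i * d(downstream_j) / d(activation_i).
edge = [
    [
        f_up * dot(enc_row, matvec(layer_map, dec_row))
        for enc_row in downstream_encoder
    ]
    for f_up, dec_row in zip(upstream_features, upstream_decoder)
]

# Downstream pre-activations computed directly, for comparison.
upstream_activation = [
    sum(f * row[k] for f, row in zip(upstream_features, upstream_decoder))
    for k in range(2)
]
downstream_pre = matvec(downstream_encoder, matvec(layer_map, upstream_activation))
```

In this fully linear toy, the edges flowing into each downstream feature sum to that feature's pre-activation, which is a useful sanity check. In a real model the map between locations is nonlinear, so the edge values are only first-order approximations.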