DavidUdell / sparse_circuit_discovery

Circuit discovery in GPT-2 small, using sparse autoencoding
MIT License

Write a linear approximations script #95

Closed DavidUdell closed 2 months ago

DavidUdell commented 3 months ago

Remaining todos here are:

  1. implement credit assignment to error tensors,
  2. clean up old gradient detaching code, which isn't used in this implementation, and
  3. write code taking in sublayer sparse autoencoders and outputting intra-block causal graphs.

DavidUdell commented 3 months ago

Node-level work is done. All that remains for the gradient-based implementation is the edge-level work.
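For readers unfamiliar with the idea, here is a minimal sketch of what a node-level linear approximation can look like. It is not the repository's code. It assumes the approximation in question is a first-order estimate of the effect of zero-ablating a node, `grad * (0 - activation)`. The names `toy_metric`, `numerical_grad`, `linear_ablation_effects`, and `exact_ablation_effects` are hypothetical, and central finite differences stand in for the autograd a real implementation would use.

```python
from typing import Callable, List

Metric = Callable[[List[float]], float]


def numerical_grad(metric: Metric, acts: List[float], eps: float = 1e-6) -> List[float]:
    """Central finite-difference gradient of the metric w.r.t. each activation."""
    grads = []
    for i in range(len(acts)):
        up = list(acts)
        up[i] += eps
        down = list(acts)
        down[i] -= eps
        grads.append((metric(up) - metric(down)) / (2 * eps))
    return grads


def linear_ablation_effects(metric: Metric, acts: List[float]) -> List[float]:
    """First-order estimate of the metric change from zero-ablating each node."""
    grads = numerical_grad(metric, acts)
    # Ablating node i moves its activation from a_i to 0, so the step is -a_i.
    return [g * (0.0 - a) for g, a in zip(grads, acts)]


def exact_ablation_effects(metric: Metric, acts: List[float]) -> List[float]:
    """Ground truth: re-evaluate the metric once per ablated node."""
    base = metric(acts)
    effects = []
    for i in range(len(acts)):
        ablated = list(acts)
        ablated[i] = 0.0
        effects.append(metric(ablated) - base)
    return effects


def toy_metric(acts: List[float]) -> float:
    # Hypothetical downstream metric; the quadratic term makes the
    # linear estimate for node 0 differ from the exact ablation effect.
    return 2.0 * acts[0] - 0.5 * acts[1] + 0.1 * acts[1] * acts[2] + 0.5 * acts[0] ** 2


acts = [1.0, 2.0, 3.0]
approx = linear_ablation_effects(toy_metric, acts)
exact = exact_ablation_effects(toy_metric, acts)

for i, (a, e) in enumerate(zip(approx, exact)):
    print(f"node {i}: linear estimate {a:+.3f}, exact ablation {e:+.3f}")
```

The linear estimate needs one gradient computation for all nodes, whereas exact ablation needs one re-evaluation per node. The price is accuracy wherever the metric is nonlinear in a node's activation, as with node 0 in this toy case.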

DavidUdell commented 2 months ago

Edge-level work is done, excluding MLP-out and attn-out.