Write a linear approximations script

DavidUdell / sparse_circuit_discovery

Circuit discovery in GPT-2 small, using sparse autoencoding

MIT License

Closed: DavidUdell closed this issue 2 months ago

DavidUdell commented 3 months ago

Remaining to-dos here are:

- implement credit assignment to error tensors,
- clean up the old gradient-detaching code, which isn't used in this implementation, and
- write code that takes in sublayer sparse autoencoders and outputs intra-block causal graphs.
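
For readers unfamiliar with the idea, here is a rough, dependency-free sketch of what gradient-based (first-order, linear approximation) credit assignment to autoencoder features and to the error tensor could look like. It is an illustration under assumptions, not the repository's code: every name in it (`numerical_grad`, `linear_attributions`, `feature_acts`, `decoder_rows`, `metric`) is hypothetical, the gradient is taken by finite differences rather than by autograd, and the "error tensor" is assumed to mean the residual between an activation and its sparse autoencoder reconstruction.

```python
# Toy sketch of linear-approximation credit assignment.
# All names are hypothetical and are not taken from the repository.

def numerical_grad(f, x, eps=1e-6):
    """Central-difference gradient of the scalar function f at point x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def linear_attributions(metric, x, feature_acts, decoder_rows):
    """First-order estimate of each component's contribution to `metric`."""
    # Assumed reconstruction: x_hat = sum_i act_i * decoder_row_i.
    x_hat = [0.0] * len(x)
    for act, row in zip(feature_acts, decoder_rows):
        for j, weight in enumerate(row):
            x_hat[j] += act * weight
    # Assumed error tensor: the part of x the autoencoder fails to reconstruct.
    error = [xi - hi for xi, hi in zip(x, x_hat)]
    grad = numerical_grad(metric, x)
    # First-order Taylor term: gradient dotted with each component's contribution.
    feature_attr = [
        act * dot(grad, row) for act, row in zip(feature_acts, decoder_rows)
    ]
    error_attr = dot(grad, error)
    return feature_attr, error_attr


# Toy setup: a 2-d activation, two features, and a linear metric.
x = [1.0, 2.0]
feature_acts = [0.5, 1.0]
decoder_rows = [[1.0, 0.0], [0.0, 1.5]]


def metric(v):
    return 3.0 * v[0] - 1.0 * v[1]


feature_attr, error_attr = linear_attributions(metric, x, feature_acts, decoder_rows)
```

Because the toy metric is linear, the feature attributions and the error attribution together sum exactly to `metric(x) - metric([0.0, 0.0])`; for a real network the sum would only approximate that difference.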

DavidUdell commented 3 months ago

Node-level stuff is done. All that remains for the gradient-based implementation is the edge-level stuff.

DavidUdell commented 2 months ago

Edge-level stuff is done, excluding MLP-out and attn-out.