DavidUdell / sparse_circuit_discovery

Circuit discovery in GPT-2 small, using sparse autoencoding
MIT License

Deal with autoencoder training "dead neurons" #2

Closed DavidUdell closed 10 months ago

DavidUdell commented 11 months ago

We know from Anthropic's paper that sparse autoencoders will, in practice, not utilize all of the neurons made available to them. Autoencoder neurons that cannot activate are called "dead neurons." Detecting any dead neurons during autoencoder training and automatically perturbing and further training them, in Anthropic's fashion, should improve autoencoder expressiveness.
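A minimal sketch of the perturbation step, assuming we have a per-neuron activation count from an evaluation pass (the function name and the reinitialization scale are placeholders, and this is only in the spirit of Anthropic's resampling procedure, not their exact recipe):

```python
import torch


def resample_dead_neurons(encoder_weight: torch.Tensor,
                          activation_counts: torch.Tensor,
                          scale: float = 0.2) -> torch.Tensor:
    """Reinitialize encoder rows for neurons that never activated.

    `encoder_weight` has shape (hidden_dim, input_dim);
    `activation_counts[i]` is the number of tokens on which hidden
    neuron i fired during an evaluation pass.
    Returns the boolean mask of dead neurons.
    """
    dead = activation_counts == 0
    with torch.no_grad():
        # Replace dead rows with small fresh random directions, so the
        # neurons get a new chance to pick up unexplained features.
        encoder_weight[dead] = torch.randn_like(encoder_weight[dead]) * scale
    return dead
```

The returned mask can also be logged, so training runs report how many dimensions had to be resampled.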

DavidUdell commented 11 months ago

When I train the autoencoders on openwebtext rather than the much skimpier truthful_qa, there are almost no unused autoencoder dimensions, even with an autoencoder dimension 10 times larger than the model's hidden dimension. This doesn't mean no dead neurons are present, but it does mean that the truthful_qa autoencoder training ran on quite a small dataset relative to the number of trainable parameters.

I will still verify dead neuron count, in both cases.

DavidUdell commented 10 months ago

At minimum, print a dead neuron count to stdout from train_autoencoder.
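A sketch of what that count could look like, assuming we collect the hidden activations over an evaluation batch (the function name is hypothetical, and "dead" here means "never fired on this batch", which only upper-bounds true deadness):

```python
import torch


def count_dead_neurons(hidden_activations: torch.Tensor) -> int:
    """Count autoencoder hidden dimensions that never activate.

    `hidden_activations` has shape (n_tokens, hidden_dim); a neuron is
    counted as dead if its activation is zero on every token.
    """
    fired = (hidden_activations > 0).any(dim=0)
    return int((~fired).sum().item())


# In the training loop, something like:
# print(f"Dead neurons: {count_dead_neurons(acts)} / {acts.shape[1]}")
```

Accumulating the `fired` mask with a logical-or across batches would give the same count over a whole epoch without holding all activations in memory.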