When I train the autoencoders on openwebtext rather than the much skimpier truthful_qa, there are almost no unused autoencoder dimensions, even with an autoencoder 10 times wider than the hidden dimension. This doesn't mean no dead neurons are present, but it does mean that the truthful_qa autoencoder training was on quite a small dataset relative to the number of trainable parameters.
I will still verify the dead neuron count in both cases.
At a minimum, print a dead neuron count to stdout from `train_autoencoder`.
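
A minimal sketch of what that count could look like, assuming a hypothetical `encoder` module that maps model activations to post-ReLU hidden codes (the module name, signature, and `eval_activations` tensor are illustrative, not the repo's actual API):

```python
import torch


def count_dead_neurons(encoder, activations: torch.Tensor,
                       batch_size: int = 512) -> int:
    """Count autoencoder hidden dimensions that never fire on `activations`.

    `encoder` is assumed to return nonnegative (post-ReLU) hidden codes;
    any dimension that is zero on every example is counted as dead.
    """
    ever_fired = None
    with torch.no_grad():
        for batch in activations.split(batch_size):
            codes = encoder(batch)  # (batch, hidden_dim), nonnegative
            fired = (codes > 0).any(dim=0)
            ever_fired = fired if ever_fired is None else ever_fired | fired
    return int((~ever_fired).sum().item())


# E.g., at the end of train_autoencoder:
# print(f"Dead neurons: {count_dead_neurons(encoder, eval_activations)}")
```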
We know from Anthropic's paper that sparse autoencoders will, in practice, not utilize all the neurons made available to them. Autoencoder neurons that never activate are called "dead neurons." Detecting any dead neurons during autoencoder training and automatically perturbing/further training them in Anthropic's fashion should improve autoencoder expressiveness.
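
A hedged sketch of the perturbation step, loosely in the spirit of Anthropic's resampling procedure; the encoder/decoder layout (an `nn.Linear` into ReLU, and an `nn.Linear` back out) is an assumption, and Anthropic's full recipe also resamples toward high-loss inputs and resets optimizer state rather than using random directions:

```python
import torch
import torch.nn as nn


@torch.no_grad()
def resample_dead_neurons(encoder: nn.Linear, decoder: nn.Linear,
                          dead_mask: torch.Tensor, scale: float = 0.2) -> None:
    """Reinitialize the weights of dead hidden units.

    `dead_mask` is a boolean vector over hidden dimensions (True = dead).
    Dead units get fresh small-norm encoder rows and unit-norm decoder
    columns so they can begin firing again; live units are untouched.
    """
    n_dead = int(dead_mask.sum().item())
    if n_dead == 0:
        return
    # Encoder weight is (hidden_dim, input_dim): rows index hidden units.
    new_rows = torch.randn(n_dead, encoder.weight.shape[1],
                           device=encoder.weight.device)
    new_rows *= scale / new_rows.norm(dim=1, keepdim=True)
    encoder.weight[dead_mask] = new_rows
    encoder.bias[dead_mask] = 0.0
    # Decoder weight is (input_dim, hidden_dim): columns index hidden units.
    new_cols = torch.randn(decoder.weight.shape[0], n_dead,
                           device=decoder.weight.device)
    new_cols /= new_cols.norm(dim=0, keepdim=True)
    decoder.weight[:, dead_mask] = new_cols
```

Run periodically during training, after recomputing the dead mask on a held-out activation batch, this should keep otherwise-wasted dictionary capacity in play.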