ai-safety-foundation / sparse_autoencoder

Sparse Autoencoder for Mechanistic Interpretability
https://ai-safety-foundation.github.io/sparse_autoencoder/
MIT License

sum(latents) vs mean(hidden) #184

Open wassname opened 10 months ago

wassname commented 10 months ago

I noticed that, by default, the loss sums over the latent dimension in the L1 term but takes the mean over the feature dimension in the reconstruction term. This seems like it would make the optimal l1_coefficient vary with the model size. It might be worth taking the mean over both dimensions, so that the optimal l1_coefficient is more stable between autoencoder configurations.
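
To illustrate the concern, here is a minimal pure-Python sketch. It is not the library's actual loss code: the function names (`l1_penalty`, `mse`, `total_loss`) and the `reduction` argument are hypothetical, and it makes the toy assumption that the typical per-latent activation stays the same when the number of latents changes. Under that assumption, the summed L1 term grows with the number of latents while the averaged one does not.

```python
def l1_penalty(latents, reduction="sum"):
    """L1 penalty for one sample's latent activations (hypothetical helper)."""
    total = sum(abs(a) for a in latents)
    return total if reduction == "sum" else total / len(latents)


def mse(inputs, reconstruction):
    """Reconstruction loss, averaged over the input feature dimension."""
    return sum((x - y) ** 2 for x, y in zip(inputs, reconstruction)) / len(inputs)


def total_loss(inputs, reconstruction, latents, l1_coefficient, reduction="sum"):
    """Reconstruction term plus weighted sparsity term."""
    return mse(inputs, reconstruction) + l1_coefficient * l1_penalty(latents, reduction)


# Toy assumption: every latent has the same activation (0.5) in both
# configurations; only the number of latents differs.
small = [0.5] * 1024
large = [0.5] * 4096

# Ratio of the L1 term between the large and small autoencoder.
sum_ratio = l1_penalty(large, "sum") / l1_penalty(small, "sum")
mean_ratio = l1_penalty(large, "mean") / l1_penalty(small, "mean")
print(sum_ratio, mean_ratio)
```

In this toy setting, matching the two reductions would require dividing the coefficient by the number of latents when using the sum. Whether the mean is actually more stable in practice depends on how activation sparsity changes with dictionary size, which this sketch does not model.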