sum(latents) vs mean(hidden)

ai-safety-foundation / sparse_autoencoder

Sparse Autoencoder for Mechanistic Interpretability

https://ai-safety-foundation.github.io/sparse_autoencoder/

MIT License


sum(latents) vs mean(hidden) #184

Open · wassname opened 7 months ago

wassname commented 7 months ago

I noticed that, by default, the loss sums over the latent dimension in the L1 term but takes the mean over the input (hidden) dimension in the reconstruction term. This seems like it would make the optimal l1_coefficient vary with model size. It might be worth taking the mean over both dimensions; that way the optimal l1_coefficient should be more stable across autoencoder configurations.
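To make the scaling argument concrete, here is a minimal, framework-free sketch in plain Python rather than the library's PyTorch code. All function names are hypothetical and not part of sparse_autoencoder. It assumes a single activation vector with no batch dimension, and toy latents in which a fixed fraction (1 in 20) of units fire at 1.0 regardless of width.

```python
# Hypothetical helpers illustrating the two reduction choices; these are NOT
# functions from the sparse_autoencoder library. Single vector, no batch dim.


def l1_sum_over_latents(latents: list[float]) -> float:
    """L1 penalty summed over the latent dimension (the default described above)."""
    return sum(abs(z) for z in latents)


def l1_mean_over_latents(latents: list[float]) -> float:
    """L1 penalty averaged over the latent dimension (the proposed alternative)."""
    return sum(abs(z) for z in latents) / len(latents)


def mse_mean_over_hidden(x: list[float], x_hat: list[float]) -> float:
    """Reconstruction loss averaged over the input (hidden) dimension."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def total_loss(recon_loss: float, l1: float, l1_coefficient: float) -> float:
    """Reconstruction loss plus the weighted sparsity penalty."""
    return recon_loss + l1_coefficient * l1


def fixed_density_latents(n_latents: int, every: int = 20) -> list[float]:
    """Toy latents (assumption): one unit in `every` is active with value 1.0."""
    return [1.0 if i % every == 0 else 0.0 for i in range(n_latents)]


d_hidden = 640
x = [1.0] * d_hidden
x_hat = [0.9] * d_hidden  # identical reconstruction error at every SAE width
recon = mse_mean_over_hidden(x, x_hat)

# Map expansion factor -> (summed L1, mean L1) at a fixed activation density.
results = {}
for expansion in (1, 4, 16):
    n_latents = expansion * d_hidden
    z = fixed_density_latents(n_latents)
    results[expansion] = (l1_sum_over_latents(z), l1_mean_over_latents(z))
    print(
        f"x{expansion:>2}: recon={recon:.4f} "
        f"l1_sum={results[expansion][0]:.0f} l1_mean={results[expansion][1]:.3f}"
    )
```

Holding the reconstruction error and the fraction of active latents fixed, the summed L1 term grows in proportion to the number of latents while the mean stays constant. Equivalently, a sum-reduced coefficient of λ / n_latents reproduces the objective of a mean-reduced coefficient of λ, so a value tuned at one width would not transfer to another under the sum reduction.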