OpenMOSS / Language-Model-SAEs

For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research.
32 stars 6 forks source link

fix(evals): performs logits mask when computing ce score. Ignoring pa… #25

Closed Hzfinfdu closed 3 months ago

Hzfinfdu commented 3 months ago

…d tokens