With the exception of the toy RASP models, every model run through the current interpretability pipeline generates a set of top-k input token labels per neuron. feature_web should label the neurons in its output .png with these labels, except when the model is a RASP model.
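The requested behavior could be sketched roughly as follows. This is only an illustration: `neuron_label`, its parameters, and the `is_rasp` flag are hypothetical names and are not taken from feature_web's actual code, and the layout of the label text is an assumption.

```python
from typing import Optional, Sequence


def neuron_label(
    neuron_id: str,
    top_k_tokens: Optional[Sequence[str]],
    is_rasp: bool,
) -> str:
    """Build the text drawn next to one neuron in the rendered .png.

    Hypothetical helper: RASP models have no top-k token labels, so
    they fall back to the bare neuron identifier.
    """
    if is_rasp or not top_k_tokens:
        return neuron_id
    # Assumed layout: identifier on the first line, tokens on the second.
    return f"{neuron_id}\n" + ", ".join(top_k_tokens)
```

A caller would pass each neuron's top-k labels from the pipeline output and use the returned string as the node label when drawing the graph.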