TransformerLensOrg / CircuitsVis

Mechanistic Interpretability Visualizations using React
https://alan-cooney.github.io/CircuitsVis/
MIT License

Topk samples #35

Closed danbraunai closed 1 year ago

danbraunai commented 1 year ago

Adds a vis that shows the samples containing the tokens which maximally activate a particular unit (e.g. a neuron or a direction):

image

This vis is very similar to TextNeuronActivations. The difference is that this vis shows a different set of samples for each layer/neuron selection, whereas TextNeuronActivations keeps the samples fixed and shows different activation values for each layer/neuron selection.
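
To make the "different set of samples per selection" idea concrete, here is a minimal sketch of how the top-k samples for one layer/neuron could be picked before being handed to the vis. It is not code from this PR: the function name `topk_sample_indices`, the nested-list layout `activations[sample][token][layer][neuron]`, and the choice to rank samples by their peak token activation are all assumptions made for illustration.

```python
from typing import List, Sequence

# Hypothetical layout (an assumption, not the PR's data format):
# activations[s][t][l][n] is the activation of neuron n in layer l
# at token t of sample s. Samples may have different lengths.
Activations = Sequence[Sequence[Sequence[Sequence[float]]]]


def topk_sample_indices(
    activations: Activations, layer: int, neuron: int, k: int
) -> List[int]:
    """Return indices of the k samples whose peak activation is largest.

    Assumes every sample contains at least one token.
    """
    # Peak activation of the selected neuron over the tokens of each sample.
    peak = [
        max(token_acts[layer][neuron] for token_acts in sample)
        for sample in activations
    ]
    # Rank sample indices from highest to lowest peak activation.
    ranked = sorted(range(len(peak)), key=lambda s: peak[s], reverse=True)
    return ranked[:k]


# Toy data: 3 samples, 1 layer, 2 neurons.
example_acts = [
    [[[0.1, 0.9]], [[0.2, 0.0]]],  # sample 0, two tokens
    [[[0.8, 0.1]], [[0.3, 0.2]]],  # sample 1, two tokens
    [[[0.5, 0.4]]],                # sample 2, one token
]

print(topk_sample_indices(example_acts, layer=0, neuron=0, k=2))
print(topk_sample_indices(example_acts, layer=0, neuron=1, k=2))
```

Changing the neuron changes which samples are returned, which is the behaviour described above. A TextNeuronActivations-style view would instead keep the sample list fixed and only swap the activation values shown.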

(Note: this PR uses functionality introduced in #21 and #34. All the commits from those PRs are included here, so it would be cleaner to merge those PRs first.)