AlignmentResearch / tuned-lens

Tools for understanding how transformer predictions are built layer-by-layer
https://tuned-lens.readthedocs.io/en/latest/
MIT License
406 stars 42 forks

Prediction depth #72

Open levmckinney opened 1 year ago

levmckinney commented 1 year ago

In the paper there is a nice visualization of prediction depth. Prediction depth is defined in the paper as the first layer at which the most likely token equals the model's final output token.

[Figure: prediction depth visualization from the paper]

These should be included as part of the PredictionTrajectory class so that we can easily produce them in the future. Note that the code for this should be modular, like TrajectoryStatistic, since we may want to reuse these visualizations for attention in the future.
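As a rough sketch of the statistic itself (not the actual PredictionTrajectory API — the array names and shapes here are assumptions), prediction depth per position can be computed from the per-layer lens logits like this:

```python
import numpy as np

def prediction_depth(layer_logits: np.ndarray, final_token_ids: np.ndarray) -> np.ndarray:
    """Earliest layer whose top-1 token matches the final output token.

    layer_logits: (num_layers, seq_len, vocab_size) lens logits at each layer.
    final_token_ids: (seq_len,) argmax tokens of the model's final output.
    Returns a (seq_len,) array of depths; positions where no layer ever
    agrees get num_layers as a sentinel.
    """
    num_layers = layer_logits.shape[0]
    # (num_layers, seq_len): top-1 token at each layer and position.
    top1 = layer_logits.argmax(axis=-1)
    matches = top1 == final_token_ids[None, :]
    # argmax over the layer axis gives the first True; fall back to the
    # sentinel where no layer matches.
    return np.where(matches.any(axis=0), matches.argmax(axis=0), num_layers)
```

A TrajectoryStatistic-style wrapper would just apply this over the trajectory's stored logits and hand the result to the plotting code.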

levmckinney commented 1 year ago

I appear to have lost my prototype code for doing this, but I've dug up the reference implementation in Captum that I based it on: https://github.com/pytorch/captum/blob/50f7bdd243b0430ef06958bb2dda9b3bdd0c150d/captum/attr/_utils/visualization.py#L755
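In the spirit of Captum's HTML token renderer (this is a minimal sketch, not Captum's actual function, and `depth_to_html` is a hypothetical name), one could shade each token by its prediction depth:

```python
def depth_to_html(tokens: list[str], depths: list[int], num_layers: int) -> str:
    """Render tokens as HTML spans shaded by prediction depth.

    Deeper predictions get a darker background (simple linear green ramp);
    hovering shows the exact depth via the title attribute.
    """
    spans = []
    for tok, depth in zip(tokens, depths):
        frac = depth / max(num_layers, 1)
        green = int(255 * (1.0 - frac))  # depth 0 -> bright, num_layers -> black
        spans.append(
            f'<span style="background-color: rgb(0,{green},0)" '
            f'title="depth {depth}">{tok}</span>'
        )
    return "<div>" + " ".join(spans) + "</div>"
```

This mirrors the per-token heatmap style of the Captum visualization linked above, just keyed on depth instead of attribution score.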

levmckinney commented 1 year ago

Another reference to look at would be the attention visualizations from Anthropic: https://github.com/anthropics/PySvelte. It's used here: https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html.