AlignmentResearch / tuned-lens

Tools for understanding how transformer predictions are built layer-by-layer
https://tuned-lens.readthedocs.io/en/latest/
MIT License

A complete refactor of the plotting code #63

Closed levmckinney closed 1 year ago

levmckinney commented 1 year ago

The plotting code is now broken down into a set of classes:

This refactor also adds a notebook, prediction_trajectory.ipynb, to the tutorials section of the docs; it demonstrates the new functionality.

UlisseMini commented 1 year ago

Haven't read everything super deeply and I'm not familiar with the codebase, but this looks syntactically wrong:

        js_div = 0.5 * np.sum(
            self.probs * (self.log_probs - other.log_probs), axis=-1
        ) + 0.5 * np.sum(self.probs * (self.log_probs - self.log_probs), axis=-1) # always zero

Should be (other.log_probs - self.log_probs).

Other than that, it looks good! I like this refactor. It will make using HookedTransformer from TransformerLens much easier :-)

(Random meta point this made me realize: syntactic errors like this are things an AI system could catch; I think there's a service that does this...)
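For reference, the Jensen-Shannon divergence is usually defined through the mixture M = 0.5 * (P + Q), as JS(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M). The NumPy sketch below follows that textbook definition. It is not taken from the tuned-lens source. The helper names (log_softmax, kl_divergence, js_divergence) are hypothetical, and it assumes all log-probabilities are finite.

```python
import numpy as np


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax along `axis`."""
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))


def kl_divergence(log_p, log_q, axis=-1):
    """KL(P || Q) computed from log-probabilities (assumed finite)."""
    return np.sum(np.exp(log_p) * (log_p - log_q), axis=axis)


def js_divergence(log_p, log_q, axis=-1):
    """JS(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), with M = 0.5 * (P + Q)."""
    # log M = log(0.5 * (exp(log_p) + exp(log_q))), evaluated stably.
    log_m = np.logaddexp(log_p, log_q) - np.log(2.0)
    return 0.5 * kl_divergence(log_p, log_m, axis) + 0.5 * kl_divergence(
        log_q, log_m, axis
    )


# Toy usage: two batches of logits with shape (layers, tokens, vocab).
rng = np.random.default_rng(0)
log_p_demo = log_softmax(rng.normal(size=(2, 3, 5)))
log_q_demo = log_softmax(rng.normal(size=(2, 3, 5)))
js_demo = js_divergence(log_p_demo, log_q_demo)  # shape (2, 3)
```

Cheap sanity checks for any implementation: the result should be zero when both arguments are the same distribution, unchanged when the arguments are swapped, and never larger than log(2).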

levmckinney commented 1 year ago

@UlisseMini thank you and good catch!

UlisseMini commented 1 year ago

It would probably be good to have a from_lens_and_hidden method. It seems the only time the model is used in from_lens_and_model is in the first three lines. The main workarounds (for people whose models don't support output_hidden_states) would be adding it as a method or constructing a PredictionTrajectory manually. Thoughts?

levmckinney commented 1 year ago

> It would probably be good to have a from_lens_and_hidden method. It seems the only time the model is used in from_lens_and_model is in the first three lines. The main workarounds (for people whose models don't support output_hidden_states) would be adding it as a method or constructing a PredictionTrajectory manually. Thoughts?

This seems like a good feature. Can you open a PR adding this? You know your requirements better than I do, and I'd be happy to review it. I'm hoping to get this PR merged today.
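The from_lens_and_hidden idea discussed above could look roughly like the following. This is only a sketch under stated assumptions. TrajectorySketch and its fields are hypothetical stand-ins rather than the library's PredictionTrajectory, and the lens is assumed to be any callable that maps (hidden state, layer index) to logits. The point is that once hidden states are supplied by the caller, the constructor never needs the model.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

import numpy as np


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax along `axis`."""
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))


@dataclass
class TrajectorySketch:
    """Hypothetical stand-in for a prediction trajectory (not the library class)."""

    log_probs: np.ndarray  # shape (n_layers, seq_len, vocab)
    input_ids: np.ndarray  # shape (seq_len,)

    @classmethod
    def from_lens_and_hidden(
        cls,
        lens: Callable[[np.ndarray, int], np.ndarray],
        hidden_states: Sequence[np.ndarray],
        input_ids: Sequence[int],
    ) -> "TrajectorySketch":
        # Decode each layer's hidden state with the lens; no model call needed.
        per_layer = [
            log_softmax(lens(hidden, idx)) for idx, hidden in enumerate(hidden_states)
        ]
        return cls(log_probs=np.stack(per_layer), input_ids=np.asarray(input_ids))


# Toy usage: a made-up "lens" that applies one shared unembedding matrix.
rng = np.random.default_rng(0)
d_model, vocab, seq_len, n_layers = 4, 6, 3, 2
unembed = rng.normal(size=(d_model, vocab))


def toy_lens(hidden, idx):
    return hidden @ unembed  # idx is ignored in this toy example


hidden = [rng.normal(size=(seq_len, d_model)) for _ in range(n_layers)]
traj = TrajectorySketch.from_lens_and_hidden(toy_lens, hidden, input_ids=[1, 2, 3])
```

A caller whose model cannot return hidden states directly could collect them some other way (for example, with forward hooks) and pass them in as the hidden_states sequence.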