TransformerLensOrg / TransformerLens

A library for mechanistic interpretability of GPT-style language models
https://transformerlensorg.github.io/TransformerLens/

[Question] Would it be possible to use TransformerLens with models that have a different LayerNorm implementation? #773

Open · Steven-Yiran opened this issue 3 weeks ago

Steven-Yiran commented 3 weeks ago

Question

I am looking to use TransformerLens with a custom model that is not currently supported by the library. The custom model has the same GPT-2-like architecture except for the implementation of the LayerNorm operation: each layer applies a single LayerNorm (with weight and bias) at the end of the MLP output. I looked into the Othello-GPT example but am still not sure how to avoid the architecture mismatch.

Would it still be possible to run analysis on the custom model with TransformerLens? Thanks!
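For concreteness, here is a rough PyTorch sketch of the block structure I mean (the module names and the exact placement relative to the residual add are illustrative, not the actual model code):

```python
import torch
import torch.nn as nn

class CustomBlock(nn.Module):
    """GPT-2-style block, except the only LayerNorm (with weight and bias)
    sits at the end of the MLP, instead of GPT-2's pre-attention (ln1)
    and pre-MLP (ln2) norms."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fc1 = nn.Linear(d_model, 4 * d_model)
        self.fc2 = nn.Linear(4 * d_model, d_model)
        # The single LayerNorm in the block, applied to the MLP output.
        self.final_layer_norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        x = x + attn_out  # residual add, with no pre-attention norm
        mlp_out = self.fc2(torch.nn.functional.gelu(self.fc1(x)))
        return x + self.final_layer_norm(mlp_out)  # norm at the end of the MLP
```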

bryce13950 commented 2 weeks ago

Today this is not possible without modifying the code itself. Making it possible is tentatively planned for what will be 4.0. For the time being, I can set up a little hook for you to override the layer norm, but it would live on an experimental branch, and we would probably have to work relatively closely together to make sure it works for you. The model you are trying to test is most similar to GPT-2, right?
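To give a sense of why a hook is needed: with the current API, the closest you can get is swapping out modules after loading, which still leaves no slot for a norm at the end of the MLP. A rough sketch, using a GPT-2 checkpoint as a stand-in:

```python
import torch.nn as nn
from transformer_lens import HookedTransformer

# Rough, unsupported idea: load a GPT-2-shaped HookedTransformer (keeping the
# real LayerNorm modules rather than folding them into the weights), then
# neutralize the norms it builds in the wrong places.
model = HookedTransformer.from_pretrained("gpt2", fold_ln=False)
for block in model.blocks:
    block.ln1 = nn.Identity()  # the custom model has no pre-attention norm
    block.ln2 = nn.Identity()  # ...and no pre-MLP norm

# The remaining gap: TransformerBlock's forward pass has no slot for a norm at
# the end of the MLP, which is exactly what the experimental hook would add.
```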

Steven-Yiran commented 2 weeks ago

Thanks for your response! Specifically, I am trying to run experiments on BioGPT. In terms of architecture, the only layer norm occurs after the MLP modules (final_layer_norm in the screenshot below). The implementations of the attention and MLP modules are the same as in GPT-2.

[Screenshot: layer norm placement in a BioGPT decoder layer]
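For reference, the structure in the screenshot can be reproduced by printing a decoder layer from the Hugging Face port (assuming a transformers version recent enough to include BioGptForCausalLM):

```python
from transformers import BioGptForCausalLM

model = BioGptForCausalLM.from_pretrained("microsoft/biogpt")
# Print one decoder layer to see where final_layer_norm sits relative to the
# attention and MLP (fc1/fc2) submodules.
print(model.biogpt.layers[0])
```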

I would really love to work with you on this if you think it fits the general roadmap!