TransformerLensOrg / TransformerLens

A library for mechanistic interpretability of GPT-style language models
https://transformerlensorg.github.io/TransformerLens/
MIT License
1.45k stars 283 forks source link

Encoder-Decoder (T5) support #605

Closed somvy closed 3 months ago

somvy commented 4 months ago

Hey!

I needed EncoderDecoder (T5) models support for my research, but unfortunately neither TransformerLens, nor nnsight had support for them. So i coded this one)

This PR is mostly based on #276 and HuggingFace implementation of T5

Hope that it is enough to get started with encoder-decoder arhitectures, and merge into the lib. I will continue work on this.

Will be glad to receive any comments :)

Loading example

from transformer_lens.HookedEncoderDecoder import  HookedEncoderDecoder

path = "results/t5_scratch_adamw_eos_3lay_8head_512d_2048dff_3e-4_400ep/"

#you can load  local custom model or official T5 weights from HF
path = "t5-small" 

hooked_model = HookedEncoderDecoder.from_pretrained(model_name=path)

Description

Fixes #258

Type of change

Please delete options that are not relevant.

Checklist:

bryce13950 commented 4 months ago

Very good! I will be reviewing this relatively quickly. Before I do that, can you do me a big favor, and add a demo similar to the BERT demo you referenced in the other PR, so that the usage of this can be shown in detail for the future?

somvy commented 4 months ago

Thank you! Feel free to point at any weak spots in the code, im totally okay with it and will fix them as soon as possible)

Added a simple demo of t5 on how to run it with cache, visualise attn patterns and topk token logits

bryce13950 commented 4 months ago

Thanks for the great work! I will hopefully have time to to review the whole thing this week. I have a few bugs that I need to fix, and hopefully nothing comes up after that.

bryce13950 commented 4 months ago

Alright, so now that 2.0 has been put up I have sometime to review this. There is quite a bit changing with standards for code submissions in this project, which mainly revolve around improvements to unit testing, and a couple changing standards. What you have done so far looks pretty good, but given the changing standards I am going to request a few things to make sure we are moving closer towards the new standards. That being said, the first thing I would like to check is to make sure you replace any asserts outside of tests with an if block and then throwing of an error. This is one of those things that isn't an official requirement yet for the code, but it will be very soon, and anything new coming in should be in this new format. The second thing that I am going to ask is that we improve the test coverage a bit. The test you added is good, but I want to get some unit tests in here to test different bits of functionality in isolation a bit more. The two new components should have a unit test setup for them with the same directory structure. (e.g. a test at tests/unit/test_t5_block.py). Then in that unit test we need to test the individual groups of logic in isolation. This means that it might be a good idea to separate something like the different operation groups found in your forward function into smaller functions, so that we can focus in on those operations to ensure that they are being applied correctly. If you have any questions on what I mean with this, let me know. I am happy to help you with improving the test coverage. It's pretty late here, so I don't have time to do any specific examples of what I mean, but I can do that either tomorrow or Monday if you want to wait for me to demonstrate exactly what I mean.

somvy commented 4 months ago

Hi! Thank you for the feedback! Made a quick fix to ensure asserts are replaced with ifs and errors

understand the issue with coverage, i will try ti cut some time to fix it on this week

Could you be more specific in

separate something like the different operation groups found in your forward function into smaller functions

you mean in HookedEncoderDecoder.forward() or in T5Block?

bryce13950 commented 3 months ago

Honestly, after really taking a full look at what you did, I think we are probably fine. What I was referring to when I made the comment about separating functionality in your forward function, I was referring to your forward function on the T5Block, but after really having time to dig into your code today, I think what you are doing is fine.

We are looking to find more ways to test blocks of functionality in isolation for the purpose of creating better unit tests. When it comes to doing that task on some of the classes you added, we may break it down a little bit more, so we can run tests to specific parts of your forward without touching the rest of it, but for the purpose of getting this into a release, I don't think there will be anything else requested of you.

I will mess with your demo to verify a few last minute things tomorrow, and hopefully I will get this closed out at the same time. Thank you very much for your contribution here. I will let you know if anything else comes up, but I wouldn't expect anything.