TransformerLensOrg / TransformerLens

A library for mechanistic interpretability of GPT-style language models
https://transformerlensorg.github.io/TransformerLens/
MIT License

How to get the Activation cache while the LLM is generating new tokens? #697

Open Meehaohao opened 3 months ago

Meehaohao commented 3 months ago

Question

My prompt is "Which instrument does Henry Halstead mainly play? Please answer an instrument name. Answer: ", which is a question posed to the LLM. I want to get the cached hidden states of the response tokens while the LLM is generating the response. How can I do this?

On one hand, the code `logits, cache = model.run_with_cache(prompt, return_cache_object=True)` only caches the hidden states of the prompt, because it doesn't run the generate function.

On the other hand, the code `output = model.generate(prompt, do_sample=False, max_new_tokens=20)` only returns the generated tokens or text; I can't get the activation cache of the generated answer.

So how can I obtain the model's response and the activation cache of the response tokens at the same time, in a single generation pass?
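For now, the only workaround I can see is a two-pass approach: run `generate` first, then re-run the full output (prompt plus answer) through `run_with_cache`. Since causal attention means the hidden state at each position depends only on the tokens up to that position, re-running the fixed output sequence should reproduce the activations from generation, at the cost of one extra forward pass. A rough sketch of what I mean (`gpt2` is just a placeholder model):

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # placeholder model

prompt = "Which instrument does Henry Halstead mainly play? Please answer an instrument name. Answer: "

# Pass 1: generate the answer (a string prompt returns prompt + answer as a string).
output = model.generate(prompt, do_sample=False, max_new_tokens=20)

# Pass 2: re-run the full sequence with caching. The hidden states at each
# position match the ones computed while generating that token's successor.
logits, cache = model.run_with_cache(output, return_cache_object=True)

print(cache["blocks.0.hook_resid_post"].shape)  # [batch, full_seq_len, d_model]
```

This works, but it doubles the compute, so a single-pass option would still be nice.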

bryce13950 commented 3 months ago

Unfortunately, there is currently no integration of the activation cache into the generate function. I don't see any reason why we couldn't add that as an option, but it would unfortunately be a pretty low priority given some other projects that are currently being worked on, unless someone volunteers to do it.
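In the meantime, you can capture activations during generation yourself by registering forward hooks around the `generate` call: hooks fire on every forward pass, including the ones `generate` performs internally. A minimal sketch using the `hooks` context manager (the layer and hook point below are arbitrary examples, and `gpt2` is a placeholder model):

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # placeholder model

prompt = "Which instrument does Henry Halstead mainly play? Please answer an instrument name. Answer: "

collected = []  # one entry per forward pass that generate() performs

def save_hook(activation, hook):
    # activation has shape [batch, seq, d_model] for this forward pass
    collected.append(activation.detach().cpu())

hook_name = "blocks.6.hook_resid_post"  # arbitrary example hook point

with model.hooks(fwd_hooks=[(hook_name, save_hook)]):
    output = model.generate(prompt, do_sample=False, max_new_tokens=20)

# generate uses a key-value cache by default, so the first entry covers the
# whole prompt and each later entry covers just the newly processed token;
# concatenating them recovers activations for every position that was
# actually fed through the model (the final sampled token never is).
full_acts = torch.cat(collected, dim=1)
```

With `do_sample=False`, `full_acts` holds exactly the hidden states the model computed while producing each answer token.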

Meehaohao commented 3 months ago

Got it, thank you.