[Open] Meehaohao opened this issue 3 months ago
Unfortunately, there is currently no integration of the activation cache into the generate function. I don't see any reason why we couldn't add that as an option, but it would unfortunately be fairly low priority given some other projects currently being worked on, unless someone volunteers to do it.
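In the meantime, a common workaround is to generate first and then re-run the full sequence (prompt plus generated answer) through `run_with_cache`. Because the model is causal and you replay the exact tokens that were produced, the hidden states from the second pass match the ones computed during generation; the cost is one extra forward pass. A minimal sketch, assuming a TransformerLens `HookedTransformer` (`"gpt2"` is just a placeholder checkpoint):

```python
from transformer_lens import HookedTransformer

# "gpt2" is a placeholder; substitute the checkpoint you actually use.
model = HookedTransformer.from_pretrained("gpt2")

prompt = ("Which instrument does Henry Halstead mainly play? "
          "Please answer an instrument name. Answer: ")

# Tokenise up front so the generated sequence can be re-run without a
# decode/re-encode round trip changing token boundaries.
prompt_tokens = model.to_tokens(prompt)  # shape [1, n_prompt]

# Greedy generation; with token input, generate returns the full token
# sequence (prompt + answer) as a tensor.
all_tokens = model.generate(prompt_tokens, do_sample=False, max_new_tokens=20)

# Second forward pass over the full sequence. Because the model is causal,
# the hidden state at each position is identical to the one computed while
# generating, so this cache also covers the response tokens.
logits, cache = model.run_with_cache(all_tokens, return_cache_object=True)

n_prompt = prompt_tokens.shape[-1]
resid = cache["resid_post", model.cfg.n_layers - 1]  # final-layer residual stream
answer_resid = resid[:, n_prompt:, :]  # hidden states at the answer positions
print(model.to_string(all_tokens[0, n_prompt:]))  # the generated answer
```

This also works with sampling (`do_sample=True`), since the second pass replays the exact token sequence that was sampled.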
Got it, thank you.
Question
My prompt is "Which instrument does Henry Halstead mainly play? Please answer an instrument name. Answer: ", which is a question posed to the LLM. I want to get the cached hidden states of the LLM's response tokens while the LLM is generating the response. How can I do this?
On the one hand, the code
```python
logits, cache = model.run_with_cache(prompt, return_cache_object=True)
```
only caches the hidden states of the prompt, because it doesn't run the generate function. On the other hand, the code
```python
output = model.generate(prompt, do_sample=False, max_new_tokens=20)
```
only returns the generated tokens or text; I can't get the ActivationCache of the generated answer. So how can I obtain both the model's response and the ActivationCache of the response tokens in a single generation pass?
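If the extra forward pass of the replay approach above is a concern, another option is to capture activations while `generate` is running, using TransformerLens hooks. A sketch under the same assumptions (`"gpt2"` as a placeholder model; `resid_post` at the final layer as an example hook point):

```python
import torch
from transformer_lens import HookedTransformer
import transformer_lens.utils as utils

model = HookedTransformer.from_pretrained("gpt2")  # placeholder checkpoint

prompt = ("Which instrument does Henry Halstead mainly play? "
          "Please answer an instrument name. Answer: ")

# Example hook point: the residual stream after the final block.
hook_name = utils.get_act_name("resid_post", model.cfg.n_layers - 1)
collected = []

def save_hook(activation, hook):
    # With the KV cache enabled (the default in generate), the first forward
    # pass covers the whole prompt and each later pass feeds only the newest
    # token, so `activation` is [batch, n_prompt, d_model] on the first call
    # and [batch, 1, d_model] on every call after that.
    collected.append(activation.detach().clone())

# Temporarily attach the hook for the duration of generation.
with model.hooks(fwd_hooks=[(hook_name, save_hook)]):
    output = model.generate(prompt, do_sample=False, max_new_tokens=20)

# One tensor of shape [batch, n_prompt + n_generated - 1, d_model]. Note that
# the last sampled token is never fed back through the model, so its own
# hidden state is not computed during generation.
all_resid = torch.cat(collected, dim=1)
print(output)
```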