wwwqqyy closed 2 months ago
Thanks!
I have updated the mamba2.py code; it now implements a proper caching mechanism. I have put an explanation of how it works at the top of the file, and I hope it helps. If you still have questions, don't hesitate to ask!
For how to use it, you can take a look at the lm.py file, which encapsulates a Mamba(2) object into a language model.
Specifically, you can take a look at the generate function (which decodes from the model but uses no caching) and the generate4 function, which decodes from the model with prompt prefill + step-by-step decoding (so with caching).
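To make the difference between the two decoding modes concrete, here is a minimal toy sketch. It does not use the actual mamba2.py or lm.py API: the ToyRecurrence class, its forward and step methods, and the scalar parameters a, b, c are all hypothetical stand-ins for a recurrent layer whose cache is just its hidden state. Under that assumption, decoding with a cache amounts to running the prompt once (prefill) and then advancing the saved state one token at a time, instead of re-running the whole sequence for every new token.

```python
# Toy illustration only: ToyRecurrence is a hypothetical stand-in,
# not the Mamba2 class from mamba2.py.
class ToyRecurrence:
    """Scalar linear recurrence: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t."""

    def __init__(self, a=0.5, b=1.0, c=2.0):
        self.a, self.b, self.c = a, b, c

    def step(self, x, h):
        """Process one input given the cached state h; return (output, new state)."""
        h = self.a * h + self.b * x
        return self.c * h, h

    def forward(self, xs, h=0.0):
        """Process a whole sequence; return (all outputs, final state)."""
        ys = []
        for x in xs:
            y, h = self.step(x, h)
            ys.append(y)
        return ys, h


def decode_no_cache(model, prompt, n_new):
    """Re-run the model on the full sequence for every new token."""
    seq = list(prompt)
    for _ in range(n_new):
        ys, _ = model.forward(seq)   # cost grows with len(seq) at every step
        seq.append(ys[-1])           # feed the last output back as next input
    return seq


def decode_with_cache(model, prompt, n_new):
    """Prefill on the prompt once, then decode step by step from the cache."""
    seq = list(prompt)
    if n_new == 0:
        return seq
    ys, h = model.forward(seq)       # prefill: h now summarizes the prompt
    seq.append(ys[-1])
    for _ in range(n_new - 1):
        y, h = model.step(seq[-1], h)  # constant cost per new token
        seq.append(y)
    return seq


model = ToyRecurrence()
prompt = [1.0, 2.0, 3.0]
slow = decode_no_cache(model, prompt, n_new=3)
fast = decode_with_cache(model, prompt, n_new=3)
print(slow == fast)  # both paths produce the same sequence
```

Both functions return the same sequence; the cached version simply avoids recomputing the state for tokens it has already seen. In a real model the cache would hold tensors rather than a single float, but the control flow (prefill, then repeated single-token steps that take and return the cache) follows the same pattern.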
Thank you very much, I have seen your recent update!
Thank you for providing such excellent code. I have a question: how is the cache of the step function used in the following code block? What does it do? Could you give an example of how it works? Thank you.
class Mamba2(nn.Module):