Closed by GameOverFlowChart 3 weeks ago
Oh wait, I'm still on 0.8.0a; I didn't realize there was a new version. 0.8.1 seems to have changes to KV handling? Maybe this is fixed already?
This is actually something I have not really looked into.
RWKV is not a transformer model - as such, it cannot regenerate replies or roll back its state easily. I believe RWKV is a more traditional RNN with a fixed-size state (20.88 MB for 3.1B, based on the cached state file) that is entirely replaced on every generation. Because of this, you can't just trim the KV cache like you would with a transformer, since the entire state has changed.
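To illustrate the difference described above, here's a toy Python sketch (not ChatterUI or llama.cpp code, and the data shapes are stand-ins): a transformer KV cache grows per token and can be rolled back by truncation, while an RNN-style state is one fixed-size blob that each token overwrites in place.

```python
# Transformer-style: the KV cache is a per-token list, so rolling back the
# last reply just means truncating at the message boundary.
kv_cache = ["k/v for tok0", "k/v for tok1", "k/v for tok2", "k/v for tok3"]
reply_start = 2                      # index where the last reply began
kv_cache = kv_cache[:reply_start]    # cheap rollback: drop the reply's entries
assert kv_cache == ["k/v for tok0", "k/v for tok1"]

# RNN-style (RWKV): the state is one fixed-size blob that every token mixes
# into and replaces; after generating, the pre-reply state is simply gone.
state = [0.0] * 4                    # stands in for the ~20 MB RWKV state

def step(state, tok):
    # toy update: every token rewrites the whole state, not one slot per token
    return [s + hash(tok) % 7 for s in state]

for tok in ["tok0", "tok1", "tok2", "tok3"]:
    state = step(state, tok)
# There is no slice of `state` that corresponds to "before the reply", so the
# only rollback is re-running the whole conversation from scratch.
```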
That said, ChatterUI really isn't prepared for such a model architecture, as llama.rn expects transformers only. I'm not sure how best to tackle this. There are possibly hacky solutions, like saving the cache per message and only allowing regenerates, or perhaps there are some lower-level APIs in llama.cpp that could fix this.
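The "save cache per message" idea could look roughly like the sketch below. The class and method names (`RNNChat`, `send`, `regenerate`) are hypothetical, not llama.rn or ChatterUI APIs; llama.cpp does expose whole-context state save/restore (e.g. `llama_state_get_data` / `llama_state_set_data` in recent versions), which this imitates in Python with a deterministic toy "generation" step.

```python
import copy

class RNNChat:
    def __init__(self):
        self.state = [0.0] * 4      # stands in for the ~20 MB RWKV state blob
        self.snapshots = []         # one full state checkpoint per message

    def send(self, message):
        # checkpoint the state *before* generating, so we can come back to it
        self.snapshots.append(copy.deepcopy(self.state))
        for ch in message:          # toy "generation": mutate the whole state
            self.state[ord(ch) % 4] += 1.0

    def regenerate(self, message):
        # roll back to the checkpoint taken before the last message...
        self.state = self.snapshots.pop()
        # ...then run it again (a real implementation would resample here)
        self.send(message)
```

The obvious cost: each checkpoint is a full copy of the state (~20 MB for the 3.1B model), so keeping one per message adds up fast, which is part of why this is only a hacky workaround.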
For now, I think I'll just not officially support RWKV, as it's a very experimental and niche architecture, so this issue is closed. If use of RWKV picks up, or if there are llama.cpp features I'm missing for this, I will reconsider.
I could finally test RWKV with ChatterUI. This bug seems to be RWKV (6) specific, probably because of its architecture difference: pressing the regenerate button seems to continue generation. Even deleting the conversation and starting a new one seems to keep old info in its context. (I'm not using the feature that saves the KV cache.)