Vali-98 / ChatterUI

Simple frontend for LLMs built in react-native.
GNU Affero General Public License v3.0

Rwkv regenerate bug #108

Closed GameOverFlowChart closed 3 weeks ago

GameOverFlowChart commented 3 weeks ago

I could finally test RWKV with ChatterUI. This bug seems to be RWKV (6) specific, probably because of its architecture difference: pressing the regenerate button seems to continue generation instead. Even deleting the conversation and starting a new one seems to keep old info in its context. (I'm not using the feature that saves the KV cache.)

GameOverFlowChart commented 3 weeks ago

Oh wait, I'm still on 0.8.0a; I didn't realize there is a new version. 0.8.1 seems to have changes to the KV cache? Maybe this is fixed already?

Vali-98 commented 3 weeks ago

This is actually something which I have not really looked into.

RWKV is not a transformer model - as such, it cannot regenerate replies or roll back its state easily. RWKV, I believe, is a more traditional RNN with a fixed state size (20.88 MB for the 3.1B model, based on the cached state file) which is entirely replaced on every generation. Because of this, you can't just trim the KV cache as you would with a transformer, since the entire state has changed.
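To make the contrast concrete, here is a minimal TypeScript sketch (names and shapes are purely illustrative, not llama.cpp's or llama.rn's actual API): a transformer's KV cache grows one entry per token, so rolling back is a prefix slice, whereas an RNN-style state is a single fixed-size blob that is overwritten every token and can only be "rolled back" if a copy was saved beforehand.

```typescript
type KvCache = number[][];      // one entry per processed token (illustrative)
type RnnState = Float32Array;   // fixed size, e.g. ~20.88 MB for RWKV 3.1B

// Transformer: roll back by trimming the cache to the prompt length;
// entries before the cut point are still valid.
function rollbackTransformer(cache: KvCache, promptLen: number): KvCache {
  return cache.slice(0, promptLen);
}

// RNN: the pre-generation state is gone unless an explicit copy exists.
function rollbackRnn(saved: RnnState | undefined): RnnState {
  if (!saved) {
    throw new Error("state was overwritten; nothing to roll back to");
  }
  return saved.slice(); // restore from the checkpoint copy
}
```

The asymmetry is the whole problem: the transformer path needs no extra bookkeeping, while the RNN path only works if something upstream decided to checkpoint the state in advance.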

That said, ChatterUI really isn't prepared for such a model architecture, as llama.rn expects transformers only. I'm not sure how best to tackle this. There are possibly hacky solutions, like saving the state per message and only allowing regenerates, or perhaps there are some lower-level APIs in llama.cpp that could fix this.
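The "save state per message" idea could be sketched like this (again purely hypothetical, not existing ChatterUI or llama.rn code): snapshot the recurrent state before each reply is generated, so regenerating the last reply means restoring the snapshot taken just before it and re-running generation.

```typescript
type State = Float32Array;

// Hypothetical checkpoint store: snapshots[i] holds the state as it was
// just before reply i was generated.
class StateCheckpoints {
  private snapshots: State[] = [];

  // Call right before generating a reply; copies, since generation
  // overwrites the live state in place.
  beforeReply(current: State): void {
    this.snapshots.push(current.slice());
  }

  // Restore the state from before the most recent reply; the caller
  // then re-runs generation from that restored state.
  regenerateLast(): State {
    const snap = this.snapshots.pop();
    if (!snap) {
      throw new Error("no checkpoint to regenerate from");
    }
    return snap.slice();
  }
}
```

At ~20.88 MB per snapshot for the 3.1B model this gets expensive fast for long chats, which is why a scheme like this stays in "hacky solution" territory.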

For now, I think I'll just not officially support RWKV, as it's a very experimental and niche architecture, so this issue is closed for now. If use of RWKV picks up, or if there are some llama.cpp features I'm missing for this, I will reconsider.